
Re: Flash Alloc - 3x faster

From: Phil Bouchard <boost_at_[hidden]>
Date: Sat, 24 Jul 2021 18:59:33 -0400
- Front/back (non-middle) deletions and insertions will guarantee the allocations are contiguous (they keep block sizes together, organized and without holes in between);
- The size of the pages will be directly proportional to the usage frequency, so the greater the demand, the bigger the memory pages, and thus fewer page allocations overall;
- What I presented is ideal for LIFO container types; a similar specialization can be done for FIFO, and other container types will use a generic specialization (see the sketch below).
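
Something along these lines, as a bare sketch (the names, page sizes and growth factor here are illustrative only, not the actual Flash Alloc code):

#include <cstddef>
#include <cstdlib>
#include <vector>

template <typename T>
struct lifo_allocator
{
    using value_type = T;

    struct page
    {
        T * data;
        std::size_t size;   // capacity in elements
        std::size_t used;   // bump pointer from the front
    };

    std::vector<page> pages;

    // Allocations always come from the back page, so memory within a
    // page stays contiguous and hole-free.
    T * allocate(std::size_t n)
    {
        if (pages.empty() || pages.back().used + n > pages.back().size)
        {
            // Each new page doubles in size, so the heavier the usage,
            // the bigger the pages and the fewer page allocations overall.
            std::size_t size = pages.empty() ? 1024 : pages.back().size * 2;
            if (size < n)
                size = n;
            pages.push_back({static_cast<T *>(std::malloc(size * sizeof(T))), size, 0});
        }
        T * p = pages.back().data + pages.back().used;
        pages.back().used += n;
        return p;
    }

    // LIFO deallocation: only the most recent allocation is released,
    // which is exactly the front/back (non-middle) pattern.
    void deallocate(T *, std::size_t n)
    {
        pages.back().used -= n;
        if (pages.back().used == 0 && pages.size() > 1)
        {
            std::free(pages.back().data);
            pages.pop_back();
        }
    }

    ~lifo_allocator()
    {
        for (page & p : pages)
            std::free(p.data);
    }
};

int main()
{
    lifo_allocator<int> a;
    int * p = a.allocate(10);   // contiguous, from the back page
    a.deallocate(p, 10);        // LIFO: most recent allocation first
}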

That will force developers to think twice before using a specific container type if they really want speed.

--
Phil Bouchard
Founder
C.: (819) 328-4743
> On Jul 24, 2021, at 6:41 PM, Scott Michaud <scott_at_[hidden]> wrote:
> 
> 
> How can the allocator be easily optimized knowing container type and usage frequency?
> 
> 
> 
> On 7/24/2021 6:24 PM, Phil Bouchard wrote:
>> It is exactly what you referred to: a LIFO cache. 
>> 
>> If developers can give hints on the container type and usage frequency then the allocator can be easily optimized.
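>> 
>> For instance, something along these lines (just a sketch; the enum and allocator names are illustrative, and the placeholder bodies simply forward to operator new/delete):
>> 
>> #include <cstddef>
>> #include <new>
>> #include <vector>
>> 
>> enum class container_kind { lifo, fifo, generic };
>> enum class usage_freq { low, high };
>> 
>> template <typename T,
>>           container_kind Kind = container_kind::generic,
>>           usage_freq Freq = usage_freq::low>
>> struct hinted_allocator
>> {
>>     using value_type = T;
>> 
>>     // Non-type template parameters defeat the default rebind in
>>     // std::allocator_traits, so spell it out.
>>     template <typename U>
>>     struct rebind { using other = hinted_allocator<U, Kind, Freq>; };
>> 
>>     hinted_allocator() = default;
>>     template <typename U>
>>     hinted_allocator(hinted_allocator<U, Kind, Freq> const &) {}
>> 
>>     // A real implementation would dispatch on Kind and Freq at compile
>>     // time (e.g. a stack of pages for lifo, bigger pages for high
>>     // frequency); this placeholder does neither.
>>     T * allocate(std::size_t n)
>>         { return static_cast<T *>(::operator new(n * sizeof(T))); }
>>     void deallocate(T * p, std::size_t) noexcept
>>         { ::operator delete(p); }
>> };
>> 
>> template <typename T, typename U, container_kind K, usage_freq F>
>> bool operator==(hinted_allocator<T, K, F> const &,
>>                 hinted_allocator<U, K, F> const &) { return true; }
>> template <typename T, typename U, container_kind K, usage_freq F>
>> bool operator!=(hinted_allocator<T, K, F> const &,
>>                 hinted_allocator<U, K, F> const &) { return false; }
>> 
>> int main()
>> {
>>     // The container declares its access pattern and expected load:
>>     std::vector<int, hinted_allocator<int, container_kind::lifo,
>>                                       usage_freq::high>> v;
>>     v.push_back(42);
>> }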
>> 
>> --
>> 
>> Phil Bouchard
>> Founder
>> C.: (819) 328-4743
>> 
>> 
>> 
>>> On Jul 24, 2021, at 4:10 PM, Scott Michaud <scott_at_[hidden]> wrote:
>>> 
>>> 
>>> I'm not sure what situation you're optimizing for. It looks like the main speed-up is that you're adding and removing pages far less frequently than deallocate is being called. While I haven't checked whether this specific allocator implementation returns valid memory, preallocating chunks and doling them out is how custom allocators do fast allocations and deallocations.
>>> 
>>> See: https://godbolt.org/z/d5W16MEvd
>>> 
>>> I don't see how your array of containers is supposed to help, though. In my experience with allocators, we just grab a raw slab of memory. Optimization comes from things like knowing the access patterns (ex: LIFO allocations can be doled out as stacks) and knowing the usage (ex: items that are used together should be placed near each other so they're likely to be in cache together).
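>>> 
>>> By "grab a raw slab," I mean something like this free-list pool (a sketch; the names and sizes are illustrative):
>>> 
>>> #include <cassert>
>>> #include <cstddef>
>>> #include <cstdlib>
>>> 
>>> class chunk_pool
>>> {
>>>     union node { node * next; };
>>> 
>>>     char * slab_;
>>>     node * free_list_ = nullptr;
>>>     std::size_t chunk_size_;
>>> 
>>> public:
>>>     chunk_pool(std::size_t chunk_size, std::size_t count)
>>>         : slab_(static_cast<char *>(std::malloc(chunk_size * count))),
>>>           chunk_size_(chunk_size)
>>>     {
>>>         assert(chunk_size >= sizeof(node));
>>> 
>>>         // Thread every chunk of the slab onto the free list up front.
>>>         for (std::size_t i = count; i-- > 0; )
>>>         {
>>>             node * n = reinterpret_cast<node *>(slab_ + i * chunk_size_);
>>>             n->next = free_list_;
>>>             free_list_ = n;
>>>         }
>>>     }
>>> 
>>>     ~chunk_pool() { std::free(slab_); }
>>> 
>>>     // Allocation and deallocation are both O(1) pointer swaps;
>>>     // no pages are mapped or unmapped per call.
>>>     void * allocate()
>>>     {
>>>         node * n = free_list_;
>>>         if (n)
>>>             free_list_ = n->next;
>>>         return n;
>>>     }
>>> 
>>>     void deallocate(void * p)
>>>     {
>>>         node * n = static_cast<node *>(p);
>>>         n->next = free_list_;
>>>         free_list_ = n;
>>>     }
>>> };
>>> 
>>> int main()
>>> {
>>>     chunk_pool pool(64, 1024);  // one slab: 1024 chunks of 64 bytes
>>>     void * a = pool.allocate();
>>>     void * b = pool.allocate();
>>>     pool.deallocate(b);
>>>     pool.deallocate(a);
>>> }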
>>> 
>>> Care to elaborate?
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 7/24/2021 2:44 PM, Phil Bouchard via Std-Proposals wrote:
>>>> Yeah, I updated it again (disabled page_t initialization), so in general it's more like 3x faster, which is good if you require low latency (finance, the gaming industry, ...). That's why we all use C++ after all, no?
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> 
>>>> Phil Bouchard
>>>> Founder & CTO
>>>> C.: (819) 328-4743
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 7/24/21 12:18 PM, Phil Bouchard via Std-Proposals wrote:
>>>>> Interestingly, if I increase LOOP_SIZE, the overall time taken is lower, and thus it's faster. Also, please keep DATASET_SIZE at 1 because I didn't test it with other sizes.
>>>>> 
>>>>> I'll follow up later this weekend; meanwhile I've put the code here:
>>>>> 
>>>>> https://github.com/philippeb8/Flash-Alloc
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> 
>>>>> Phil Bouchard
>>>>> Founder & CTO
>>>>> C.: (819) 328-4743
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On 7/24/21 6:19 AM, DBJ wrote:
>>>>>> https://godbolt.org/z/T4qc5o8Mb
>>>>>> 
>>>>>> That turns out to be many times slower than std::allocator<> ...
>>>>>> 
>>>>>> I must be doing something wrong?
>>>>>> 
>>>>>> On Sat, 24 Jul 2021 at 09:40, Phil Bouchard via Std-Proposals <std-proposals_at_[hidden]> wrote:
>>>>>>> And here's a more generic one that is 10x faster for straight allocations.
>>>>>>> 
>>>>>>> Anyway, my point is that the rebind oddity has apparently been removed from the C++20 standard but not from my system headers... So perhaps adding a similar ultra-fast allocator such as this one to the stdlib would be constructive.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Regards,
>>>>>>> 
>>>>>>> -- 
>>>>>>> 
>>>>>>> Phil Bouchard
>>>>>>> Founder & CTO
>>>>>>> C.: (819) 328-4743
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 7/23/21 10:23 PM, Phil Bouchard via Std-Proposals wrote:
>>>>>>>> Greetings,
>>>>>>>> 
>>>>>>>> Given that the default memory allocator is known to be slow, it came to my attention that if we collect more information at compile time (not only the type being allocated, but also the container type and the usage frequency), then we can achieve much higher performance.
>>>>>>>> 
>>>>>>>> In the attached example, if we use a queue, we can speed up the overall allocation time by 7x!
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> 
>>>>>>>> -- 
>>>>>>>> 
>>>>>>>> Phil Bouchard
>>>>>>>> Founder & CTO
>>>>>>>> C.: (819) 328-4743
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> -- 
>>>>>>> Std-Proposals mailing list
>>>>>>> Std-Proposals_at_[hidden]
>>>>>>> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>>>>> 
>>>> 

Received on 2021-07-24 17:59:37