Date: Wed, 5 Aug 2020 10:27:11 +1200
Just to retroactively invalidate my previous statement, I've found that GCC
8 and MSVC don't have the same 10% performance detriment as GCC9. I'm in
the middle of doing more cross-compiler testing to be thorough on this. I
suspect that the 2% performance difference is likely to hold true, but will
have to see. I would be interested to know what sort of % performance
threshold people would tolerate in order to have a working reserve function.
In the meantime I do one other question for people:
If we allow unused empty memory blocks (which is what reserve ensures),
that means we can retain memory blocks when all elements in them are
erased, rather than deallocating the blocks. This has numerous potential
performance advantages, for example if you have filled all element memory
blocks and then insert a new element, a new memory block will be created.
If that element is subsequently erased and then another inserted, on
repeat, without retaining the memory block you would end up deallocating
and reallocating a memory block repeatedly.
However it also throws up a lot of implementation-defined decision-making
as to the capacity of memory blocks which will be retained. For example, we
would almost always want to retain the back block as this will be the
largest - and similarly we might want to retain any other blocks which are
at the maximum memory block size. But if there's twenty memory blocks, and
the front memory block gets all it's elements erased, we wouldn't want to
retain it, as it's capacity would be small, which leads to worse cache
locality. Re-using that block would be a detriment to performance.
Also, retaining erased memory blocks leads to a vector-like scenario where
the user can have a lot of unused memory after erasures, until a
shrink-to-fit is called (I just remembered from the june talk that we
established that shrink-to-fit would be a non-element-reallocating "free
unused blocks" function much like std::deque, in this case). This has the
advantage of allowing the user to move a lot of deallocation to
non-hot-loop areas, but how much deallocation would occur in the hot loops
themselves would be dependent on the implementation as mentioned in the
previous paragraph.
Are either of these two aspects problematic, if we retain reserve?
Now, replying to Ben:
On Wed, 5 Aug 2020 at 09:23, Ben Craig via SG14 <sg14_at_[hidden]>
wrote:
> I believe your results, but I'm trying to understand the "why" a bit
> better. Does the addition of reserve() have knock-on effects on the rest
> of the design of the container? Perhaps because it makes prior, simple
> invariants more complicated?
>
It introduces new code and branches into all the various
insert/emplace/erase/splice functions, and of course there are minor
changes to constructors/swap/etc as well, to re-use or retain empty memory
blocks. As above, I need to do more vigorous cross-compiler testing to
establish the true effect of this - to find out whether what I've seen with
GCC9 is merely a bug in it's optimization.
>
> How does the container differ in the following cases:
> * Reservable design: Start empty, reserve(512) elements, insert 100
> elements
> * Non-reservable design: Start empty, insert 512 elements, erase 412
> elements
> * Reservable design: Start empty, insert 100 elements
> * Non-reservable design: Start empty, insert 100 elements
Basically as-noted, extra code and branching occuring. In case you're
interested, I've attached the before (reserve can only allocate one block)
and after (incomplete, reserve can allocate many blocks, erase can retain
memory blocks) variants to this email, may not be worth investigating
further until I've completed my investigation however.
8 and MSVC don't have the same 10% performance detriment as GCC9. I'm in
the middle of doing more cross-compiler testing to be thorough on this. I
suspect that the 2% performance difference is likely to hold true, but will
have to see. I would be interested to know what sort of % performance
threshold people would tolerate in order to have a working reserve function.
In the meantime I do one other question for people:
If we allow unused empty memory blocks (which is what reserve ensures),
that means we can retain memory blocks when all elements in them are
erased, rather than deallocating the blocks. This has numerous potential
performance advantages, for example if you have filled all element memory
blocks and then insert a new element, a new memory block will be created.
If that element is subsequently erased and then another inserted, on
repeat, without retaining the memory block you would end up deallocating
and reallocating a memory block repeatedly.
However it also throws up a lot of implementation-defined decision-making
as to the capacity of memory blocks which will be retained. For example, we
would almost always want to retain the back block as this will be the
largest - and similarly we might want to retain any other blocks which are
at the maximum memory block size. But if there's twenty memory blocks, and
the front memory block gets all it's elements erased, we wouldn't want to
retain it, as it's capacity would be small, which leads to worse cache
locality. Re-using that block would be a detriment to performance.
Also, retaining erased memory blocks leads to a vector-like scenario where
the user can have a lot of unused memory after erasures, until a
shrink-to-fit is called (I just remembered from the june talk that we
established that shrink-to-fit would be a non-element-reallocating "free
unused blocks" function much like std::deque, in this case). This has the
advantage of allowing the user to move a lot of deallocation to
non-hot-loop areas, but how much deallocation would occur in the hot loops
themselves would be dependent on the implementation as mentioned in the
previous paragraph.
Are either of these two aspects problematic, if we retain reserve?
Now, replying to Ben:
On Wed, 5 Aug 2020 at 09:23, Ben Craig via SG14 <sg14_at_[hidden]>
wrote:
> I believe your results, but I'm trying to understand the "why" a bit
> better. Does the addition of reserve() have knock-on effects on the rest
> of the design of the container? Perhaps because it makes prior, simple
> invariants more complicated?
>
It introduces new code and branches into all the various
insert/emplace/erase/splice functions, and of course there are minor
changes to constructors/swap/etc as well, to re-use or retain empty memory
blocks. As above, I need to do more vigorous cross-compiler testing to
establish the true effect of this - to find out whether what I've seen with
GCC9 is merely a bug in it's optimization.
>
> How does the container differ in the following cases:
> * Reservable design: Start empty, reserve(512) elements, insert 100
> elements
> * Non-reservable design: Start empty, insert 512 elements, erase 412
> elements
> * Reservable design: Start empty, insert 100 elements
> * Non-reservable design: Start empty, insert 100 elements
Basically as-noted, extra code and branching occuring. In case you're
interested, I've attached the before (reserve can only allocate one block)
and after (incomplete, reserve can allocate many blocks, erase can retain
memory blocks) variants to this email, may not be worth investigating
further until I've completed my investigation however.
Received on 2020-08-04 17:31:16