Hi, I have some related suggestions:
1. Was this considered as an alternative: have a "reserve(int)" function that adds a new block "manually" to the chain of blocks, with at least enough space for the required number of elements (not counting the previous elements or blocks).
I think it would be named differently from reserve, so maybe "reserve_block(int)".
That way:
 - it's basically what insert will do anyway when growing the container (with some constraints), it does not need to add conditions to the current implementations (I believe, I might be wrong);
 - we can achieve being able to pre-allocate outside of the hot-loops, in particular if we call it only once after creation.
 - it gives control to the users who need to be able to dynamically grow the container but also have the information allowing them to anticipate how to grow it (because reserve_block() can be called again if there is an incoming set of data that will need to be added).

2. If such reserve_block(int) function was provided, then would it be interesting to have a "shrink_to_last()" function that would only remove blocks at the end of the chain IFF they are empty. The last block which contains elements would be untouched.
That way:
  - implementers are free to keep the blocks allocated until that function is explicitly called, no way to end-up in the issue of deleting a block then recreating one because the user removed an element then added one;
  - It allows the user fine-enough control over the allocation to be used in (I believe) even extreme contexts.

I think these propositions would help achieve the goals .reserve() and .shrink_to_fit() achieve for other containers without the performance issues and while still making sense with the semantic of colony.
But I might be wrong, and I don't know if these ideas have been discussed already?

A. Joël Lamotte

On Wed, 5 Aug 2020 at 00:28, Matthew Bentley via SG14 <sg14@lists.isocpp.org> wrote:
Just to retroactively invalidate my previous statement, I've found that GCC 8 and MSVC don't have the same 10% performance detriment as GCC9. I'm in the middle of doing more cross-compiler testing to be thorough on this. I suspect that the 2% performance difference is likely to hold true, but will have to see. I would be interested to know what sort of % performance threshold people would tolerate in order to have a working reserve function.

In the meantime I do one other question for people:
If we allow unused empty memory blocks (which is what reserve ensures), that means we can retain memory blocks when all elements in them are erased, rather than deallocating the blocks. This has numerous potential performance advantages, for example if you have filled all element memory blocks and then insert a new element, a new memory block will be created. If that element is subsequently erased and then another inserted, on repeat, without retaining the memory block you would end up deallocating and reallocating a memory block repeatedly.

However it also throws up a lot of implementation-defined decision-making as to the capacity of memory blocks which will be retained. For example, we would almost always want to retain the back block as this will be the largest - and similarly we might want to retain any other blocks which are at the maximum memory block size. But if there's twenty memory blocks, and the front memory block gets all it's elements erased, we wouldn't want to retain it, as it's capacity would be small, which leads to worse cache locality. Re-using that block would be a detriment to performance.

Also, retaining erased memory blocks leads to a vector-like scenario where the user can have a lot of unused memory after erasures, until a shrink-to-fit is called (I just remembered from the june talk that we established that shrink-to-fit would be a non-element-reallocating "free unused blocks" function much like std::deque, in this case). This has the advantage of allowing the user to move a lot of deallocation to non-hot-loop areas, but how much deallocation would occur in the hot loops themselves would be dependent on the implementation as mentioned in the previous paragraph.

Are either of these two aspects problematic, if we retain reserve?


Now, replying to Ben:

On Wed, 5 Aug 2020 at 09:23, Ben Craig via SG14 <sg14@lists.isocpp.org> wrote:
I believe your results, but I'm trying to understand the "why" a bit better.  Does the addition of reserve() have knock-on effects on the rest of the design of the container?  Perhaps because it makes prior, simple invariants more complicated?

It introduces new code and branches into all the various insert/emplace/erase/splice functions, and of course there are minor changes to constructors/swap/etc as well, to re-use or retain empty memory blocks. As above, I need to do more vigorous cross-compiler testing to establish the true effect of this - to find out whether what I've seen with GCC9 is merely a bug in it's optimization.
 

How does the container differ in the following cases:
* Reservable design: Start empty, reserve(512) elements, insert 100 elements
* Non-reservable design: Start empty, insert 512 elements, erase 412 elements
* Reservable design: Start empty, insert 100 elements
* Non-reservable design: Start empty, insert 100 elements

Basically as-noted, extra code and branching occuring. In case you're interested, I've attached the before (reserve can only allocate one block) and after (incomplete, reserve can allocate many blocks, erase can retain memory blocks) variants to this email, may not be worth investigating further until I've completed my investigation however.
_______________________________________________
SG14 mailing list
SG14@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg14