On 6/03/2025 11:32 am, Jens Maurer wrote:

On 05/03/2025 23.06, Matt Bentley via SG14 wrote:

I'm just wondering what people's assumptions are around hive::assign()/fill-insert/range-insert and capacity. This is not a standardisation question.

The standard ([sequence.reqmts]) does not specify one way or another whether capacity changes as a result of a call to assign(), but obviously if the assigned range is larger than capacity(), capacity() will increase.

However, if the assigned range is smaller than capacity, would capacity be expected to decrease? Or rather, what would /you /expect to happen? Leave capacity as-is, avoid deallocations and put the burden on the developer to call trim()? This has implications for whether or not assign does the right thing by cache locality ie. maintaining the minimum number of larger contiguous blocks, or by allocation/deallocation.

I can't parse this last sentence.

I had to parse my explanation a little before replying.

A non-pathological implementation for a general (ie. not embedded or memory-sparse) platform would allocate small blocks first and have a growth factor, if a user is inserting elements on an adhoc basis (and the user has not defined set block sizes that are larger or called reserve()). For this reason you can end up with a large hive with some small blocks at the beginning which're not good for cache locality during iteration.

If you're assigning and the smaller blocks are not needed to fill the assign(num, element) quota, these could either be deallocated or retained as reserved blocks. Ideally from a cache-locality-during-iteration perspective, you would deallocate because at this point in the container's expansion, it's clear that we're using a large number of elements, so the smaller blocks aren't necessary. They were necessary when the container was not certain how many elements the user would need.

The performance tradeoff is an immediate deallocation cost vs a long-term cache locality cost, is what I'm getting at.

The alternative of retaining them as reserved blocks and just using the large ones where possible, then letting the user trim, is fine but also not ideal because the user could also potentially deallocate many larger reserved blocks when calling trim. Shrink_to_fit is possible but also not ideal particularly for non-trivial types.

For assign(), you first need to destroy all elements that are currently
stored in the std::hive (iteration over the contents).  That results in a
bunch of empty blocks, which are then (partially) filled from the arguments
to assign().

For trivial types this's the best way of doing the operation, as it allows the above optimization of retaining only appropriately-large blocks as active blocks.

However for non-trivial types you want to call operator = on the existing elements in the hive, then insert the remainder (or erase remainder if assign size < hive size). This is particularly important if the type allocates.

Which leads me to another question.

For fill/range-insert, we don't guarantee for hive that placement of the inserted elements within the hive is contiguous ie. to allow for re-use of erased element locations.

My assumption has been that for assign, the same thing applies, otherwise the above optimization for non-trivial types is complicated.

Would people expect that when you call hive::assign({1, 2, 3, 4, 5}) on a non-empty hive, that those numbers would be in the supplied order when iterating over the hive? Given that they do not expect that when calling hive::insert({1, 2, 3, 4, 5}) on a non-empty hive.

Otherwise you end up having to insert a bunch of elements at the end of the hive for non-trivial types, rather than filling up existing erased element slots, which could potentially create an unnecessary allocation.

Similarly, would you expect a call to hive::insert(num, element) when size() == 0 but capacity() > num to potentially decrease capacity()?

No, same rationale.

Okay, thanks.