Date: Fri, 7 Mar 2025 11:33:15 +1300
On 6/03/2025 11:32 am, Jens Maurer wrote:
> On 05/03/2025 23.06, Matt Bentley via SG14 wrote:
>> I'm just wondering what people's assumptions are around hive::assign()/fill-insert/range-insert and capacity. This is not a standardisation question.
>>
>> The standard ([sequence.reqmts]) does not specify one way or another whether capacity changes as a result of a call to assign(), but obviously if the assigned range is larger than capacity(), capacity() will increase.
>>
>> However, if the assigned range is smaller than capacity, would capacity be expected to decrease? Or rather, what would /you /expect to happen? Leave capacity as-is, avoid deallocations and put the burden on the developer to call trim()? This has implications for whether or not assign does the right thing by cache locality ie. maintaining the minimum number of larger contiguous blocks, or by allocation/deallocation.
> I can't parse this last sentence.
I had to re-parse my own explanation a little before replying.
A non-pathological implementation for a general (i.e. not embedded or memory-sparse) platform will allocate small blocks first and grow subsequent block capacities by a growth factor, if the user is inserting elements on an ad-hoc basis (and has not specified larger block capacity limits or called reserve()). For this reason you can end up with a large hive with some small blocks at the beginning, which are not good for cache locality during iteration.
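To make that concrete, here's a throwaway sketch (the 8-element starting block and the growth factor of 2 are just numbers I've picked for illustration, not anything specified) of how ad-hoc insertion under a growth factor leaves a trail of small blocks at the front:

#include <cstdio>

int main()
{
    // Hypothetical numbers only - 8-element minimum block, growth factor of 2:
    unsigned int block_capacity = 8;
    unsigned int total_capacity = 0;
    const unsigned int elements_needed = 10000;

    for (int block = 1; total_capacity < elements_needed; ++block)
    {
        total_capacity += block_capacity;
        std::printf("block %d: capacity %u (total %u)\n",
                    block, block_capacity, total_capacity);
        block_capacity *= 2;
    }
    // The early 8/16/32-element blocks are a tiny fraction of total capacity,
    // but iteration still has to hop across every one of them.
}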
If you're assigning and the smaller blocks are not needed to fill the assign(num, element) quota, they could either be deallocated or retained as reserved blocks. Ideally, from a cache-locality-during-iteration perspective, you would deallocate them: at this point in the container's expansion it's clear that we're using a large number of elements, so the smaller blocks aren't necessary any more. They were only necessary while the container couldn't know how many elements the user would need.
What I'm getting at is that the performance trade-off is an immediate deallocation cost versus a long-term cache-locality cost.
The alternative of retaining them as reserved blocks, using the large ones where possible and letting the user trim later, is fine but also not ideal, because calling trim could then also deallocate larger reserved blocks that were worth keeping. shrink_to_fit() is possible too, but also not ideal, particularly for non-trivial types.
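To illustrate the decision I'm describing, here's a hypothetical internal sketch - just the block-selection logic, with made-up capacities, nothing like real implementation code - of choosing which existing blocks stay active for an assign(n, value), filling from the largest blocks down; whatever is left over is either deallocated immediately or parked as reserved, depending on which of the two policies above you pick:

#include <cstddef>
#include <cstdio>
#include <vector>

int main()
{
    // Capacities of blocks the container already owns (made-up numbers),
    // smallest (earliest-allocated) first:
    std::vector<std::size_t> block_capacities{8, 16, 32, 64, 128, 256};
    const std::size_t assign_size = 400;

    std::vector<std::size_t> active, leftover;
    std::size_t covered = 0;

    // Fill the assign quota from the largest blocks down; smaller blocks
    // only stay active if the larger ones can't cover it.
    for (auto it = block_capacities.rbegin(); it != block_capacities.rend(); ++it)
    {
        if (covered < assign_size) { active.push_back(*it); covered += *it; }
        else                       { leftover.push_back(*it); }
    }

    std::printf("active: ");
    for (std::size_t c : active) std::printf("%zu ", c);
    std::printf("\nleftover (deallocate now, or retain as reserved): ");
    for (std::size_t c : leftover) std::printf("%zu ", c);
    std::printf("\n");
}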
> For assign(), you first need to destroy all elements that are currently
> stored in the std::hive (iteration over the contents). That results in a
> bunch of empty blocks, which are then (partially) filled from the arguments
> to assign().
For trivial types this is the best way of doing the operation, as it allows the above optimization of retaining only appropriately-large blocks as active blocks.
However, for non-trivial types you want to call operator= on the existing elements in the hive, then insert the remainder (or erase the remainder if the assign size is smaller than the hive size). This is particularly important if the type allocates.
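In generic terms the non-trivial-type strategy looks roughly like the forward-iterator form of list/vector assign: copy-assign onto what's already there, then erase or insert the difference. Here's a sketch using std::list as a stand-in container (the shape is the same for any node/bucket container - none of this is hive-specific code):

#include <cstdio>
#include <iterator>
#include <list>
#include <string>

template <class Container, class InputIt>
void assign_in_place(Container& c, InputIt first, InputIt last)
{
    auto pos = c.begin();
    // Reuse existing elements via operator= - for allocating types such as
    // std::string this can reuse their existing buffers.
    for (; pos != c.end() && first != last; ++pos, ++first)
        *pos = *first;

    if (first == last)
        c.erase(pos, c.end());          // assign size < container size
    else
        c.insert(c.end(), first, last); // assign size > container size
}

int main()
{
    std::list<std::string> l{"aaa", "bbb", "ccc", "ddd"};
    const std::string src[] = {"x", "y"};
    assign_in_place(l, std::begin(src), std::end(src));
    for (const auto& s : l) std::printf("%s ", s.c_str());
    std::printf("\n");  // prints: x y
}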
Which leads me to another question.
For fill-insert/range-insert, we don't guarantee for hive that the inserted elements are placed contiguously within the hive, i.e. so as to allow re-use of erased-element locations.
My assumption has been that the same applies for assign, otherwise the above optimization for non-trivial types is complicated.
Would people expect that when they call hive::assign({1, 2, 3, 4, 5}) on a non-empty hive, those numbers will be in the supplied order when iterating over the hive, given that they don't expect that when calling hive::insert({1, 2, 3, 4, 5}) on a non-empty hive?
Otherwise, for non-trivial types, you end up having to insert a bunch of elements at the end of the hive rather than filling up existing erased-element slots, which could potentially create an unnecessary allocation.
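For anyone who wants to experiment, this is the kind of case I mean - it assumes a standard library that already ships <hive> (or substitute the plf::hive reference implementation, which has the same interface):

#include <hive>
#include <iterator>
#include <print>

int main()
{
    std::hive<int> h{10, 20, 30, 40, 50};

    // Erase a couple of elements, leaving holes inside the existing block(s):
    for (auto it = h.begin(); it != h.end(); )
        it = (*it == 20 || *it == 40) ? h.erase(it) : std::next(it);

    // insert does not guarantee placement/order - 1 and 2 (for example) may
    // land in the slots previously occupied by 20 and 40:
    h.insert({1, 2, 3, 4, 5});

    for (int v : h) std::print("{} ", v);
    std::println("");

    // The question above is whether assign({1, 2, 3, 4, 5}) on a non-empty
    // hive should behave the same way, or guarantee the supplied order.
}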
>> Similarly, would you expect a call to hive::insert(num, element) when size() == 0 but capacity() > num to potentially decrease capacity()?
> No, same rationale.
Okay, thanks.
Received on 2025-03-06 22:33:23