Date: Thu, 23 Jun 2022 14:32:54 -0400
Hi Michael,
It sounds like Bryce doesn't want July 13th to be the "official" LEWG
discussion of P0447+P2596. Still, FWIW, I should be able to attend the
July 13th SG14 meeting (at either 11am or 8pm NYC time), and I certainly
won't mind if SG14 has nothing better to do than talk about P0447 and/or
P2596. The more people talk about `hive` the better, IMHO.
As always, if anyone in SG14 or elsewhere has implementation experience
with a recent version of P0447, or has used any equivalent of
`hive_limits`/`reshape` in the wild, I'd be very interested in their report.
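For anyone who hasn't tried the reference implementation, here is a
minimal usage sketch (based on my reading of P0447R20's synopsis;
plf::hive is the reference implementation, and this is illustrative
rather than authoritative):

    #include "plf_hive.h"  // reference implementation of P0447

    plf::hive<int> h(plf::hive_limits(64, 512)); // min/max block capacities
    h.insert(42);
    h.reshape(plf::hive_limits(128, 1024)); // may reallocate non-conforming blocks
    plf::hive_limits current = h.block_capacity_limits();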
–Arthur
On Wed, Jun 22, 2022 at 11:25 PM Michael Wong <fraggamuffin_at_[hidden]>
wrote:
> Hi Matthew and Arthur, without stepping on LEWG's domain (and LEWG,
> please let me know to back off if I am): I think we all want this
> proposal, and none more than SG14, but it must fit LEWG/LWG conventions.
> I would like to offer my help to bring this to an agreement, which I am
> convinced is still possible. I don't pretend to understand all the
> issues at play here, because there may be misunderstandings, crossed
> communications, missed notes, and unstated intentions. Perhaps we can
> start with a fresh slate, assume everyone is coming at this with honest
> intentions, forget all that, and aim for common technical ground.
> If you would like someone to help mediate, I am willing to offer
> myself, if you both agree to it. I also want one of the LEWG chairs,
> perhaps Ben Craig, who started this thread (or others), to help
> preside, as well as the two agreed proxies for this paper: Guy Davidson
> and Patrice Roy (if I recall correctly from SG14, and if they also
> agree).
>
> If everyone agrees, and it is OK by Ben and LEWG, we can schedule this
> at the next scheduled SG14 call, combined with LEWG, on July 13. I know
> the normal SG14 time will be bad for New Zealand. So we could try to
> host that SG14 call at 8-10 pm ET, which I believe is 12 noon-2 pm
> NZST. SG14 has done this before, with an inverted meeting that is
> friendlier to Asia/Pacific. With enough advertisement, there will be
> enough folks attending, combined from both SG14 and LEWG.
>
> At this stage, this proposal is aiming for C++26 anyway, so I think
> three more weeks won't make a huge difference, but it could increase
> its chance of passing smoothly for C++26, which I think is all our aim
> here, especially given that this is one of the most requested features
> for Games. Of course, if you can work it out in these three weeks and
> progress in LEWG without needing my services, I am also happy with
> that.
>
> Please let me know if this proposal works for you.
>
> On Wed, Jun 22, 2022 at 11:27 AM Arthur O'Dwyer via Lib-Ext <
> lib-ext_at_[hidden]> wrote:
>
>> On Tue, Jun 21, 2022 at 8:12 PM Matt Bentley <mattreecebentley_at_[hidden]>
>> wrote:
>>
>>> Hi Arthur: is there a reason why you ignored all the feedback I gave
>>> you prior to writing the paper?
>>>
>>
>> Hmm; if you sent a reply to my email of 2022-05-27 (copied below), I
>> never saw it. As I said on 2022-06-14, my impression was that I sent you a
>> draft on 2022-05-27 and (as far as I knew) you never replied at all. If
>> something went wrong and you actually sent feedback that got lost in the
>> mail, my abject apologies.
>>
>> I first mentioned my intention w.r.t. `reshape` on 2022-05-03, in the
>> thread titled
>> Re: [isocpp-lib-ext] Notes on P0447R20, was Re: Info+paper for hive
>> discussion Tuesday 5th 11am pacific time
>> "... I'll write this up in a paper if I need to. ..."
>> That is, I was already well aware that my proposed API changes were not
>> likely to be adopted as "friendly amendments" but would actually require a
>> paper to actively discuss the pros and cons.
>> You did respond (on 2022-05-03) to that message with *comments* including
>> "...I think I'm about done with this. ... No it's not. ... No. ... You
>> are technically correct. ... Don't spam me Arthur. ..."
>> I didn't perceive that as "*feedback*" per se, more "just the automatic
>> gainsaying of anything the other person says," and there certainly wasn't
>> anything in your response specifically engaging with my `reshape` stuff.
>> So, my intention to propose (major API, LEWG-targeting, paper-requiring)
>> changes to `reshape` was unaffected by those comments.
>>
>> I also mentioned my intention, again repeating the concrete wording
>> proposal, on 2022-05-16, in the thread titled
>> [isocpp-lib-ext] P0447(std::hive): splice() and specified behaviour for
>> reserved (unused) memory blocks, trim_capacity(n) overload
>> "...My proposal in this area (which I'll turn into a paper Real Soon
>> Now)..."
>>
>> Finally, I sent a draft of the paper to you personally on 2022-05-27 and
>> then (having received no reply) submitted it to the mailing on 2022-06-10.
>>
>> I appreciate that you probably see this paper as an encroachment on your
>> personal domain of expertise, as a speedbump preventing `hive` from leaving
>> LEWG at the eleventh hour, etc etc. But the question for LEWG is simply
>> whether they want `hive` to leave LEWG in exactly its current state, or
>> whether the criticisms in P2596 have enough merit to justify (further)
>> modifying the API that is to be standardized for `std::hive`. If there
>> *are* to be (further) API changes, LEWG is the right venue to discuss
>> them.
>>
>> I repeat: I think it would be good to hear what STL vendors and/or LWG
>> luminaries think about the current API. If anyone other than Matt and
>> myself has really dug into the implementation, I haven't heard about it.
>> It is *quite possible* that this has happened without my hearing about
>> it.
>>
>> –Arthur
>>
>>
>>
>> On 14/06/2022 4:58 am, Arthur O'Dwyer wrote:
>>> > I'm sorry that Matt feels personally attacked here. I really just
>>> > think [insert everything I say in P2596 Sections 3.1 through 3.6
>>> > <https://isocpp.org/files/papers/P2596R0.html#motivation>], and for
>>> > those reasons I propose a simplification of std::hive's public API,
>>> > to bring it more into line with the rest of the STL and also give
>>> > vendors the ability to make it more performant.
>>> >
>>> > FWIW, I did email a draft of P2596 to Matt Bentley via personal
>>> > email on 2022-05-27, asking for his feedback; Matt never replied.
>>> >
>>> > I think it would be good to hear what STL vendors and/or LWG
>>> > luminaries think about the current API. If anyone other than Matt
>>> > and myself has really dug into the implementation, I haven't heard
>>> > about it. It is /quite possible/ that this has happened without my
>>> > hearing about it.
>>> >
>>> > –Arthur
>>> >
>>> >
>>> > ---------- Forwarded message ---------
>>> > From: *Arthur O'Dwyer* <arthur.j.odwyer_at_[hidden]>
>>> > Date: Fri, May 27, 2022 at 9:17 AM
>>> > Subject: Proposal for changes to std::hive::reshape
>>> > To: Matthew Bentley <mattreecebentley_at_[hidden]>
>>> >
>>> >
>>> > Hi Matt,
>>> >
>>> > I've got a first draft of my proposed changes to `hive::reshape` now!
>>> > I haven't distributed this to anyone else yet. Please let me know your
>>> > thoughts.
>>> > https://quuxplusone.github.io/draft/d2596-improve-hive-reshape.html
>>> >
>>> > Thanks,
>>> > Arthur
>>> >
>>> > On Mon, Jun 13, 2022 at 2:13 AM Matt Bentley
>>> > <mattreecebentley_at_[hidden]> wrote:
>>> >
>>> > As mentioned by Ben, the immovable-objects problem is unavoidable,
>>> > and this is explained in the FAQ of the current hive paper under
>>> > "Why are hive_limits specified in constructors and not relegated to
>>> > a secondary function"
>>> > (https://isocpp.org/files/papers/P0447R20.html). Arthur's workaround
>>> > is not suitable as a generalised solution to the problem.
>>> >
>>> > There is nothing true or of import in P2596. To go into detail
>>> > would take too long, so I will address the main bullet points of
>>> > each section. Assume anything I haven't addressed is false or too
>>> > trivial to discuss.
>>> >
>>> >
>>> > 1. The central premise is wrong: you do not get more performance
>>> > from more locality beyond a certain point, due to cache and
>>> > cache-line limits. Past that point, the extra performance comes
>>> > from fewer allocations/deallocations with the larger chunk sizes
>>> > and from fewer block transitions, not so much from locality.
>>> > If that weren't the case, libstdc++'s deque would have
>>> > substantially slower iteration than vector for types that are
>>> > smaller than its chunk size. See benchmarks here:
>>> > https://plflib.org/benchmarks_haswell_gcc.htm#rawperformance
>>> >
>>> > In terms of compilers predicting cache/cache-line sizes: they
>>> > cannot effectively do this for every platform, and certainly not
>>> > for embedded, so the facility for implementors to specify their own
>>> > chunk capacities is crucial. Larger block sizes may also increase
>>> > memory fragmentation, though this is platform-dependent.
>>> >
>>> >
>>> > In terms of colony/hive itself, there are additional
>>> > considerations. It was found that the performance for general-use
>>> > and referencer benchmarking (start here:
>>> > https://plflib.org/benchmarks_haswell_gcc.htm#lowmodification) was
>>> > unchanged when lowering default block capacity limits for hive to
>>> > 8192, while vastly improving memory use. This is documented on the
>>> > reflector, in the discussion on the priority parameter. 8192 is now
>>> > the default block size for non-scalar types in the reference
>>> > implementation.
>>> >
>>> >
>>> > This result is largely because of the following factor, described
>>> > in FAQ item 11.c:
>>> > "If a block size is large and many erasures have occurred but the
>>> > block is not completely empty of elements, in addition to having a
>>> > lot of wasted unused memory, iterative performance will suffer due
>>> > to the large memory gaps between any two non-erased elements and
>>> > subsequent drops in data locality and cache performance. In this
>>> > scenario the user would desire to experiment with benchmarking and
>>> > limiting the minimum/maximum capacities of the blocks, such that
>>> > memory blocks are freed earlier given the user's specific erasure
>>> > patterns, and performance is improved."
>>> >
>>> > In other words, in real-world usage, having your block capacities
>>> > too large results in less locality.
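>>> >
>>> > For illustration, capping capacities for an erasure-heavy workload
>>> > might look like this (a sketch using the P0447 names; Entity is a
>>> > stand-in element type, and the numbers are not a recommendation):
>>> >
>>> >     // A smaller max block capacity means heavily-erased blocks
>>> >     // empty out and are freed sooner, restoring locality when
>>> >     // iterating:
>>> >     plf::hive<Entity> entities(plf::hive_limits(16, 256));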
>>> >
>>> > I ran a few benchmarks over the weekend, and they confirmed this
>>> > result. When testing single insertion/erasure/iteration in
>>> > isolation, moving to block capacities of 65535 resulted in better
>>> > insertion performance but unchanged iteration/erasure, while
>>> > changing the skipfield type to uint and the max block capacity to
>>> > uint_max resulted in better insertion but worse iteration
>>> > performance. But in more real-world benchmarks (modification and
>>> > referencer, where elements are inserted, erased and iterated upon
>>> > over time on the same containers), moving to 65535 only improved
>>> > performance by 1% while increasing memory usage by 8%, while moving
>>> > to uint_max decreased performance by 1% while increasing memory use
>>> > by 16%. This was on average across sets ranging from 10 to 1
>>> > million elements.
>>> >
>>> > In addition, the current implementation changes the skipfield type
>>> > to uint_least8 for scalar types, based again on the results
>>> > described in the priority-parameter discussion on the reflector.
>>> > This limits the block capacities for scalar types to 255. It is an
>>> > implementation-specific decision, but will demonstrate the point
>>> > further. So, I ran low-and-high real-world modification-scenario
>>> > benchmarking on type int, using uint_least8 on one run and
>>> > uint_least16 on the second. On average, in sets ranging from 10 to
>>> > 1 million elements, moving to uint_least16 improved performance by
>>> > 2% but used 44% more memory.
>>> > (https://plflib.org/results_for_hive_block_max_capacity_comparisons.zip)
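>>> >
>>> > For the connection between skipfield type and block capacity: each
>>> > skipfield entry must be able to encode a skip-run length up to the
>>> > block's capacity, so the capacity ceiling is the type's maximum
>>> > value. A sketch of the constraint only, not the actual
>>> > implementation:
>>> >
>>> >     #include <cstddef>
>>> >     #include <cstdint>
>>> >     #include <limits>
>>> >
>>> >     // e.g. std::uint_least8_t -> 255, std::uint_least16_t -> 65535
>>> >     template <class SkipfieldT>
>>> >     constexpr std::size_t max_block_capacity =
>>> >         std::numeric_limits<SkipfieldT>::max();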
>>> >
>>> >
>>> > Other reasons users need this:
>>> > (a) When they know their erasure patterns are erasing X number of
>>> > elements at once (so they can minimize jumps when iterating).
>>> > (b) To avoid inappropriate amounts of allocation upon expansion (as
>>> > mentioned by Ben), particularly on low-memory platforms like
>>> > embedded.
>>> > (c) To allow them to choose their own tradeoffs between memory use
>>> > and performance.
>>> >
>>> >
>>> >
>>> > 2. "Max block sizes cause O(N) behaviour"
>>> > Even if time complexity was always a major factor in performance
>>> (it is
>>> > sometimes), this is false.
>>> >
>>> > For starters, Arthur is talking about O(N) behaviour with memory
>>> > blocks,
>>> > not elements, which is a very different story. The two examples
>>> > provided
>>> > where you would get improved time complexity from not limiting the
>>> > block
>>> > sizes, were only reflected in performance, statistically-speaking,
>>> if
>>> > the max block capacities were very low (~255) and the number of
>>> blocks
>>> > very high. Based on this, I had already worked around those two
>>> > areas by
>>> > the time I read the paper.
>>> >
>>> > In terms of the first example Arthur gives (block-number
>>> > re-writing), Arthur states block re-numbering would happen 'less
>>> > often'. No. It will happen once in every
>>> > std::numeric_limits<size_t>::max() allocations or transfers from
>>> > reserved blocks to active blocks, i.e. actually never in the
>>> > lifetime of most programs. And it does not occur upon erasure. When
>>> > it does happen, there is no guarantee that the number of blocks
>>> > will be large. And it is not an expensive operation. Therefore the
>>> > time complexity for the operations involving this op is O(1)
>>> > amortized. This is clear in both the code and in the commit notes,
>>> > so Arthur is aware of this, given that he is aware that there is a
>>> > change which makes the re-numbering less frequent. This is a
>>> > deliberate falsehood.
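>>> >
>>> > To make the amortization concrete, a counter scheme along these
>>> > lines (my own illustration, not the actual plf::hive code) only
>>> > renumbers when the size_t numbering space is exhausted:
>>> >
>>> >     #include <cstddef>
>>> >     #include <limits>
>>> >     #include <vector>
>>> >
>>> >     struct group { std::size_t group_number; };
>>> >
>>> >     std::size_t next_number = 0;
>>> >
>>> >     void assign_number(group& g, std::vector<group*>& all_groups) {
>>> >         if (next_number == std::numeric_limits<std::size_t>::max()) {
>>> >             // Rare O(blocks) pass: at most once per SIZE_MAX calls.
>>> >             for (std::size_t i = 0; i != all_groups.size(); ++i)
>>> >                 all_groups[i]->group_number = i;
>>> >             next_number = all_groups.size();
>>> >         }
>>> >         g.group_number = next_number++;
>>> >     }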
>>> >
>>> > In addition, if a hive were implemented as a vector of pointers to
>>> > groups, rather than a linked list of groups, this would also
>>> > necessitate amortized time complexity in this scenario, as when the
>>> > vector became full and allocation of a new block needed to occur,
>>> > all group pointers would need to be reallocated to a new memory
>>> > block.
>>> >
>>> > In terms of the second example provided by Arthur, that issue no
>>> > longer exists (the singly-linked list was changed to doubly-linked
>>> > prior to receiving the paper, avoiding the problem). And it wasn't
>>> > an expensive operation, except in the scenario described above.
>>> >
>>> >
>>> >
>>> > 3. splice() is a tertiary function, expected to be lesser-used, but
>>> > its primary feature is transferring elements from one hive to
>>> > another without invalidating iterators pointing to those elements,
>>> > not its time complexity. In terms of transference of elements, it
>>> > is O(1). In terms of transference of element memory blocks, the
>>> > hive paper states it as, at worst, O(n) in the number of blocks in
>>> > the container, and that is what it is. At best, it is O(1). You can
>>> > check the code on this
>>> > (https://github.com/mattreecebentley/plf_hive). But at any rate,
>>> > the operation is not an expensive one.
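>>> >
>>> > Typical usage, for reference (an illustrative sketch; see P0447 for
>>> > the exact signature and preconditions):
>>> >
>>> >     #include <cassert>
>>> >     #include "plf_hive.h"  // reference implementation of P0447
>>> >
>>> >     void demo() {
>>> >         plf::hive<int> a, b;
>>> >         auto it = b.insert(1); // iterator into b
>>> >         a.splice(b);           // b's elements transfer to a; b is left empty
>>> >         assert(*it == 1);      // 'it' remains valid, now referring into a
>>> >     }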
>>> >
>>> > It is a requirement that splice check whether the min/max
>>> > capacities of the source hive are within the destination's (if so,
>>> > then O(1)); if they aren't, then every block needs its capacity
>>> > checked (O(N)). Arthur's "fix" doesn't change anything. You would
>>> > still have to check every block's capacity if the minimum block
>>> > capacity of the source were lower than the destination's; the time
>>> > complexity is unchanged.
>>> >
>>> > Even if you were to somehow remove this potential O(N) behaviour,
>>> > you would still have the same time complexity if the hive were
>>> > implemented as a vector of pointers to blocks, rather than a linked
>>> > list of blocks, as splice would need to copy each pointer from the
>>> > source vector to the destination. So it would need to be specified
>>> > the same in the tech spec.
>>> >
>>> >
>>> >
>>> > 4. Regards "bloating the interface", the additional constructor
>>> > overloads that're necessary are all implementable as delegating
>>> > constructors, and this is how they are done in the reference
>>> > implementation. I could've specified this in the tech spec but
>>> there
>>> > was
>>> > no need, I considered it better to allow implementors freedom of
>>> > implementation. There is no real-world waste.
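>>> >
>>> > A toy illustration of the delegating-constructor pattern (not the
>>> > actual std::hive specification or implementation):
>>> >
>>> >     #include <cstddef>
>>> >
>>> >     struct toy_hive {
>>> >         struct limits { std::size_t min, max; };
>>> >         // The default constructor delegates to the one "real"
>>> >         // constructor, so the extra overload adds no new logic:
>>> >         toy_hive() : toy_hive(limits{8, 8192}) {}
>>> >         explicit toy_hive(limits l) : lim(l) {}
>>> >         limits lim;
>>> >     };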
>>> >
>>> >
>>> > 5. UB: the block_capacity_hard_limits() function was specifically
>>> > introduced so that the user can avoid this issue; hence it is a
>>> > non-issue. Arthur is aware of this.
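>>> >
>>> > For example, a user can clamp a requested range to the hard limits
>>> > before calling reshape. A sketch using the P0447 names, where
>>> > wanted_min/wanted_max are the user's requested capacities:
>>> >
>>> >     #include <algorithm>  // std::min, std::max
>>> >
>>> >     plf::hive_limits hard =
>>> >         plf::hive<int>::block_capacity_hard_limits();
>>> >     h.reshape(plf::hive_limits(std::max(wanted_min, hard.min),
>>> >                                std::min(wanted_max, hard.max)));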
>>> >
>>> >
>>> > I move to dismiss this paper with prejudice. It is a large waste of
>>> > committee time and, far from displaying an above-average
>>> > understanding of the container in question, shows a low
>>> > understanding. Enough time has already been wasted. Let's move on.
>>> > The larger concern is that this paper should serve the community,
>>> > and the people I have been in contact with, and who have contacted
>>> > me, have overwhelmingly required the current behaviour. I find it
>>> > disturbing that Arthur has ignored both responses to his enquiries
>>> > on this subject, in the reflector and in the hive paper itself, and
>>> > has abused the system to draw more attention without giving due
>>> > responses. At any rate, I do not intend to personally waste
>>> > significantly more time on this.
>>> > Cheers,
>>> > Matt
>>> >
>>>
>> _______________________________________________
>> Lib-Ext mailing list
>> Lib-Ext_at_[hidden]
>> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/lib-ext
>> Link to this post: http://lists.isocpp.org/lib-ext/2022/06/23519.php
>>
>
Received on 2022-06-23 18:33:06