Re: [isocpp-lib-ext] hive reshape (P2596) and immovable objects

From: Bryce Adelstein Lelbach aka wash <brycelelbach_at_[hidden]>
Date: Thu, 23 Jun 2022 09:30:10 -0400
> If everyone agrees and it is OK by Ben and LEWG, we can schedule this at
the next scheduled SG14 call combined with LEWG on July 13

Thank you, Michael, but that will be unnecessary. Library Evolution will
look at this on our own. We will let you know when and if we need SG14's
assistance.

Some of the authors involved are not available for the next two months.
Library Evolution will not be looking at these papers until all the
stakeholders can be available.

--
Bryce Adelstein Lelbach aka wash (he/him/his)
US Programming Language Standards (PL22) Chair
ISO C++ Library Evolution Chair
HPC Programming Models Architect @ NVIDIA
--
On Wed, Jun 22, 2022, 23:25 Michael Wong via Lib-Ext <
lib-ext_at_[hidden]> wrote:
> Hi Matthew and Arthur, without stepping on LEWG's domain (and LEWG, please
> let me know to back off if I am), I think we all want this proposal, and
> none more than SG14, but it must fit LEWG/LWG conventions. I would like to
> offer my help in bringing this to an agreement, which I am convinced is
> still possible. I don't pretend to understand all the issues at play here;
> there may be misunderstandings, crossed communications, missed notes, and
> misread intentions. Perhaps we can start with a fresh slate, assume everyone
> is coming at this with honest intentions, set all that aside, and aim for
> common technical ground.
> If you guys would like someone to help mediate, I am willing to offer myself
> if you both agree to it. I would also want one of the LEWG chairs, perhaps
> Ben Craig who started this thread (or others), to help preside, as well as
> the two agreed proxies for this paper: Guy Davidson and Patrice Roy (if I
> recall correctly from SG14, and if they also agree).
>
> If everyone agrees and it is OK by Ben and LEWG, we can schedule this at
> the next scheduled SG14 call combined with LEWG on July 13. I know the
> normal SG14 time will be bad for NZ time, so we could try to host that SG14
> call at 8-10 pm ET, which I believe is 12 noon-2 pm NZ. SG14 has done this
> before with an inverted meeting that is friendlier to Asia/Pacific. With
> enough advertisement, there will be enough folks attending, combined from
> both SG14 and LEWG.
>
> At this stage, this proposal is aiming for C++26 anyway, so I think 3 more
> weeks won't make a huge difference, but it could increase its chance of
> passing smoothly for C++26, which I think is all of our aim here, especially
> given that this is one of the most requested features for Games. Of course,
> if you guys can work it out in these three weeks and progress in LEWG and
> don't need my service, I am also happy with that.
>
> Please let me know if this proposal works for you.
>
> On Wed, Jun 22, 2022 at 11:27 AM Arthur O'Dwyer via Lib-Ext <
> lib-ext_at_[hidden]> wrote:
>
>> On Tue, Jun 21, 2022 at 8:12 PM Matt Bentley <mattreecebentley_at_[hidden]>
>> wrote:
>>
>>> Hi Arthur- is there a reason why you ignored all the feedback I gave you
>>> prior to writing the paper?
>>>
>>
>> Hmm; if you sent a reply to my email of 2022-05-27 (copied below), I
>> never saw it. As I said on 2022-06-14, my impression was that I sent you a
>> draft on 2022-05-27 and (as far as I knew) you never replied at all.  If
>> something went wrong and you actually sent feedback that got lost in the
>> mail, my abject apologies.
>>
>> I first mentioned my intention w.r.t. `reshape` on 2022-05-03 in thread
>> titled
>> Re: [isocpp-lib-ext] Notes on P0447R20, was Re: Info+paper for hive
>> discussion Tuesday 5th 11am pacific time
>> "... I'll write this up in a paper if I need to. ..."
>> That is, I was already well aware that my proposed API changes were not
>> likely to be adopted as "friendly amendments" but would actually require a
>> paper to actively discuss the pros and cons.
>> You did respond (on 2022-05-03) to that message with *comments* including
>> "...I think I'm about done with this. ... No it's not. ... No. ... You
>> are technically correct. ... Don't spam me Arthur. ..."
>> I didn't perceive that as "*feedback*" per se, more "just the automatic
>> gainsaying of anything the other person says," and there certainly wasn't
>> anything in your response specifically engaging with my `reshape` stuff.
>> So, my intention to propose (major API, LEWG-targeting, paper-requiring)
>> changes to `reshape` was unaffected by those comments.
>>
>> I also mentioned my intention, again repeating the concrete wording
>> proposal, on 2022-05-16 in thread titled
>> [isocpp-lib-ext] P0447(std::hive): splice() and specified behaviour for
>> reserved (unused) memory blocks, trim_capacity(n) overload
>> "...My proposal in this area (which I'll turn into a paper Real Soon
>> Now)..."
>>
>> Finally, I sent a draft of the paper to you personally on 2022-05-27 and
>> then (having received no reply) submitted it to the mailing on 2022-06-10.
>>
>> I appreciate that you probably see this paper as an encroachment on your
>> personal domain of expertise, as a speedbump preventing `hive` from leaving
>> LEWG at the eleventh hour, etc etc. But the question for LEWG is simply
>> whether they want `hive` to leave LEWG in exactly its current state, or
>> whether the criticisms in P2596 have enough merit to justify (further)
>> modifying the API that is to be standardized for `std::hive`. If there
>> *are* to be (further) API changes, LEWG is the right venue to discuss
>> them.
>>
>> I repeat: I think it would be good to hear what STL vendors and/or LWG
>> luminaries think about the current API. If anyone other than Matt and
>> myself has really dug into the implementation, I haven't heard about it.
>> It is *quite possible* that this has happened without my hearing about
>> it.
>>
>> –Arthur
>>
>>
>>
>> On 14/06/2022 4:58 am, Arthur O'Dwyer wrote:
>>> > I'm sorry that Matt feels personally attacked here. I really just
>>> think
>>> > [insert everything I say in P2596 Section 3.1 through 3.6
>>> > <https://isocpp.org/files/papers/P2596R0.html#motivation>], and for
>>> > those reasons I propose a simplification of std::hive's public API to
>>> > bring it more into line with the rest of the STL and also give vendors
>>> > the ability to make it more performant.
>>> >
>>> > FWIW, I did email a draft of P2596 to Matt Bentley via personal email
>>> on
>>> > 2022-05-27, asking for his feedback; Matt never replied.
>>> >
>>> > I think it would be good to hear what STL vendors and/or LWG
>>> luminaries
>>> > think about the current API. If anyone other than Matt and myself has
>>> > really dug into the implementation, I haven't heard about it.  It is
>>> > /quite possible/ that this has happened without my hearing about it.
>>> >
>>> > –Arthur
>>> >
>>> >
>>> > ---------- Forwarded message ---------
>>> > From: *Arthur O'Dwyer* <arthur.j.odwyer_at_[hidden]
>>> > <mailto:arthur.j.odwyer_at_[hidden]>>
>>> > Date: Fri, May 27, 2022 at 9:17 AM
>>> > Subject: Proposal for changes to std::hive::reshape
>>> > To: Matthew Bentley <mattreecebentley_at_[hidden]
>>> > <mailto:mattreecebentley_at_[hidden]>>
>>> >
>>> >
>>> > Hi Matt,
>>> >
>>> > I've got a first draft of my proposed changes to `hive::reshape` now!
>>> > I haven't distributed this to anyone else yet. Please let me know your
>>> > thoughts.
>>> > https://quuxplusone.github.io/draft/d2596-improve-hive-reshape.html
>>> > <https://quuxplusone.github.io/draft/d2596-improve-hive-reshape.html>
>>> >
>>> > Thanks,
>>> > Arthur
>>> >
>>> > On Mon, Jun 13, 2022 at 2:13 AM Matt Bentley <
>>> mattreecebentley_at_[hidden]
>>> > <mailto:mattreecebentley_at_[hidden]>> wrote:
>>> >
>>> >     As mentioned by Ben, the immovable-objects problem is unavoidable,
>>> >     and this is explained in the FAQ of the current hive paper under
>>> >     "Why are hive_limits specified in constructors and not relegated to
>>> >     a secondary function" (https://isocpp.org/files/papers/P0447R20.html
>>> >     <https://isocpp.org/files/papers/P0447R20.html>). Arthur's
>>> >     workaround is not suitable as a generalised solution to the problem.
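>>> >
>>> >     As a rough usage sketch of that constraint (hypothetical code,
>>> >     assuming only the P0447 names std::hive, std::hive_limits, emplace()
>>> >     and reshape(); not taken from the paper or the reference
>>> >     implementation):
>>> >
>>> >     #include <hive>   // proposed header, not yet shipped anywhere
>>> >     #include <mutex>
>>> >
>>> >     int main() {
>>> >         // std::mutex is neither copyable nor movable, so anything that
>>> >         // would relocate elements is unusable once elements exist.
>>> >         // Setting the block-capacity limits in the constructor means
>>> >         // no later relocation is ever required:
>>> >         std::hive<std::mutex> h(std::hive_limits(16, 256));
>>> >         h.emplace();  // constructed in place, never moved
>>> >         // h.reshape(std::hive_limits(512, 1024)); // could require
>>> >         // moving existing elements into differently-sized blocks,
>>> >         // which an immovable type cannot do
>>> >     }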
>>> >
>>> >     There is nothing true or of import in P2596. To go into detail
>>> would
>>> >     take too long, so I will address the main bullet points of each
>>> >     section.
>>> >     Assume anything I haven't addressed is either false or too trivial
>>> >     to discuss.
>>> >
>>> >
>>> >     1. The central premise is wrong - you do not get more performance
>>> >     from more locality beyond a certain point, due to cache limits and
>>> >     cache-line limits. Beyond that point the extra performance comes
>>> >     from fewer allocations/deallocations for the larger chunk sizes and
>>> >     from fewer block transitions, not so much from locality.
>>> >     If that weren't the case, libstdc++'s deque would have substantially
>>> >     slower iteration than vector for types that are smaller than its
>>> >     chunk size. See benchmarks here:
>>> >     https://plflib.org/benchmarks_haswell_gcc.htm#rawperformance
>>> >     <https://plflib.org/benchmarks_haswell_gcc.htm#rawperformance>
>>> >
>>> >     In terms of compilers predicting cache/cache-line sizes, they cannot
>>> >     do this effectively for every platform, and certainly not for
>>> >     embedded, so the facility for implementors to specify their own
>>> >     chunk capacities is crucial. Larger block sizes may also increase
>>> >     memory fragmentation, though this is platform dependent.
>>> >
>>> >
>>> >     In terms of colony/hive itself, there are additional
>>> considerations.
>>> >     It was found that the performance for general use and referencer
>>> >     benchmarking (start here:
>>> >     https://plflib.org/benchmarks_haswell_gcc.htm#lowmodification
>>> >     <https://plflib.org/benchmarks_haswell_gcc.htm#lowmodification>)
>>> was
>>> >     unchanged when lowering default block capacity limits for hive to
>>> 8192,
>>> >     while vastly improving memory use. This is documented on the
>>> reflector
>>> >     in the discussion on the priority parameter. 8192 is now the
>>> default
>>> >     block size for non-scalar types in the reference implementation.
>>> >
>>> >
>>> >     This result is largely because of the following factor described
>>> in FAQ
>>> >     item 11.c:
>>> >     "If a block size is large and many erasures have occurred but the
>>> block
>>> >     is not completely empty of elements, in addition to having a lot of
>>> >     wasted unused memory, iterative performance will suffer due to the
>>> >     large
>>> >     memory gaps between any two non-erased elements and subsequent
>>> drops in
>>> >     data locality and cache performance. In this scenario the user
>>> would
>>> >     desire to experiment with benchmarking and limiting the
>>> minimum/maximum
>>> >     capacities of the blocks, such that memory blocks are freed earlier
>>> >     given the user's specific erasure patterns, and performance is
>>> >     improved."
>>> >
>>> >     In other words, in real-world usage, having your block capacities
>>> too
>>> >     large results in less locality.
>>> >
>>> >     I ran a few benchmarks over the weekend, and they confirmed this
>>> result.
>>> >     When testing single-insertion/erasure/iteration in isolation,
>>> moving to
>>> >     block capacities of 65535 resulted in better insertion performance
>>> but
>>> >     unchanged iteration/erasure, while changing the skipfield type to
>>> uint
>>> >     and max block capacity to uintmax resulted in better insertion but
>>> >     worse
>>> >     iteration performance. But in more real-world benchmarks
>>> (modification
>>> >     and referencer, where elements are inserted, erased and iterated
>>> upon
>>> >     over time on the same containers), moving to 65535 only improved
>>> >     performance by 1% while increasing memory usage by 8%, while
>>> moving to
>>> >     uint_max decreased performance by 1% while increasing memory use by
>>> >     16%.
>>> >     This was on average across sets ranging from 10 to 1 million
>>> elements.
>>> >
>>> >     In addition, the current implementation changes the skipfield type
>>> to
>>> >     uint_least8 for scalar types, based again on the results described
>>> in
>>> >     the priority parameter discussion on the reflector. This limits the
>>> >     block capacities for scalar types to 255. It is an
>>> >     implementation-specific decision, but will demonstrate the point
>>> >     further. So I ran low- and high-modification real-world scenario
>>> >     benchmarking on type int, using uint_least8 on one run and
>>> >     uint_least16 on the second. On average, in sets ranging from 10 to
>>> >     1 million elements, moving to uint_least16 improved performance by
>>> >     2% but used 44% more memory.
>>> >     (
>>> https://plflib.org/results_for_hive_block_max_capacity_comparisons.zip
>>> >     <
>>> https://plflib.org/results_for_hive_block_max_capacity_comparisons.zip>)
>>> >
>>> >
>>> >     Other reasons users need this:
>>> >     (a) When they know their erasure patterns erase X number of elements
>>> >     at once, so they can minimize jumps when iterating (sketched below).
>>> >     (b) To avoid inappropriate amounts of allocation upon expansion (as
>>> >     mentioned by Ben), particularly on low-memory platforms like
>>> >     embedded.
>>> >     (c) To allow them to choose their own tradeoffs between memory use
>>> >     and performance.
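>>> >
>>> >     A tiny sketch of (a), purely illustrative and assuming the P0447
>>> >     constructor that takes std::hive_limits (the batch size of 1024 and
>>> >     the Particle type are made up for the example):
>>> >
>>> >     #include <hive>   // proposed header
>>> >
>>> >     struct Particle { float x, y, z, dx, dy, dz; };
>>> >
>>> >     // The program spawns and kills particles in batches of ~1024, so
>>> >     // capping blocks at the batch size means a dead batch tends to
>>> >     // empty whole blocks (which can then be freed) instead of leaving
>>> >     // widely spaced survivors that hurt iteration locality.
>>> >     std::hive<Particle> particles(std::hive_limits(1024, 1024));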
>>> >
>>> >
>>> >
>>> >     2. "Max block sizes cause O(N) behaviour"
>>> >     Even if time complexity were always a major factor in performance
>>> >     (it sometimes is), this is false.
>>> >
>>> >     For starters, Arthur is talking about O(N) behaviour in the number
>>> >     of memory blocks, not elements, which is a very different story. The
>>> >     two examples provided, where you would get improved time complexity
>>> >     from not limiting the block sizes, were only reflected in
>>> >     performance, statistically speaking, if the max block capacities
>>> >     were very low (~255) and the number of blocks very high. Based on
>>> >     this, I had already worked around those two areas by the time I read
>>> >     the paper.
>>> >
>>> >     In terms of the first example Arthur gives (block number
>>> >     re-writing), Arthur states block number re-numbering would happen
>>> >     'less often'. No. It will happen once in every
>>> >     std::numeric_limits<size_t>::max() allocations or transfers from
>>> >     reserved blocks to active blocks, i.e. effectively never in the
>>> >     lifetime of most programs. And it does not occur upon erasure.
>>> >     When it does happen, there is no guarantee that the number of blocks
>>> >     will be large, and it's not an expensive operation. Therefore the
>>> >     time complexity of the operations involving this op is O(1)
>>> >     amortized. This is clear in both the code and the commit notes, so
>>> >     Arthur is aware of this, given that he is aware that there is a
>>> >     change which makes the re-numbering less frequent. This is a
>>> >     deliberate falsehood.
>>> >
>>> >     In addition, if a hive were implemented as a vector of pointers to
>>> >     groups, rather than a linked list of groups, this would also
>>> >     necessitate amortized time complexity in this scenario: when the
>>> >     vector became full and a new block needed to be allocated, all group
>>> >     pointers would need to be copied to a new memory block.
>>> >
>>> >     In terms of the second example provided by Arthur, that issue no
>>> longer
>>> >     exists (the singly-linked list was changed to doubly-linked prior
>>> to
>>> >     receiving the paper, avoiding the problem). And it wasn't an
>>> expensive
>>> >     operation, except in the scenario described above.
>>> >
>>> >
>>> >
>>> >     3. splice() is a tertiary function, expected to be lesser-used, and
>>> >     its primary feature is transferring elements from one hive to
>>> >     another without invalidating iterators pointing to those elements,
>>> >     not its time complexity. In terms of transference of elements, it is
>>> >     O(1). In terms of transference of element memory blocks, the hive
>>> >     paper states it as, at worst, O(n) in the number of blocks in the
>>> >     container, and that is what it is. At best, it is O(1). You can
>>> >     check the code on this (https://github.com/mattreecebentley/plf_hive
>>> >     <https://github.com/mattreecebentley/plf_hive>). But at any rate,
>>> >     the operation is not an expensive one.
>>> >
>>> >     It is a requirement that splice check whether the min/max capacities
>>> >     of the source hive are within the destination's (if so, then O(1));
>>> >     if they aren't, then every block needs its capacity checked (O(N)).
>>> >     Arthur's "fix" doesn't change anything: you would still have to check
>>> >     every block's capacity if the minimum block capacity of the source
>>> >     were lower than the destination's, so the time complexity is
>>> >     unchanged.
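>>> >
>>> >     The shape of that check, sketched roughly (using a stand-in struct
>>> >     rather than the real std::hive_limits; not the reference
>>> >     implementation):
>>> >
>>> >     #include <cstddef>
>>> >
>>> >     struct limits { std::size_t min, max; };  // stand-in for hive_limits
>>> >
>>> >     // O(1) fast path when every source block is already guaranteed to
>>> >     // satisfy the destination's limits; otherwise each block's capacity
>>> >     // has to be checked individually, which is the O(N)-in-blocks case.
>>> >     bool splice_needs_block_scan(limits src, limits dst) {
>>> >         return src.min < dst.min || src.max > dst.max;
>>> >     }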
>>> >
>>> >     Even if you were to somehow remove this potential O(N) behaviour,
>>> you
>>> >     would still have the same time complexity if the hive were
>>> implemented
>>> >     as a vector of pointers to blocks, rather than a linked list of
>>> blocks,
>>> >     as splice would need to copy each pointer from the source vector
>>> to the
>>> >     destination. So it would need to be specified the same in the tech
>>> spec.
>>> >
>>> >
>>> >
>>> >     4. Regarding "bloating the interface": the additional constructor
>>> >     overloads that are necessary are all implementable as delegating
>>> >     constructors, and this is how they are done in the reference
>>> >     implementation. I could've specified this in the tech spec, but
>>> >     there was no need; I considered it better to allow implementors
>>> >     freedom of implementation. There is no real-world waste.
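>>> >
>>> >     The delegation pattern, shown on a toy class (illustrative only,
>>> >     not the reference implementation; the default limits of 8 and 8192
>>> >     are placeholders for the example):
>>> >
>>> >     #include <cstddef>
>>> >
>>> >     struct limits { std::size_t min, max; };
>>> >
>>> >     template <class T>
>>> >     class toy_hive {
>>> >     public:
>>> >         toy_hive() : toy_hive(limits{8, 8192}) {}      // delegates
>>> >         explicit toy_hive(limits l) : limits_(l) {}    // one real body
>>> >         toy_hive(std::size_t n, const T& v, limits l) : toy_hive(l) {
>>> >             // ... then fill with n copies of v ...
>>> >             (void)n; (void)v;
>>> >         }
>>> >     private:
>>> >         limits limits_;
>>> >     };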
>>> >
>>> >
>>> >     5. UB - the block_capacity_hard_limits() function was specifically
>>> >     introduced so that the user can avoid this issue, hence it is a
>>> >     non-issue. Arthur is aware of this.
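>>> >
>>> >     A usage sketch of that (assuming the P0447 names
>>> >     block_capacity_hard_limits(), std::hive_limits and reshape(); not
>>> >     from the paper or the reference implementation): clamp the requested
>>> >     configuration to the hard limits first, so reshape() is never handed
>>> >     out-of-range values.
>>> >
>>> >     #include <hive>       // proposed header
>>> >     #include <algorithm>
>>> >     #include <cstddef>
>>> >
>>> >     // Assumes want_min <= want_max and that the requested range
>>> >     // overlaps the implementation's hard limits.
>>> >     template <class Hive>
>>> >     void safe_reshape(Hive& h, std::size_t want_min, std::size_t want_max)
>>> >     {
>>> >         auto hard = Hive::block_capacity_hard_limits();
>>> >         h.reshape(std::hive_limits(std::max(want_min, hard.min),
>>> >                                    std::min(want_max, hard.max)));
>>> >     }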
>>> >
>>> >
>>> >     I move to dismiss this paper with prejudice. It is a large waste of
>>> >     committee time and, far from displaying an above-average
>>> >     understanding of the container in question, shows a low one. Enough
>>> >     time has already been wasted. Let's move on. The larger concern is
>>> >     that this paper should serve the community, and the people I've been
>>> >     in contact with, and who have contacted me, have overwhelmingly
>>> >     required the current behaviour. I find it disturbing that Arthur has
>>> >     ignored both responses to his enquiries on this subject, in the
>>> >     reflector and in the hive paper itself, and has abused the system to
>>> >     draw more attention without giving due responses. At any rate, I do
>>> >     not intend to personally waste significantly more time on this.
>>> >     Cheers,
>>> >     Matt
>>> >
>>>
>> _______________________________________________
>> Lib-Ext mailing list
>> Lib-Ext_at_[hidden]
>> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/lib-ext
>> Link to this post: http://lists.isocpp.org/lib-ext/2022/06/23519.php
>>
> _______________________________________________
> Lib-Ext mailing list
> Lib-Ext_at_[hidden]
> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/lib-ext
> Link to this post: http://lists.isocpp.org/lib-ext/2022/06/23539.php
>

Received on 2022-06-23 13:30:22