sg14

From: Bryce Adelstein Lelbach aka wash <brycelelbach_at_[hidden]> · Date: Thu, 23 Jun 2022 09:30:10 -0400

--
Bryce Adelstein Lelbach aka wash (he/him/his)
US Programming Language Standards (PL22) Chair
ISO C++ Library Evolution Chair
HPC Programming Models Architect @ NVIDIA
--
On Wed, Jun 22, 2022, 23:25 Michael Wong via Lib-Ext <
lib-ext_at_[hidden]> wrote:
> Hi Matthew and Authur, without stepping on LEWG's domain (and LEWG please
> let me know to back off if I am), I think we all want this proposal, and
> none more than SG14 but it must fit LEWG/LWG conventions. I would like to
> offer my help to bring this to an agreement which I am convinced is still
> possible. I don't pretend to understand all the issues at play here because
> there may be misunderstandings, cross communications, missed notes, and
> unassumed intentions. Perhaps we can start with a fresh slate and believe
> everyone is coming at this with an honest intention and forget all that and
> aim for a common technical ground.
> If you guys like someone to help mediate, I am willing to offer myself if
> you both agree to it. I also want one of the LEWG chairs, perhaps Ben Craig
> who started this thread (or others) to help preside as well as the two
> agreed proxies for this paper: Guy Davidson and Patrice Roy (if I recall
> correctly from SG14 and they also agree).
>
> If everyone agrees and it is OK by Ben and LEWG, we can schedule this at
> the next scheduled SG14 call combined with LEWG on July 13.I know the
> normal SG14 time will be bad for NZ time. So we could try to host that SG14
> call  at 8-10 pm ET, which I believe is 12noon-2pm NZ. SG14 has done this
> before with an inverted meeting that is friendlier to Asia/Pacific. With
> enough advertisement, there will be enough folks attending combined from
> both SG14 and LEWG.
>
> At this stage, this proposal is aiming for C++26 anyway, so I think 3 more
> weeks won't make a huge difference but it could increase its chance for
> smooth passing for C++26 which I think is all our aim here especially given
> this is one of the most requested features for Games. Of course, if you
> guys can work it out in these three weeks and progress in LEWG and don't
> need my service, I am also happy for that.
>
> Please let me know if this proposal works for you?
>
> On Wed, Jun 22, 2022 at 11:27 AM Arthur O'Dwyer via Lib-Ext <
> lib-ext_at_[hidden]> wrote:
>
>> On Tue, Jun 21, 2022 at 8:12 PM Matt Bentley <mattreecebentley_at_[hidden]>
>> wrote:
>>
>>> Hi Arthur- is there a reason why you ignored all the feedback I gave you
>>> prior to writing the paper?
>>>
>>
>> Hmm; if you sent a reply to my email of 2022-05-27 (copied below), I
>> never saw it. As I said on 2022-06-14, my impression was that I sent you a
>> draft on 2022-05-27 and (as far as I knew) you never replied at all.  If
>> something went wrong and you actually sent feedback that got lost in the
>> mail, my abject apologies.
>>
>> I first mentioned my intention w.r.t. `reshape` on 2022-05-03 in thread
>> titled
>> Re: [isocpp-lib-ext] Notes on P0447R20, was Re: Info+paper for hive
>> discussion Tuesday 5th 11am pacific time
>> "... I'll write this up in a paper if I need to. ..."
>> That is, I was already well aware that my proposed API changes were not
>> likely to be adopted as "friendly amendments" but would actually require a
>> paper to actively discuss the pros and cons.
>> You did respond (on 2022-05-03) to that message with *comments* including
>> "...I think I'm about done with this. ... No it's not. ... No. ... You
>> are technically correct. ... Don't spam me Arthur. ..."
>> I didn't perceive that as "*feedback*" per se, more "just the automatic
>> gainsaying of anything the other person says," and there certainly wasn't
>> anything in your response specifically engaging with my `reshape` stuff.
>> So, my intention to propose (major API, LEWG-targeting, paper-requiring)
>> changes to `reshape` was unaffected by those comments.
>>
>> I also mentioned my intention, again repeating the concrete wording
>> proposal, on 2022-05-16 in thread titled
>> [isocpp-lib-ext] P0447(std::hive): splice() and specified behaviour for
>> reserved (unused) memory blocks, trim_capacity(n) overload
>> "...My proposal in this area (which I'll turn into a paper Real Soon
>> Now)..."
>>
>> Finally, I sent a draft of the paper to you personally on 2022-05-27 and
>> then (having received no reply) submitted it to the mailing on 2022-06-10.
>>
>> I appreciate that you probably see this paper as an encroachment on your
>> personal domain of expertise, as a speedbump preventing `hive` from leaving
>> LEWG at the eleventh hour, etc etc. But the question for LEWG is simply
>> whether they want `hive` to leave LEWG in exactly its current state, or
>> whether the criticisms in P2596 have enough merit to justify (further)
>> modifying the API that is to be standardized for `std::hive`. If there
>> *are* to be (further) API changes, LEWG is the right venue to discuss
>> them.
>>
>> I repeat: I think it would be good to hear what STL vendors and/or LWG
>> luminaries think about the current API. If anyone other than Matt and
>> myself has really dug into the implementation, I haven't heard about it.
>> It is *quite possible* that this has happened without my hearing about
>> it.
>>
>> –Arthur
>>
>>
>>
>> On 14/06/2022 4:58 am, Arthur O'Dwyer wrote:
>>> > I'm sorry that Matt feels personally attacked here. I really just
>>> think
>>> > [insert everything I say in P2596 Section 3.1 through 3.6
>>> > <https://isocpp.org/files/papers/P2596R0.html#motivation>], and for
>>> > those reasons I propose a simplification of std::hive's public API to
>>> > bring it more into line with the rest of the STL and also give vendors
>>> > the ability to make it more performant.
>>> >
>>> > FWIW, I did email a draft of P2596 to Matt Bentley via personal email
>>> on
>>> > 2022-05-27, asking for his feedback; Matt never replied.
>>> >
>>> > I think it would be good to hear what STL vendors and/or LWG
>>> luminaries
>>> > think about the current API. If anyone other than Matt and myself has
>>> > really dug into the implementation, I haven't heard about it.  It is
>>> > /quite possible/ that this has happened without my hearing about it.
>>> >
>>> > –Arthur
>>> >
>>> >
>>> > ---------- Forwarded message ---------
>>> > From: *Arthur O'Dwyer* <arthur.j.odwyer_at_[hidden]
>>> > <mailto:arthur.j.odwyer_at_[hidden]>>
>>> > Date: Fri, May 27, 2022 at 9:17 AM
>>> > Subject: Proposal for changes to std::hive::reshape
>>> > To: Matthew Bentley <mattreecebentley_at_[hidden]
>>> > <mailto:mattreecebentley_at_[hidden]>>
>>> >
>>> >
>>> > Hi Matt,
>>> >
>>> > I've got a first draft of my proposed changes to `hive::reshape` now!
>>> > I haven't distributed this to anyone else yet. Please let me know your
>>> > thoughts.
>>> > https://quuxplusone.github.io/draft/d2596-improve-hive-reshape.html
>>> > <https://quuxplusone.github.io/draft/d2596-improve-hive-reshape.html>
>>> >
>>> > Thanks,
>>> > Arthur
>>> >
>>> > On Mon, Jun 13, 2022 at 2:13 AM Matt Bentley <
>>> mattreecebentley_at_[hidden]
>>> > <mailto:mattreecebentley_at_[hidden]>> wrote:
>>> >
>>> >     As mentioned by Ben the immovable objects problem is unavoidable,
>>> and
>>> >     this is explained in the FAQ of the current hive paper under "Why
>>> are
>>> >     hive_limits specified in constructors and not relegated to a
>>> secondary
>>> >     function" (https://isocpp.org/files/papers/P0447R20.html
>>> >     <https://isocpp.org/files/papers/P0447R20.html>). Arthur's
>>> >     workaround is not suitable as a generalised solution to the
>>> problem.
>>> >
>>> >     There is nothing true or of import in P2596. To go into detail
>>> would
>>> >     take too long, so I will address the main bullet points of each
>>> >     section.
>>> >     Assume anything I haven't addressed as false or too trivial to
>>> discuss.
>>> >
>>> >
>>> >     1. The central premise is wrong - you do not get more performance
>>> for
>>> >     more locality beyond a certain point, due to cache limits and
>>> >     cache-line
>>> >     limits. After that point the extra performance comes from fewer
>>> >     allocations/deallocations for the larger chunk sizes, and the
>>> number of
>>> >     block transitions, not locality so much.
>>> >     If that weren't the case, libstdc++'s deque would have
>>> substantially
>>> >     slower iteration than vector for types that are smaller than it's
>>> chunk
>>> >     size. See benchmarks here:
>>> >     https://plflib.org/benchmarks_haswell_gcc.htm#rawperformance
>>> >     <https://plflib.org/benchmarks_haswell_gcc.htm#rawperformance>
>>> >
>>> >     In terms of compilers and predicting cache/cache-line sizes, they
>>> >     cannot
>>> >     effectively do this for every platform, and certainly not for
>>> embedded,
>>> >     so the facility for implementors to specify their own chunk
>>> capacities
>>> >     is crucial. Larger block sizes also may increase memory
>>> fragmentation,
>>> >     though this is platform dependent.
>>> >
>>> >
>>> >     In terms of colony/hive itself, there are additional
>>> considerations.
>>> >     It was found that the performance for general use and referencer
>>> >     benchmarking (start here:
>>> >     https://plflib.org/benchmarks_haswell_gcc.htm#lowmodification
>>> >     <https://plflib.org/benchmarks_haswell_gcc.htm#lowmodification>)
>>> was
>>> >     unchanged when lowering default block capacity limits for hive to
>>> 8192,
>>> >     while vastly improving memory use. This is documented on the
>>> reflector
>>> >     in the discussion on the priority parameter. 8192 is now the
>>> default
>>> >     block size for non-scalar types in the reference implementation.
>>> >
>>> >
>>> >     This result is largely because of the following factor described
>>> in FAQ
>>> >     item 11.c:
>>> >     "If a block size is large and many erasures have occurred but the
>>> block
>>> >     is not completely empty of elements, in addition to having a lot of
>>> >     wasted unused memory, iterative performance will suffer due to the
>>> >     large
>>> >     memory gaps between any two non-erased elements and subsequent
>>> drops in
>>> >     data locality and cache performance. In this scenario the user
>>> would
>>> >     desire to experiment with benchmarking and limiting the
>>> minimum/maximum
>>> >     capacities of the blocks, such that memory blocks are freed earlier
>>> >     given the user's specific erasure patterns, and performance is
>>> >     improved."
>>> >
>>> >     In other words, in real-world usage, having your block capacities
>>> too
>>> >     large results in less locality.
>>> >
>>> >     I ran a few benchmarks over the weekend, and they confirmed this
>>> result.
>>> >     When testing single-insertion/erasure/iteration in isolation,
>>> moving to
>>> >     block capacities of 65535 resulted in better insertion performance
>>> but
>>> >     unchanged iteration/erasure, while changing the skipfield type to
>>> uint
>>> >     and max block capacity to uintmax resulted in better insertion but
>>> >     worse
>>> >     iteration performance. But in more real-world benchmarks
>>> (modification
>>> >     and referencer, where elements are inserted, erased and iterated
>>> upon
>>> >     over time on the same containers), moving to 65535 only improved
>>> >     performance by 1% while increasing memory usage by 8%, while
>>> moving to
>>> >     uint_max decreased performance by 1% while increasing memory use by
>>> >     16%.
>>> >     This was on average across sets ranging from 10 to 1 million
>>> elements.
>>> >
>>> >     In addition, the current implementation changes the skipfield type
>>> to
>>> >     uint_least8 for scalar types, based again on the results described
>>> in
>>> >     the priority parameter discussion on the reflector. This limits the
>>> >     block capacities for scalar types to 255. It is an
>>> >     implementation-specific decision, but will demonstrate the point
>>> >     further. So, I ran low-and-high real-world modification scenario
>>> >     benchmarking on type int, using uint_least8 on one run,
>>> uint_least16 on
>>> >     the second. On average in sets ranging from 10 to 1 million
>>> elements
>>> >     moving to uint_least16 improved performance by 2%, but used 44%
>>> more
>>> >     memory.
>>> >     (
>>> https://plflib.org/results_for_hive_block_max_capacity_comparisons.zip
>>> >     <
>>> https://plflib.org/results_for_hive_block_max_capacity_comparisons.zip>)
>>> >
>>> >
>>> >     Other reasons users need this:
>>> >     (a) When they know their erasure patterns are erasing X number of
>>> >     elements at once (so they can minimize jumps when iterating).
>>> >     (b) To avoid innappropriate amounts of allocation upon expansion
>>> (as
>>> >     mentioned by Ben), particularly on low-memory platforms like
>>> embedded.
>>> >     (c) To allow them to choose their own tradeoffs between memory use
>>> and
>>> >     performance.
>>> >
>>> >
>>> >
>>> >     2. "Max block sizes cause O(N) behaviour"
>>> >     Even if time complexity was always a major factor in performance
>>> (it is
>>> >     sometimes), this is false.
>>> >
>>> >     For starters, Arthur is talking about O(N) behaviour with memory
>>> >     blocks,
>>> >     not elements, which is a very different story. The two examples
>>> >     provided
>>> >     where you would get improved time complexity from not limiting the
>>> >     block
>>> >     sizes, were only reflected in performance, statistically-speaking,
>>> if
>>> >     the max block capacities were very low (~255) and the number of
>>> blocks
>>> >     very high. Based on this, I had already worked around those two
>>> >     areas by
>>> >     the time I read the paper.
>>> >
>>> >     In terms of the first example Arthur gives (block number
>>> re-writing),
>>> >     Arthur states block number re-numbering would happen 'less often'.
>>> No.
>>> >     It will happen 1 in every std::numeric_limits<size_t>::max()
>>> >     allocations
>>> >     or transfers from reserved blocks to active blocks ie. actually
>>> >     never in
>>> >     the lifetime of most programs. And it does not occur upon erasure.
>>> >     When it does happen, there is no guarantee that the number of
>>> blocks
>>> >     will be large. And it's not an expensive operation. Therefore the
>>> time
>>> >     complexity for those operations involving this op is O(1)
>>> amortized.
>>> >     This is clear in both the code and in the commit notes, so Arthur
>>> is
>>> >     aware of this given that he is aware that there is a change which
>>> makes
>>> >     the re-numbering less frequent. This is a deliberate falsehood.
>>> >
>>> >     In addition, if a hive were implemented as a vector of pointers to
>>> >     groups, rather than a linked list of groups, this would also
>>> >     necessitate
>>> >     amortized time complexity in this scenario, as when the vector
>>> became
>>> >     full, and allocation of a new block needed to occur, all group
>>> pointers
>>> >     would need to be reallocated to a new memory block.
>>> >
>>> >     In terms of the second example provided by Arthur, that issue no
>>> longer
>>> >     exists (the singly-linked list was changed to doubly-linked prior
>>> to
>>> >     receiving the paper, avoiding the problem). And it wasn't an
>>> expensive
>>> >     operation, except in the scenario described above.
>>> >
>>> >
>>> >
>>> >     3. splice() is a tertiary, and expected to be lesser-used
>>> function, but
>>> >     it's primary feature is transferring elements from one hive to
>>> another
>>> >     without invalidating iterators pointing to those elements, not it's
>>> >     time
>>> >     complexity. In terms of transference of elements, it is O(1). In
>>> terms
>>> >     of transference of element memory blocks, the hive paper states it
>>> >     as at
>>> >     worst, O(n) in the number of blocks in the container, and that is
>>> what
>>> >     it is. At best, it is O(1). You can check the code on this
>>> >     (https://github.com/mattreecebentley/plf_hive
>>> >     <https://github.com/mattreecebentley/plf_hive>). But at any rate,
>>> the
>>> >     operation is not an expensive one.
>>> >
>>> >     It is a requirement that splice check whether the min/max
>>> capacities of
>>> >     the source hive are within the destination's (if so, then O(1)), if
>>> >     they
>>> >     aren't then every block needs it's capacity checked (O(N)).
>>> >     Arthur's "fix" doesn't change anything. You would still have to
>>> check
>>> >     every block's capacity if the minimum block capacity of the source
>>> were
>>> >     lower than the destination, time complexity is unchanged.
>>> >
>>> >     Even if you were to somehow remove this potential O(N) behaviour,
>>> you
>>> >     would still have the same time complexity if the hive were
>>> implemented
>>> >     as a vector of pointers to blocks, rather than a linked list of
>>> blocks,
>>> >     as splice would need to copy each pointer from the source vector
>>> to the
>>> >     destination. So it would need to be specified the same in the tech
>>> spec.
>>> >
>>> >
>>> >
>>> >     4. Regards "bloating the interface", the additional constructor
>>> >     overloads that're necessary are all implementable as delegating
>>> >     constructors, and this is how they are done in the reference
>>> >     implementation. I could've specified this in the tech spec but
>>> there
>>> >     was
>>> >     no need, I considered it better to allow implementors freedom of
>>> >     implementation. There is no real-world waste.
>>> >
>>> >
>>> >     5. UB - the block_capacity_hard_limits() function was specifically
>>> >     introduced so that the user can avoid this issue, hence it is a
>>> >     non-issue. Arthur is aware of this.
>>> >
>>> >
>>> >     I move to dismiss this paper with prejudice. It is a large waste of
>>> >     committee time, and far from displaying an above-average
>>> understanding
>>> >     of the container in question, shows a low understanding. Enough
>>> time
>>> >     has
>>> >     already been wasted. Let's move on. The larger concern is that this
>>> >     paper should serve the community, and in terms of who I've been in
>>> >     contact with, and who has contacted me, they have overwhelmingly
>>> >     required the current behaviour. I find it disturbing that Arthur
>>> has
>>> >     ignored both responses to his enquiries on this subject in the
>>> >     reflector
>>> >     and in the hive paper itself, and has abused the system to draw
>>> more
>>> >     attention without giving due responses. At any rate, I do not
>>> intend to
>>> >     personally waste significantly more time on this.
>>> >     Cheers,
>>> >     Matt
>>> >
>>>
>> _______________________________________________
>> Lib-Ext mailing list
>> Lib-Ext_at_[hidden]
>> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/lib-ext
>> Link to this post: http://lists.isocpp.org/lib-ext/2022/06/23519.php
>>
> _______________________________________________
> Lib-Ext mailing list
> Lib-Ext_at_[hidden]
> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/lib-ext
> Link to this post: http://lists.isocpp.org/lib-ext/2022/06/23539.php
>