Hi Arthur - is there a reason why you ignored all the feedback I gave you
prior to writing the paper?
Hmm; if you sent a reply to my email of 2022-05-27 (copied below), I never saw it. As I said on 2022-06-14, I sent you a draft on 2022-05-27 and, as far as I knew, you never replied at all. If something went wrong and you actually did send feedback that got lost in the mail, my abject apologies.
I first mentioned my intention w.r.t. `reshape` on 2022-05-03, in the thread titled
Re: [isocpp-lib-ext] Notes on P0447R20, was Re: Info+paper for hive discussion Tuesday 5th 11am pacific time
"... I'll write this up in a paper if I need to. ..."
That is, I was already well aware that my proposed API changes were not likely to be adopted as "friendly amendments" but would actually require a paper to actively discuss the pros and cons.
You did respond (on 2022-05-03) to that message with comments including
"...I think I'm about done with this. ... No it's not. ... No. ... You are technically correct. ... Don't spam me Arthur. ..."
I didn't perceive that as "feedback" per se, more "just the automatic gainsaying of anything the other person says," and there certainly wasn't anything in your response specifically engaging with my `reshape` stuff. So, my intention to propose (major API, LEWG-targeting, paper-requiring) changes to `reshape` was unaffected by those comments.
I also mentioned my intention, again repeating the concrete wording proposal, on 2022-05-16, in the thread titled
[isocpp-lib-ext] P0447(std::hive): splice() and specified behaviour for reserved (unused) memory blocks, trim_capacity(n) overload
"...My proposal in this area (which I'll turn into a paper Real Soon Now)..."
Finally, I sent a draft of the paper to you personally on 2022-05-27 and then (having received no reply) submitted it to the mailing on 2022-06-10.
I appreciate that you probably see this paper as an encroachment on your personal domain of expertise, as a speedbump preventing `hive` from leaving LEWG at the eleventh hour, etc etc. But the question for LEWG is simply whether they want `hive` to leave LEWG in exactly its current state, or whether the criticisms in P2596 have enough merit to justify (further) modifying the API that is to be standardized for `std::hive`. If there are to be (further) API changes, LEWG is the right venue to discuss them.
I repeat: I think it would be good to hear what STL vendors and/or LWG luminaries think about the current API. If anyone other than Matt and myself has really dug into the implementation, I haven't heard about it. It is quite possible that this has happened without my hearing about it.
–Arthur
On 14/06/2022 4:58 am, Arthur O'Dwyer wrote:
> I'm sorry that Matt feels personally attacked here. I really just think
> [insert everything I say in P2596 Section 3.1 through 3.6
> <https://isocpp.org/files/papers/P2596R0.html#motivation>], and for
> those reasons I propose a simplification of std::hive's public API to
> bring it more into line with the rest of the STL and also give vendors
> the ability to make it more performant.
>
> FWIW, I did email a draft of P2596 to Matt Bentley via personal email on
> 2022-05-27, asking for his feedback; Matt never replied.
>
> I think it would be good to hear what STL vendors and/or LWG luminaries
> think about the current API. If anyone other than Matt and myself has
> really dug into the implementation, I haven't heard about it. It is
> /quite possible/ that this has happened without my hearing about it.
>
> –Arthur
>
>
> ---------- Forwarded message ---------
> From: *Arthur O'Dwyer* <arthur.j.odwyer@gmail.com>
> Date: Fri, May 27, 2022 at 9:17 AM
> Subject: Proposal for changes to std::hive::reshape
> To: Matthew Bentley <mattreecebentley@gmail.com>
>
>
> Hi Matt,
>
> I've got a first draft of my proposed changes to `hive::reshape` now!
> I haven't distributed this to anyone else yet. Please let me know your
> thoughts.
> https://quuxplusone.github.io/draft/d2596-improve-hive-reshape.html
>
> Thanks,
> Arthur
>
> On Mon, Jun 13, 2022 at 2:13 AM Matt Bentley
> <mattreecebentley@gmail.com> wrote:
>
> As mentioned by Ben, the immovable objects problem is unavoidable, and
> this is explained in the FAQ of the current hive paper under "Why are
> hive_limits specified in constructors and not relegated to a secondary
> function" (https://isocpp.org/files/papers/P0447R20.html). Arthur's
> workaround is not suitable as a generalised solution to the problem.
>
> There is nothing true or of import in P2596. To go into detail would
> take too long, so I will address the main bullet points of each
> section. Assume that anything I haven't addressed is false or too
> trivial to discuss.
>
>
> 1. The central premise is wrong - beyond a certain point you do not
> get more performance from more locality, due to cache and cache-line
> limits. Past that point the extra performance comes from fewer
> allocations/deallocations with the larger chunk sizes and from fewer
> block transitions, not so much from locality.
> If that weren't the case, libstdc++'s deque would have substantially
> slower iteration than vector for types that are smaller than its chunk
> size. See benchmarks here:
> https://plflib.org/benchmarks_haswell_gcc.htm#rawperformance
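> (For anyone wanting to reproduce that comparison, a micro-benchmark
> along the following lines will do. This is an illustrative sketch,
> not the plflib benchmark code itself:)
>
>     #include <chrono>
>     #include <cstdio>
>     #include <deque>
>     #include <numeric>
>     #include <vector>
>
>     template <class Container>
>     double seconds_to_iterate(const Container& c) {
>         auto t0 = std::chrono::steady_clock::now();
>         long long sum = std::accumulate(c.begin(), c.end(), 0LL);
>         auto t1 = std::chrono::steady_clock::now();
>         std::printf("(sum %lld)\n", sum);  // defeat dead-code elimination
>         return std::chrono::duration<double>(t1 - t0).count();
>     }
>
>     int main() {
>         std::vector<int> v(10'000'000, 1);
>         std::deque<int> d(v.begin(), v.end());
>         std::printf("vector: %f s\ndeque:  %f s\n",
>                     seconds_to_iterate(v), seconds_to_iterate(d));
>     }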
>
> As for compilers predicting cache/cache-line sizes: they cannot do
> this effectively for every platform, and certainly not for embedded,
> so the facility for implementors to specify their own chunk capacities
> is crucial. Larger block sizes may also increase memory fragmentation,
> though this is platform-dependent.
>
>
> In terms of colony/hive itself, there are additional considerations.
> It was found that performance in the general-use and referencer
> benchmarks (start here:
> https://plflib.org/benchmarks_haswell_gcc.htm#lowmodification) was
> unchanged when the default block capacity limit for hive was lowered
> to 8192, while memory use vastly improved. This is documented on the
> reflector in the discussion of the priority parameter. 8192 is now the
> default block size for non-scalar types in the reference
> implementation.
>
>
> This result is largely because of the following factor, described in
> FAQ item 11.c:
> "If a block size is large and many erasures have occurred but the
> block is not completely empty of elements, in addition to having a lot
> of wasted unused memory, iterative performance will suffer due to the
> large memory gaps between any two non-erased elements and subsequent
> drops in data locality and cache performance. In this scenario the
> user would desire to experiment with benchmarking and limiting the
> minimum/maximum capacities of the blocks, such that memory blocks are
> freed earlier given the user's specific erasure patterns, and
> performance is improved."
>
> In other words, in real-world usage, having your block capacities too
> large results in less locality.
>
> I ran a few benchmarks over the weekend, and they confirmed this
> result. When testing single insertion/erasure/iteration in isolation,
> moving to block capacities of 65535 resulted in better insertion
> performance but unchanged iteration/erasure, while changing the
> skipfield type to uint and the max block capacity to uint_max resulted
> in better insertion but worse iteration performance. But in the more
> real-world benchmarks (modification and referencer, where elements are
> inserted, erased and iterated upon over time in the same containers),
> moving to 65535 improved performance by only 1% while increasing
> memory usage by 8%, and moving to uint_max decreased performance by 1%
> while increasing memory use by 16%.
> This was on average, across sets ranging from 10 to 1 million
> elements.
>
> In addition, the current implementation changes the skipfield type to
> uint_least8 for scalar types, based again on the results described in
> the priority-parameter discussion on the reflector. This limits the
> block capacities for scalar types to 255. It is an
> implementation-specific decision, but it demonstrates the point
> further. So, I ran the low- and high-modification real-world scenario
> benchmarks on type int, using uint_least8 on one run and uint_least16
> on the second. On average, across sets ranging from 10 to 1 million
> elements, moving to uint_least16 improved performance by 2% but used
> 44% more memory.
> (https://plflib.org/results_for_hive_block_max_capacity_comparisons.zip)
>
>
> Other reasons users need this:
> (a) When they know their erasure patterns erase X elements at a time
> (so they can minimize jumps when iterating).
> (b) To avoid inappropriate amounts of allocation upon expansion (as
> mentioned by Ben), particularly on low-memory platforms like embedded.
> (c) To allow them to choose their own tradeoffs between memory use and
> performance; see the sketch below.
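> (For illustration, (c) looks something like this in user code. This is
> a sketch using the P0447 names - plf::hive/plf::hive_limits in the
> reference implementation - and the capacity numbers are placeholders:)
>
>     #include <hive>   // P0447's proposed header
>
>     void tune() {
>         // Trade memory use against performance by constraining block
>         // capacities up front...
>         std::hive<int> h(std::hive_limits(256, 8192));
>         // ...or retune an existing container after profiling:
>         h.reshape(std::hive_limits(64, 1024));
>     }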
>
>
>
> 2. "Max block sizes cause O(N) behaviour"
> Even if time complexity was always a major factor in performance (it is
> sometimes), this is false.
>
> For starters, Arthur is talking about O(N) behaviour in the number of
> memory blocks, not elements, which is a very different story. The two
> examples provided, where you would get improved time complexity from
> not limiting the block sizes, were reflected in performance,
> statistically speaking, only if the max block capacities were very low
> (~255) and the number of blocks very high. Based on this, I had
> already worked around those two areas by the time I read the paper.
>
> In terms of the first example Arthur gives (block number re-writing):
> Arthur states that block re-numbering would happen "less often". No.
> It happens once in every std::numeric_limits<size_t>::max()
> allocations or transfers from reserved blocks to active blocks, i.e.
> effectively never in the lifetime of most programs. And it does not
> occur upon erasure. When it does happen, there is no guarantee that
> the number of blocks will be large, and it is not an expensive
> operation. Therefore the operations involving this op are amortized
> O(1).
> This is clear in both the code and the commit notes, so Arthur is
> aware of it, given that he is aware there is a change which makes the
> re-numbering less frequent. This is a deliberate falsehood.
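> (To make the amortization argument concrete, here is a sketch of the
> scheme being described. Hypothetical code, not the reference
> implementation:)
>
>     #include <cstddef>
>     #include <limits>
>
>     struct block { std::size_t number; block* next; /* elements... */ };
>
>     // New blocks take the next counter value; only when the counter
>     // would overflow do we walk the list and renumber every block.
>     // That happens once per SIZE_MAX numberings, hence amortized O(1).
>     void assign_number(block* head, block* fresh, std::size_t& next) {
>         if (next == std::numeric_limits<std::size_t>::max()) {
>             std::size_t n = 0;
>             for (block* b = head; b != nullptr; b = b->next)
>                 b->number = n++;           // the rare O(blocks) pass
>             next = n;
>         }
>         fresh->number = next++;
>     }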
>
> In addition, if a hive were implemented as a vector of pointers to
> groups, rather than a linked list of groups, this scenario would
> likewise entail amortized time complexity: when the vector became full
> and a new allocation had to occur, all group pointers would need to be
> copied to the new memory block.
>
> In terms of the second example provided by Arthur, that issue no
> longer exists (the singly-linked list was changed to doubly-linked
> prior to my receiving the paper, avoiding the problem). And it was not
> an expensive operation, except in the scenario described above.
>
>
>
> 3. splice() is a tertiary function, expected to be lesser-used, and
> its primary feature is transferring elements from one hive to another
> without invalidating iterators pointing to those elements, not its
> time complexity. In terms of transference of elements, it is O(1). In
> terms of transference of element memory blocks, the hive paper states
> it as, at worst, O(n) in the number of blocks in the container, and
> that is what it is. At best, it is O(1). You can check the code on
> this (https://github.com/mattreecebentley/plf_hive). But at any rate,
> the operation is not an expensive one.
>
> It is a requirement that splice check whether the min/max capacities
> of the source hive are within the destination's (if so, O(1)); if
> they aren't, then every block needs its capacity checked (O(N)).
> Arthur's "fix" doesn't change anything: you would still have to check
> every block's capacity if the minimum block capacity of the source
> were lower than the destination's, so the time complexity is
> unchanged.
>
> Even if you were to somehow remove this potential O(N) behaviour, you
> would still have the same time complexity if the hive were implemented
> as a vector of pointers to blocks rather than a linked list of blocks,
> as splice would need to copy each pointer from the source vector to
> the destination. So it would need to be specified the same way in the
> tech spec.
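> (A sketch of the check in question, with hypothetical structures
> rather than the actual plf_hive code:)
>
>     #include <cstddef>
>
>     struct block { std::size_t capacity; block* next; };
>     struct hive_rep { std::size_t min_cap, max_cap; block* head; };
>
>     // O(1) when the source's limits already fit within the
>     // destination's; otherwise O(N) in the number of source blocks.
>     bool blocks_transferable(const hive_rep& dst, const hive_rep& src) {
>         if (src.min_cap >= dst.min_cap && src.max_cap <= dst.max_cap)
>             return true;                           // O(1) fast path
>         for (const block* b = src.head; b != nullptr; b = b->next)
>             if (b->capacity < dst.min_cap || b->capacity > dst.max_cap)
>                 return false;                      // the O(N) walk
>         return true;
>     }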
>
>
>
> 4. As regards "bloating the interface": the additional constructor
> overloads that are necessary are all implementable as delegating
> constructors, and this is how they are done in the reference
> implementation. I could have specified this in the tech spec, but
> there was no need; I considered it better to allow implementors
> freedom of implementation. There is no real-world waste.
>
>
> 5. UB - the block_capacity_hard_limits() function was specifically
> introduced so that users can avoid this issue, hence it is a
> non-issue. Arthur is aware of this.
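> (The intended usage pattern, sketched with the P0447 names; treat the
> exact member names here as assumptions:)
>
>     #include <algorithm>
>     #include <hive>   // P0447's proposed header
>
>     std::hive<int> make_hive() {
>         std::hive_limits desired(64, 1024);
>         const auto hard = std::hive<int>::block_capacity_hard_limits();
>         desired.min = std::max(desired.min, hard.min);  // clamp into the
>         desired.max = std::min(desired.max, hard.max);  // supported range
>         return std::hive<int>(desired);  // limits now in range: no UB
>     }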
>
>
> I move to dismiss this paper with prejudice. It is a large waste of
> committee time and, far from displaying an above-average understanding
> of the container in question, shows a low one. Enough time has already
> been wasted; let's move on. The larger concern is that this paper
> should serve the community, and the users I have been in contact with,
> and who have contacted me, have overwhelmingly required the current
> behaviour. I find it disturbing that Arthur has ignored the responses
> to his enquiries on this subject both on the reflector and in the hive
> paper itself, and has abused the system to draw more attention without
> giving due responses. At any rate, I do not intend to personally waste
> significantly more time on this.
> Cheers,
> Matt
>