Hi Arthur - is there a reason why you ignored all the feedback I gave you
prior to writing the paper?
Hmm; if you sent a reply to my email of 2022-05-27 (copied below), I never saw it. As I said on 2022-06-14, I sent you a draft on 2022-05-27 and, as far as I knew, you never replied at all. If something went wrong and you actually did send feedback that got lost in the mail, my abject apologies.
I first mentioned my intention w.r.t. `reshape` on 2022-05-03, in the thread titled
Re: [isocpp-lib-ext] Notes on P0447R20, was Re: Info+paper for hive discussion Tuesday 5th 11am pacific time
"... I'll write this up in a paper if I need to. ..."
That is, I was already well aware that my proposed API changes were not likely to be adopted as "friendly amendments" but would actually require a paper to actively discuss the pros and cons.
You did respond (on 2022-05-03) to that message with comments including
"...I think I'm about done with this. ... No it's not. ... No. ... You are technically correct. ... Don't spam me Arthur. ..."
I didn't perceive that as "feedback" per se, more "just the automatic gainsaying of anything the other person says," and there certainly wasn't anything in your response specifically engaging with my `reshape` stuff. So, my intention to propose (major API, LEWG-targeting, paper-requiring) changes to `reshape` was unaffected by those comments.
I also mentioned my intention, again repeating the concrete wording proposal, on 2022-05-16, in the thread titled
[isocpp-lib-ext] P0447(std::hive): splice() and specified behaviour for reserved (unused) memory blocks, trim_capacity(n) overload
"...My proposal in this area (which I'll turn into a paper Real Soon Now)..."
Finally, I sent a draft of the paper to you personally on 2022-05-27 and then (having received no reply) submitted it to the mailing on 2022-06-10.
I appreciate that you probably see this paper as an encroachment on your personal domain of expertise, as a speedbump preventing `hive` from leaving LEWG at the eleventh hour, etc etc. But the question for LEWG is simply whether they want `hive` to leave LEWG in exactly its current state, or whether the criticisms in P2596 have enough merit to justify (further) modifying the API that is to be standardized for `std::hive`. If there are to be (further) API changes, LEWG is the right venue to discuss them.
I repeat: I think it would be good to hear what STL vendors and/or LWG luminaries think about the current API. If anyone other than Matt and myself has really dug into the implementation, I haven't heard about it. It is quite possible that this has happened without my hearing about it.
–Arthur
On 14/06/2022 4:58 am, Arthur O'Dwyer wrote:
> I'm sorry that Matt feels personally attacked here. I really just think
> [insert everything I say in P2596 Section 3.1 through 3.6
> <https://isocpp.org/files/papers/P2596R0.html#motivation>], and for
> those reasons I propose a simplification of std::hive's public API to
> bring it more into line with the rest of the STL and also give vendors
> the ability to make it more performant.
>
> FWIW, I did email a draft of P2596 to Matt Bentley via personal email on
> 2022-05-27, asking for his feedback; Matt never replied.
>
> I think it would be good to hear what STL vendors and/or LWG luminaries
> think about the current API. If anyone other than Matt and myself has
> really dug into the implementation, I haven't heard about it. It is
> /quite possible/ that this has happened without my hearing about it.
>
> –Arthur
>
>
> ---------- Forwarded message ---------
> From: *Arthur O'Dwyer* <arthur.j.odwyer@gmail.com>
> Date: Fri, May 27, 2022 at 9:17 AM
> Subject: Proposal for changes to std::hive::reshape
> To: Matthew Bentley <mattreecebentley@gmail.com>
>
>
> Hi Matt,
>
> I've got a first draft of my proposed changes to `hive::reshape` now!
> I haven't distributed this to anyone else yet. Please let me know your
> thoughts.
> https://quuxplusone.github.io/draft/d2596-improve-hive-reshape.html
>
> Thanks,
> Arthur
>
> On Mon, Jun 13, 2022 at 2:13 AM Matt Bentley
> <mattreecebentley@gmail.com> wrote:
>
> As mentioned by Ben, the immovable objects problem is unavoidable, and
> this is explained in the FAQ of the current hive paper under "Why are
> hive_limits specified in constructors and not relegated to a secondary
> function" (https://isocpp.org/files/papers/P0447R20.html). Arthur's
> workaround is not suitable as a generalised solution to the problem.
>
> There is nothing true or of import in P2596. To go into detail would
> take too long, so I will address the main bullet points of each
> section. Assume that anything I haven't addressed is false or too
> trivial to discuss.
>
>
> 1. The central premise is wrong - beyond a certain point you do not
> get more performance from more locality, due to cache and cache-line
> limits. Past that point the extra performance comes from fewer
> allocations/deallocations with the larger chunk sizes and from fewer
> block transitions, not so much from locality.
> If that weren't the case, libstdc++'s deque would have substantially
> slower iteration than vector for types that are smaller than its chunk
> size. See benchmarks here:
> https://plflib.org/benchmarks_haswell_gcc.htm#rawperformance
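> (For anyone wanting to reproduce that comparison, a micro-benchmark
> along the following lines will do. This is an illustrative sketch,
> not the plflib benchmark code itself:)
>
>     #include <chrono>
>     #include <cstdio>
>     #include <deque>
>     #include <numeric>
>     #include <vector>
>
>     template <class Container>
>     double seconds_to_iterate(const Container& c) {
>         auto t0 = std::chrono::steady_clock::now();
>         long long sum = std::accumulate(c.begin(), c.end(), 0LL);
>         auto t1 = std::chrono::steady_clock::now();
>         std::printf("(sum %lld)\n", sum);  // defeat dead-code elimination
>         return std::chrono::duration<double>(t1 - t0).count();
>     }
>
>     int main() {
>         std::vector<int> v(10'000'000, 1);
>         std::deque<int> d(v.begin(), v.end());
>         std::printf("vector: %f s\ndeque:  %f s\n",
>                     seconds_to_iterate(v), seconds_to_iterate(d));
>     }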
>
> As for compilers predicting cache/cache-line sizes: they cannot do
> this effectively for every platform, and certainly not for embedded,
> so the facility for implementors to specify their own chunk capacities
> is crucial. Larger block sizes may also increase memory fragmentation,
> though this is platform-dependent.
>
>
> In terms of colony/hive itself, there are additional considerations.
> It was found that performance in the general-use and referencer
> benchmarks (start here:
> https://plflib.org/benchmarks_haswell_gcc.htm#lowmodification) was
> unchanged when the default block capacity limit for hive was lowered
> to 8192, while memory use vastly improved. This is documented on the
> reflector in the discussion of the priority parameter. 8192 is now the
> default block size for non-scalar types in the reference
> implementation.
>
>
> This result is largely because of the following factor, described in
> FAQ item 11.c:
> "If a block size is large and many erasures have occurred but the
> block is not completely empty of elements, in addition to having a lot
> of wasted unused memory, iterative performance will suffer due to the
> large memory gaps between any two non-erased elements and subsequent
> drops in data locality and cache performance. In this scenario the
> user would desire to experiment with benchmarking and limiting the
> minimum/maximum capacities of the blocks, such that memory blocks are
> freed earlier given the user's specific erasure patterns, and
> performance is improved."
>
> In other words, in real-world usage, having your block capacities too
> large results in less locality.
>
> I ran a few benchmarks over the weekend, and they confirmed this
> result. When testing single insertion/erasure/iteration in isolation,
> moving to block capacities of 65535 resulted in better insertion
> performance but unchanged iteration/erasure, while changing the
> skipfield type to uint and the max block capacity to uint_max resulted
> in better insertion but worse iteration performance. But in the more
> real-world benchmarks (modification and referencer, where elements are
> inserted, erased and iterated upon over time in the same containers),
> moving to 65535 improved performance by only 1% while increasing
> memory usage by 8%, and moving to uint_max decreased performance by 1%
> while increasing memory use by 16%.
> This was on average, across sets ranging from 10 to 1 million
> elements.
>
> In addition, the current implementation changes the skipfield type to
> uint_least8 for scalar types, based again on the results described in
> the priority-parameter discussion on the reflector. This limits the
> block capacities for scalar types to 255. It is an
> implementation-specific decision, but it demonstrates the point
> further. So, I ran the low- and high-modification real-world scenario
> benchmarks on type int, using uint_least8 on one run and uint_least16
> on the second. On average, across sets ranging from 10 to 1 million
> elements, moving to uint_least16 improved performance by 2% but used
> 44% more memory.
> (https://plflib.org/results_for_hive_block_max_capacity_comparisons.zip)
>
>
> Other reasons users need this:
> (a) When they know their erasure patterns erase X elements at a time
> (so they can minimize jumps when iterating).
> (b) To avoid inappropriate amounts of allocation upon expansion (as
> mentioned by Ben), particularly on low-memory platforms like embedded.
> (c) To allow them to choose their own tradeoffs between memory use and
> performance; see the sketch below.
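> (For illustration, (c) looks something like this in user code. This is
> a sketch using the P0447 names - plf::hive/plf::hive_limits in the
> reference implementation - and the capacity numbers are placeholders:)
>
>     #include <hive>   // P0447's proposed header
>
>     void tune() {
>         // Trade memory use against performance by constraining block
>         // capacities up front...
>         std::hive<int> h(std::hive_limits(256, 8192));
>         // ...or retune an existing container after profiling:
>         h.reshape(std::hive_limits(64, 1024));
>     }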
>
>
>
> 2. "Max block sizes cause O(N) behaviour"
> Even if time complexity was always a major factor in performance (it is
> sometimes), this is false.
>
> For starters, Arthur is talking about O(N) behaviour in the number of
> memory blocks, not elements, which is a very different story. The two
> examples provided, where you would get improved time complexity from
> not limiting the block sizes, were reflected in performance,
> statistically speaking, only if the max block capacities were very low
> (~255) and the number of blocks very high. Based on this, I had
> already worked around those two areas by the time I read the paper.
>
> In terms of the first example Arthur gives (block number re-writing):
> Arthur states that block re-numbering would happen "less often". No.
> It happens once in every std::numeric_limits<size_t>::max()
> allocations or transfers from reserved blocks to active blocks, i.e.
> effectively never in the lifetime of most programs. And it does not
> occur upon erasure. When it does happen, there is no guarantee that
> the number of blocks will be large, and it is not an expensive
> operation. Therefore the operations involving this op are amortized
> O(1).
> This is clear in both the code and the commit notes, so Arthur is
> aware of it, given that he is aware there is a change which makes the
> re-numbering less frequent. This is a deliberate falsehood.
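> (To make the amortization argument concrete, here is a sketch of the
> scheme being described. Hypothetical code, not the reference
> implementation:)
>
>     #include <cstddef>
>     #include <limits>
>
>     struct block { std::size_t number; block* next; /* elements... */ };
>
>     // New blocks take the next counter value; only when the counter
>     // would overflow do we walk the list and renumber every block.
>     // That happens once per SIZE_MAX numberings, hence amortized O(1).
>     void assign_number(block* head, block* fresh, std::size_t& next) {
>         if (next == std::numeric_limits<std::size_t>::max()) {
>             std::size_t n = 0;
>             for (block* b = head; b != nullptr; b = b->next)
>                 b->number = n++;           // the rare O(blocks) pass
>             next = n;
>         }
>         fresh->number = next++;
>     }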
>
> In addition, if a hive were implemented as a vector of pointers to
> groups, rather than a linked list of groups, this scenario would
> likewise entail amortized time complexity: when the vector became full
> and a new allocation had to occur, all group pointers would need to be
> copied to the new memory block.
>
> In terms of the second example provided by Arthur, that issue no
> longer exists (the singly-linked list was changed to doubly-linked
> prior to my receiving the paper, avoiding the problem). And it was not
> an expensive operation, except in the scenario described above.
>
>
>
> 3. splice() is a tertiary function, expected to be lesser-used, and
> its primary feature is transferring elements from one hive to another
> without invalidating iterators pointing to those elements, not its
> time complexity. In terms of transference of elements, it is O(1). In
> terms of transference of element memory blocks, the hive paper states
> it as, at worst, O(n) in the number of blocks in the container, and
> that is what it is. At best, it is O(1). You can check the code on
> this (https://github.com/mattreecebentley/plf_hive). But at any rate,
> the operation is not an expensive one.
>
> It is a requirement that splice check whether the min/max capacities
> of the source hive are within the destination's (if so, O(1)); if
> they aren't, then every block needs its capacity checked (O(N)).
> Arthur's "fix" doesn't change anything: you would still have to check
> every block's capacity if the minimum block capacity of the source
> were lower than the destination's, so the time complexity is
> unchanged.
>
> Even if you were to somehow remove this potential O(N) behaviour, you
> would still have the same time complexity if the hive were implemented
> as a vector of pointers to blocks rather than a linked list of blocks,
> as splice would need to copy each pointer from the source vector to
> the destination. So it would need to be specified the same way in the
> tech spec.
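> (A sketch of the check in question, with hypothetical structures
> rather than the actual plf_hive code:)
>
>     #include <cstddef>
>
>     struct block { std::size_t capacity; block* next; };
>     struct hive_rep { std::size_t min_cap, max_cap; block* head; };
>
>     // O(1) when the source's limits already fit within the
>     // destination's; otherwise O(N) in the number of source blocks.
>     bool blocks_transferable(const hive_rep& dst, const hive_rep& src) {
>         if (src.min_cap >= dst.min_cap && src.max_cap <= dst.max_cap)
>             return true;                           // O(1) fast path
>         for (const block* b = src.head; b != nullptr; b = b->next)
>             if (b->capacity < dst.min_cap || b->capacity > dst.max_cap)
>                 return false;                      // the O(N) walk
>         return true;
>     }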
>
>
>
> 4. As regards "bloating the interface": the additional constructor
> overloads that are necessary are all implementable as delegating
> constructors, and this is how they are done in the reference
> implementation. I could have specified this in the tech spec, but
> there was no need; I considered it better to allow implementors
> freedom of implementation. There is no real-world waste.
>
>
> 5. UB - the block_capacity_hard_limits() function was specifically
> introduced so that users can avoid this issue, hence it is a
> non-issue. Arthur is aware of this.
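> (The intended usage pattern, sketched with the P0447 names; treat the
> exact member names here as assumptions:)
>
>     #include <algorithm>
>     #include <hive>   // P0447's proposed header
>
>     std::hive<int> make_hive() {
>         std::hive_limits desired(64, 1024);
>         const auto hard = std::hive<int>::block_capacity_hard_limits();
>         desired.min = std::max(desired.min, hard.min);  // clamp into the
>         desired.max = std::min(desired.max, hard.max);  // supported range
>         return std::hive<int>(desired);  // limits now in range: no UB
>     }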
>
>
> I move to dismiss this paper with prejudice. It is a large waste of
> committee time and, far from displaying an above-average understanding
> of the container in question, shows a low one. Enough time has already
> been wasted; let's move on. The larger concern is that this paper
> should serve the community, and the users I have been in contact with,
> and who have contacted me, have overwhelmingly required the current
> behaviour. I find it disturbing that Arthur has ignored the responses
> to his enquiries on this subject both on the reflector and in the hive
> paper itself, and has abused the system to draw more attention without
> giving due responses. At any rate, I do not intend to personally waste
> significantly more time on this.
> Cheers,
> Matt
>