sg14: Re: [SG14] Fwd: New colony proposal

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Thu, 7 May 2020 08:27:50 +0200

On 07/05/2020 02.14, Matthew Bentley wrote:
>>
>
> > Torn here between obviating future improvements and being specific. It needs to stay in order for current implementations to be useful to, in particular, embedded folk, so it would be an unsigned integer type. If future implementations don't use it or use something different, it can be gracefully deprecated I guess.
>
> We have the usual cognitive issue of implementation vs. specification here.
> I have no objection that your implementation feels that a Skipfield template
> parameter is a good idea here. However, your implementation is not the
> specification. The specification is what goes into the standard text wording
> and which (I presume) you strive to provided in your "Technical
> Specification" section.
>
> The purpose of that wording is to allow independent implementations
> to exist, and to judge whether such independent implementations are
> conforming ("good") or not. It's not dissimilar to some legal text,
> which allows to judge whether some real-world action is objectionable
> or not. That means, the wording must be self-contained, unambiguous
> and complete.
>
> The end goal is to allow porting of user programs from one implementation
> to another. The user program needs to know which features it can rely
> on, by carefully inspecting the standard wording only.
>
> This e-mail exchange is solely about getting the "Technical Specification"
> as complete and unambiguous and generally high-quality as possible.
>
> Note that implementations (in general) are allowed to add additional
> (defaulted) template parameters, I believe, as a conforming extension.
>
> So, back to "Skipfield". If the "Skipfield" template parameter is mentioned
> in the specification anywhere, it establishes a requirement on independent
> implementations to provide such a Skipfield parameter. However, what
> should they do with that parameter? Just ignore it? If you don't tell
> implementations in the wording, they won't know.
>
> In the interest of clarity, let me state I will strongly oppose progressing
> this paper at any level (SG14, LEWG, LWG, WG21, CD, DIS) with such known
> glaring holes in the wording. (My expectation is, though, that simply
> pointing out the hole to LWG will stop it in its tracks right there.
> See https://cplusplus.github.io/LWG/lwg-active.html#3213 and
> https://cplusplus.github.io/LWG/lwg-active.html#3439 where we discovered
> similar holes a bit late.)
>
>
> As noted, part of my submitting this paper to SG14 is to try and find opinions and solutions to such ambiguities. For the next revision I'll try to include wording at the front of the tech spec which shows the explicitness of the multiple-memory block model, capacity limits + the relationship to the skipfield_type parameter. But in the interests of SG14 and it's members, including embedded, the whole thing is of less worth without the skipfield_type or capacity limiting capability - partially because capacity limiting allows developers to customise block sizes for a specific cache.

I don't follow your choice of the word "ambiguities".
I'm not seeing anything ambiguous here, just "not sufficiently specified".

Regarding the block size min/max stuff, I do believe there are effects you
can describe, such as guaranteeing that you can do "min" consecutive inserts
without an invocation of the allocator.

Oh, you also need to say that min <= max is required.

> > I haven't yet understood the "reverse iterator", --end(), empty container,
> > and undefined behavior train of thought. I believe you can only do --end()
> > on a non-empty container, as a matter of precondition for that operation.
> >
> >
> > I don't think the C++ spec specifies that iterators have to be bounded?
> > But certainly including bounding creates performance detriment and should be optional.
> > In a bounded iterator --end() would be non-problematic, in a non-bounded would result in undefined behaviour.
>
> What do you mean by "bounded"?
>
>
> Some implementations such as MSVC have optional bounded iterators which will refuse to go before begin() or after end().

That seems more a debugging aid than anything relevant to the specification.

> > I'm going to look into whether or not specifying min/max limits for block capacities without changing the capacities of existing blocks can have any adverse effects ie. if block 1 is capacity 4, and set_block_capacity_limits sets min capacity to 10, whether leaving block 1 at capacity 4 could have any adverse effects.
> > If not, will re-implement so that it's a simple "set variables" operation without any side effects.
> > If so, will re-implement so that set_block_capacity_limits doesn't include the reallocation code when the type is non-copyable/movable (via constexpr and type_traits).
>
> That seems a step in the wrong direction. If you want to offer an
> interface that does not reallocate, but just sets the new min/max
> parameters (and thus has different requirements on the contained T),
> that's fine.
>
> However, that's a different operation (and thus should get a different name)
> compared to an operation that does perform the reallocation and thus
> has a few more requirements on T.
>
>
> The way I've got it implemented in a test implementation, is that it's the same interface, but if the type is non-copyable/movable, and the container isn't empty, it returns an error code. It would essentially move the responsibility from the interface to the programmer to clear() the container first if they're (a) using a non-copyable/movable type (b) they want to change the capacities but (c) have already inserted elements. It would also return an error code in this scenario if the type was copyable/movable but not empty and it failed in it's allocation of the memory for the new memory blocks to reallocate to. Alternatively it could throw instead of returning an error code, I have no strong feelings either way on that one. Your feeling?

"throw" or "error code" is way too much decision-making at runtime for
a function that fundamentally changes its requirements on the T.

Summarizing my view on the design here:

- make a pre-existing colony empty with new min/max parameters:
       my_colony = std::colony(min, max); // needs syntactic sugar to disambiguate
- set new parameters with reallocation (requires T to be movable)
       my_colony.reshape(min, max); // some name that expresses "lots of stuff going on"
- set new parameters to new allocations going forward (can be omitted if not desired)
       my_colony.block_size(min, max);

> I think it's better to do this than simply set the values and assume that these are for future block capacities only, as there is no way for the programmer to know what blocks currently use which capacities, and hence no way for them to make a decision on that basis to reallocate from existing blocks.

Great, so the third option above wouldn't exist. Fine.

> > I tried implementing it your way, but the code-smell got bad pretty quickly.
> > You end up with code like:
> > plf::colony<int, my_allocator_type>::minmax block_sizes;
> > block_sizes = i_colony.get_block_capacity_limits();
>
> auto block_sizes = i_colony.get_block_capacity_limits();
>
>
> I'm aware of that but my personal bugbear is to try and discourage the use of auto, it moves the required knowledge of what type you're using from the code to an external function definition, which isn't great for readability in my view.

I agree with your fundamental dislike of "auto". Instead, use a typedef for
std::colony<int, my_allocator_type> and the visual clutter is reasonably
reduced.

> > You could just go with size_t I guess.
>
> Would it be a problem if the outward-facing interface would always use size_t here?
> Block sizes > 2^32 should be rare anyway, right?
>
>
> I think so. From my measurements, there's no performance advantage to be had in almost any scenario from having block sizes larger than 65535, but I can't predict a potentially better future implementation, so *shrug*.

It seems the most consistent approach is to go with a nested class
that implicitly depends on the outer template parameter.

Again, block size manipulations should be rare, so any visual
impairment of the surrounding code is a fairly minor concern,
it seems.

> I'm not sure I like the "runtime failure if overflow" approach here, though.
> As opposed to forwarding the Allocator's size_type in the interface.
>
>
> You could, I suppose, template the struct and supply the allocator direct to it, but then, there's no 100% guarantees.

Well, you could also supply the size_type as a template parameter, but that
allows a level of freedom nobody really wants.

> > It doesn't - it does a basic merge-sort in all implementations I've seen. I pretty much came up with the pointer-sort dereferencing hack (https://plflib.org/list.htm#details) - or at least I haven't seen it elsewhere - that's why plf::list sorting is 77% faster than std::list (see: https://plflib.org/benchmarks_haswell_gcc.htm#sort).
> > From my old reading of gcc/boost posts, I think the libstdC++ people are pretty against allocating memory during a sort.
>
> Hm.
>
> Does your "allocate a pointer array" approach in any way depend on the
> internals of std::colony?
>
> If not, maybe the right approach is to offer a stand-alone, self-contained
> algorithm "indirect_sort" or so that can sort any range of forward iterators
> and that expressly allocates memory, instead of providing a specialized
> member function just for this single container.
>
>
> So long as that allowed for specializations for a given container, that might work,

Would you need/want to specialize for std::colony? At the end, you move the
contained T's to their appropriate place within the existing std::colony
(or other container), right?

> but the main reason std::list has it's own internal sort function is because it only changes the next/previous pointers, no elements get reallocated
That's a strong reason to have a separate "sort" for std::list.

> - unlike colony and most other containers.
> You can of course do the same pointer technique with the next/previous pointers of std::list, which is more or less what I did with plf::list, but that requires a specialization.

An algorithm that needs to break the iterator abstraction for every
container to which it is applied isn't good.

> > I don't think there are abstract explanations for all the parameters which the other containers are able to influence either, at least not to the depth you're talking about. Prove me wrong?
>
> Proving a "for all" theorem over prose text is hard, of course.
> Let's turn this around:
>
> If you find an example of an unspecified parameter in the existing standard text,
> please tell me about it, because that needs to become an LWG issue urgently.
> (For example, std::deque also has a block size internally, but it is never
> exposed in the interface.)
>
>
> Okay, so the precondition for inclusion in the tech spec is that it is exposed in the interface. Good to know and fair enough.

Or have some other observable behavior. For example, std::future talks about
"shared state", but that's not directly exposed in the interface per se.

> > Not really. I think the C++ abstract machine is based on some fairly old understandings which don't hold true anymore ie. that time complexity is strongly relevant in some capacity (latency, overall performance) which it isn't outside of large N or large sizeof(N). That being the case, it's not something I should overly concern myself with, as the computers it was designed for no longer exist.
>
> The O(N or whatever) complexities given in the standard library descriptions
> concern observable behavior of the algorithms, e.g. the number of invocations
> of the copy constructor or the number of invocations of a comparison.
>
> You can observe by simply counting the number of calls in your container
> element type T.
>
>
> My point is that the reasoning as to why the standard focuses on these things seems to be that they have an impact on performance in some way.

For some abstract concept of "performance".

> If they don't, it's hard to make a case for relevance. Why would you care about the number of calls, if the number of calls has no discernable impact on runtime behaviour? Or is there some other overarching reason to record about the number of calls?

Well, the number of calls to a client-supplied function is certainly
performance-relevant: If that function is expensive, going from O(n^2) to,
say, O(n^3), can certainly be noticeable in real life even for small-ish n.

And the standard library implementation, of course, can't say whether
an unknown future client-supplied function is expensive or not.

That said, part of the rationale of the current complexity specification
language is, of course, that this actually refers to measurable and
specifiable things in the framework of the abstract machine.

> From my point of view, if the relevance is performance-related then their relevance is greatly diminished compared to other factors. It seems to be a gap in vision to not see that the relevance of these things is as hardware-dependent as almost anything else. The greater the gap between memory and processor speeds becomes, the less relevance time complexity has on performance.

That might be true for your data structures, but not for my (possibly hostile)
ones. For example, T = std::array<int, 1000000> is expensive to move and swap
and possibly expensive to compare. For me, a complexity specification for
sorting a std::vector<T> is very relevant.

> If you believe the C++ abstract machine description is no longer fit for its purpose,
> you're very welcome to propose an update to EWG/CWG. Note that the entire
> state-of-the-art of compiler optimizations critically depends on the as-if
> rule formulated in [intro.abstract], so please tread carefully.
> (Allowing to specify something like P1315 secure_clear is in the same realm.)
>
>
> My point is not-so-much that it's not useful, just that the necessary gap between it and realworld machines means that not everything which is pertinent to allowing a given construct to exist, in a given way, is going to be expressible within the language of the abstract machine. If cache locality were to be part of the language of the abstract - that would also be a mistake. Because it's a realworld constraint which may change in 30 years. See my point above about time complexity. That *was* very relevant to performance, once.
> But that means I can't justify or express everything within the scope of that language.

You're conflating "justify" and "express". The former is the "why", which goes
into "Design Decisions" and can have as much hardware cache-locality talk as you
want. Only the latter is what goes into the "technical specification".

If you feel the cache-locality hint is needed for implementations to do the
right thing, then maybe you should add a (non-normative) "Note" near the top
of the std::colony subclause, explaining that fact.

As I said elsewhere, I do believe that the min/max block size has an impact
on observable behavior of std::colony, e.g. when allocations happen and for
some complexity statements. And those can certainly be described. Maybe the
relationship is intricate enough that a separate paragraph at the start of the
subclause is the more appropriate way of description, instead of stuffing
that into the individual function descriptions.

Jens

Received on 2020-05-07 01:30:54