Date: Thu, 7 May 2020 12:14:20 +1200
>
> > Torn here between obviating future improvements and being specific. It
> needs to stay in order for current implementations to be useful to, in
> particular, embedded folk, so it would be an unsigned integer type. If
> future implementations don't use it or use something different, it can be
> gracefully deprecated I guess.
>
> We have the usual cognitive issue of implementation vs. specification here.
> I have no objection that your implementation feels that a Skipfield
> template
> parameter is a good idea here. However, your implementation is not the
> specification. The specification is what goes into the standard text
> wording
> and which (I presume) you strive to provided in your "Technical
> Specification" section.
>
> The purpose of that wording is to allow independent implementations
> to exist, and to judge whether such independent implementations are
> conforming ("good") or not. It's not dissimilar to some legal text,
> which allows to judge whether some real-world action is objectionable
> or not. That means, the wording must be self-contained, unambiguous
> and complete.
>
> The end goal is to allow porting of user programs from one implementation
> to another. The user program needs to know which features it can rely
> on, by carefully inspecting the standard wording only.
>
> This e-mail exchange is solely about getting the "Technical Specification"
> as complete and unambiguous and generally high-quality as possible.
>
> Note that implementations (in general) are allowed to add additional
> (defaulted) template parameters, I believe, as a conforming extension.
>
> So, back to "Skipfield". If the "Skipfield" template parameter is
> mentioned
> in the specification anywhere, it establishes a requirement on independent
> implementations to provide such a Skipfield parameter. However, what
> should they do with that parameter? Just ignore it? If you don't tell
> implementations in the wording, they won't know.
>
> In the interest of clarity, let me state I will strongly oppose progressing
> this paper at any level (SG14, LEWG, LWG, WG21, CD, DIS) with such known
> glaring holes in the wording. (My expectation is, though, that simply
> pointing out the hole to LWG will stop it in its tracks right there.
> See https://cplusplus.github.io/LWG/lwg-active.html#3213 and
> https://cplusplus.github.io/LWG/lwg-active.html#3439 where we discovered
> similar holes a bit late.)
>
As noted, part of my submitting this paper to SG14 is to try and find
opinions and solutions to such ambiguities. For the next revision I'll try
to include wording at the front of the tech spec which shows the
explicitness of the multiple-memory block model, capacity limits + the
relationship to the skipfield_type parameter. But in the interests of SG14
and it's members, including embedded, the whole thing is of less worth
without the skipfield_type or capacity limiting capability - partially
because capacity limiting allows developers to customise block sizes for a
specific cache.
>
> > I haven't yet understood the "reverse iterator", --end(), empty
> container,
> > and undefined behavior train of thought. I believe you can only do
> --end()
> > on a non-empty container, as a matter of precondition for that
> operation.
> >
> >
> > I don't think the C++ spec specifies that iterators have to be bounded?
> > But certainly including bounding creates performance detriment and
> should be optional.
> > In a bounded iterator --end() would be non-problematic, in a non-bounded
> would result in undefined behaviour.
>
> What do you mean by "bounded"?
>
Some implementations such as MSVC have optional bounded iterators which
will refuse to go before begin() or after end().
> > I'm going to look into whether or not specifying min/max limits for
> block capacities without changing the capacities of existing blocks can
> have any adverse effects ie. if block 1 is capacity 4, and
> set_block_capacity_limits sets min capacity to 10, whether leaving block 1
> at capacity 4 could have any adverse effects.
> > If not, will re-implement so that it's a simple "set variables"
> operation without any side effects.
> > If so, will re-implement so that set_block_capacity_limits doesn't
> include the reallocation code when the type is non-copyable/movable (via
> constexpr and type_traits).
>
> That seems a step in the wrong direction. If you want to offer an
> interface that does not reallocate, but just sets the new min/max
> parameters (and thus has different requirements on the contained T),
> that's fine.
>
> However, that's a different operation (and thus should get a different
> name)
> compared to an operation that does perform the reallocation and thus
> has a few more requirements on T.
>
The way I've got it implemented in a test implementation, is that it's the
same interface, but if the type is non-copyable/movable, and the container
isn't empty, it returns an error code. It would essentially move the
responsibility from the interface to the programmer to clear() the
container first if they're (a) using a non-copyable/movable type (b) they
want to change the capacities but (c) have already inserted elements. It
would also return an error code in this scenario if the type was
copyable/movable but not empty and it failed in it's allocation of the
memory for the new memory blocks to reallocate to. Alternatively it could
throw instead of returning an error code, I have no strong feelings either
way on that one. Your feeling?
I think it's better to do this than simply set the values and assume that
these are for future block capacities only, as there is no way for the
programmer to know what blocks currently use which capacities, and hence no
way for them to make a decision on that basis to reallocate from existing
blocks.
> > I tried implementing it your way, but the code-smell got bad pretty
> quickly.
> > You end up with code like:
> > plf::colony<int, my_allocator_type>::minmax block_sizes;
> > block_sizes = i_colony.get_block_capacity_limits();
>
> auto block_sizes = i_colony.get_block_capacity_limits();
>
I'm aware of that but my personal bugbear is to try and discourage the use
of auto, it moves the required knowledge of what type you're using from the
code to an external function definition, which isn't great for readability
in my view.
> You could just go with size_t I guess.
>
> Would it be a problem if the outward-facing interface would always use
> size_t here?
> Block sizes > 2^32 should be rare anyway, right?
>
I think so. From my measurements, there's no performance advantage to be
had in almost any scenario from having block sizes larger than 65535, but I
can't predict a potentially better future implementation, so *shrug*.
I'm not sure I like the "runtime failure if overflow" approach here, though.
> As opposed to forwarding the Allocator's size_type in the interface.
>
You could, I suppose, template the struct and supply the allocator direct
to it, but then, there's no 100% guarantees.
> It doesn't - it does a basic merge-sort in all implementations I've seen.
> I pretty much came up with the pointer-sort dereferencing hack (
> https://plflib.org/list.htm#details) - or at least I haven't seen it
> elsewhere - that's why plf::list sorting is 77% faster than std::list (see:
> https://plflib.org/benchmarks_haswell_gcc.htm#sort).
> > From my old reading of gcc/boost posts, I think the libstdC++ people are
> pretty against allocating memory during a sort.
>
> Hm.
>
> Does your "allocate a pointer array" approach in any way depend on the
> internals of std::colony?
>
> If not, maybe the right approach is to offer a stand-alone, self-contained
> algorithm "indirect_sort" or so that can sort any range of forward
> iterators
> and that expressly allocates memory, instead of providing a specialized
> member function just for this single container.
>
So long as that allowed for specializations for a given container, that
might work,
but the main reason std::list has it's own internal sort function is
because it only changes the next/previous pointers, no elements get
reallocated - unlike colony and most other containers.
You can of course do the same pointer technique with the next/previous
pointers of std::list, which is more or less what I did with plf::list, but
that requires a specialization.
> I don't think there are abstract explanations for all the parameters
> which the other containers are able to influence either, at least not to
> the depth you're talking about. Prove me wrong?
>
> Proving a "for all" theorem over prose text is hard, of course.
> Let's turn this around:
>
> If you find an example of an unspecified parameter in the existing
> standard text,
> please tell me about it, because that needs to become an LWG issue
> urgently.
> (For example, std::deque also has a block size internally, but it is never
> exposed in the interface.)
>
Okay, so the precondition for inclusion in the tech spec is that it is
exposed in the interface. Good to know and fair enough.
> > Not really. I think the C++ abstract machine is based on some fairly old
> understandings which don't hold true anymore ie. that time complexity is
> strongly relevant in some capacity (latency, overall performance) which it
> isn't outside of large N or large sizeof(N). That being the case, it's not
> something I should overly concern myself with, as the computers it was
> designed for no longer exist.
>
> The O(N or whatever) complexities given in the standard library
> descriptions
> concern observable behavior of the algorithms, e.g. the number of
> invocations
> of the copy constructor or the number of invocations of a comparison.
>
> You can observe by simply counting the number of calls in your container
> element type T.
>
My point is that the reasoning as to why the standard focuses on these
things seems to be that they have an impact on performance in some way.
If they don't, it's hard to make a case for relevance. Why would you care
about the number of calls, if the number of calls has no discernable impact
on runtime behaviour? Or is there some other overarching reason to record
about the number of calls?
>From my point of view, if the relevance is performance-related then their
relevance is greatly diminished compared to other factors. It seems to be a
gap in vision to not see that the relevance of these things is as
hardware-dependent as almost anything else. The greater the gap between
memory and processor speeds becomes, the less relevance time complexity has
on performance.
> If you believe the C++ abstract machine description is no longer fit for
> its purpose,
> you're very welcome to propose an update to EWG/CWG. Note that the entire
> state-of-the-art of compiler optimizations critically depends on the as-if
> rule formulated in [intro.abstract], so please tread carefully.
> (Allowing to specify something like P1315 secure_clear is in the same
> realm.)
>
My point is not-so-much that it's not useful, just that the necessary gap
between it and realworld machines means that not everything which is
pertinent to allowing a given construct to exist, in a given way, is going
to be expressible within the language of the abstract machine. If cache
locality were to be part of the language of the abstract - that would also
be a mistake. Because it's a realworld constraint which may change in 30
years. See my point above about time complexity. That *was* very relevant
to performance, once.
But that means I can't justify or express everything within the scope of
that language.
All I'm saying is that the "technical specification" doesn't say that
> std::colony's iterators have relational operators, so independent
> implementations won't provide them, so portable user programs can't use
> them. If that's not your desired end-state, the "technical specification"
> needs to say something about relational operators, and probably specify
> their semantics (because you can't use the RandomAccessIterator
> specification framework, obviously).
>
Okay cool, will do.
Thanks again,
Matt
> > Torn here between obviating future improvements and being specific. It
> needs to stay in order for current implementations to be useful to, in
> particular, embedded folk, so it would be an unsigned integer type. If
> future implementations don't use it or use something different, it can be
> gracefully deprecated I guess.
>
> We have the usual cognitive issue of implementation vs. specification here.
> I have no objection that your implementation feels that a Skipfield
> template
> parameter is a good idea here. However, your implementation is not the
> specification. The specification is what goes into the standard text
> wording
> and which (I presume) you strive to provided in your "Technical
> Specification" section.
>
> The purpose of that wording is to allow independent implementations
> to exist, and to judge whether such independent implementations are
> conforming ("good") or not. It's not dissimilar to some legal text,
> which allows to judge whether some real-world action is objectionable
> or not. That means, the wording must be self-contained, unambiguous
> and complete.
>
> The end goal is to allow porting of user programs from one implementation
> to another. The user program needs to know which features it can rely
> on, by carefully inspecting the standard wording only.
>
> This e-mail exchange is solely about getting the "Technical Specification"
> as complete and unambiguous and generally high-quality as possible.
>
> Note that implementations (in general) are allowed to add additional
> (defaulted) template parameters, I believe, as a conforming extension.
>
> So, back to "Skipfield". If the "Skipfield" template parameter is
> mentioned
> in the specification anywhere, it establishes a requirement on independent
> implementations to provide such a Skipfield parameter. However, what
> should they do with that parameter? Just ignore it? If you don't tell
> implementations in the wording, they won't know.
>
> In the interest of clarity, let me state I will strongly oppose progressing
> this paper at any level (SG14, LEWG, LWG, WG21, CD, DIS) with such known
> glaring holes in the wording. (My expectation is, though, that simply
> pointing out the hole to LWG will stop it in its tracks right there.
> See https://cplusplus.github.io/LWG/lwg-active.html#3213 and
> https://cplusplus.github.io/LWG/lwg-active.html#3439 where we discovered
> similar holes a bit late.)
>
As noted, part of my submitting this paper to SG14 is to try and find
opinions and solutions to such ambiguities. For the next revision I'll try
to include wording at the front of the tech spec which shows the
explicitness of the multiple-memory block model, capacity limits + the
relationship to the skipfield_type parameter. But in the interests of SG14
and it's members, including embedded, the whole thing is of less worth
without the skipfield_type or capacity limiting capability - partially
because capacity limiting allows developers to customise block sizes for a
specific cache.
>
> > I haven't yet understood the "reverse iterator", --end(), empty
> container,
> > and undefined behavior train of thought. I believe you can only do
> --end()
> > on a non-empty container, as a matter of precondition for that
> operation.
> >
> >
> > I don't think the C++ spec specifies that iterators have to be bounded?
> > But certainly including bounding creates performance detriment and
> should be optional.
> > In a bounded iterator --end() would be non-problematic, in a non-bounded
> would result in undefined behaviour.
>
> What do you mean by "bounded"?
>
Some implementations such as MSVC have optional bounded iterators which
will refuse to go before begin() or after end().
> > I'm going to look into whether or not specifying min/max limits for
> block capacities without changing the capacities of existing blocks can
> have any adverse effects ie. if block 1 is capacity 4, and
> set_block_capacity_limits sets min capacity to 10, whether leaving block 1
> at capacity 4 could have any adverse effects.
> > If not, will re-implement so that it's a simple "set variables"
> operation without any side effects.
> > If so, will re-implement so that set_block_capacity_limits doesn't
> include the reallocation code when the type is non-copyable/movable (via
> constexpr and type_traits).
>
> That seems a step in the wrong direction. If you want to offer an
> interface that does not reallocate, but just sets the new min/max
> parameters (and thus has different requirements on the contained T),
> that's fine.
>
> However, that's a different operation (and thus should get a different
> name)
> compared to an operation that does perform the reallocation and thus
> has a few more requirements on T.
>
The way I've got it implemented in a test implementation, is that it's the
same interface, but if the type is non-copyable/movable, and the container
isn't empty, it returns an error code. It would essentially move the
responsibility from the interface to the programmer to clear() the
container first if they're (a) using a non-copyable/movable type (b) they
want to change the capacities but (c) have already inserted elements. It
would also return an error code in this scenario if the type was
copyable/movable but not empty and it failed in it's allocation of the
memory for the new memory blocks to reallocate to. Alternatively it could
throw instead of returning an error code, I have no strong feelings either
way on that one. Your feeling?
I think it's better to do this than simply set the values and assume that
these are for future block capacities only, as there is no way for the
programmer to know what blocks currently use which capacities, and hence no
way for them to make a decision on that basis to reallocate from existing
blocks.
> > I tried implementing it your way, but the code-smell got bad pretty
> quickly.
> > You end up with code like:
> > plf::colony<int, my_allocator_type>::minmax block_sizes;
> > block_sizes = i_colony.get_block_capacity_limits();
>
> auto block_sizes = i_colony.get_block_capacity_limits();
>
I'm aware of that but my personal bugbear is to try and discourage the use
of auto, it moves the required knowledge of what type you're using from the
code to an external function definition, which isn't great for readability
in my view.
> You could just go with size_t I guess.
>
> Would it be a problem if the outward-facing interface would always use
> size_t here?
> Block sizes > 2^32 should be rare anyway, right?
>
I think so. From my measurements, there's no performance advantage to be
had in almost any scenario from having block sizes larger than 65535, but I
can't predict a potentially better future implementation, so *shrug*.
I'm not sure I like the "runtime failure if overflow" approach here, though.
> As opposed to forwarding the Allocator's size_type in the interface.
>
You could, I suppose, template the struct and supply the allocator direct
to it, but then, there's no 100% guarantees.
> It doesn't - it does a basic merge-sort in all implementations I've seen.
> I pretty much came up with the pointer-sort dereferencing hack (
> https://plflib.org/list.htm#details) - or at least I haven't seen it
> elsewhere - that's why plf::list sorting is 77% faster than std::list (see:
> https://plflib.org/benchmarks_haswell_gcc.htm#sort).
> > From my old reading of gcc/boost posts, I think the libstdC++ people are
> pretty against allocating memory during a sort.
>
> Hm.
>
> Does your "allocate a pointer array" approach in any way depend on the
> internals of std::colony?
>
> If not, maybe the right approach is to offer a stand-alone, self-contained
> algorithm "indirect_sort" or so that can sort any range of forward
> iterators
> and that expressly allocates memory, instead of providing a specialized
> member function just for this single container.
>
So long as that allowed for specializations for a given container, that
might work,
but the main reason std::list has it's own internal sort function is
because it only changes the next/previous pointers, no elements get
reallocated - unlike colony and most other containers.
You can of course do the same pointer technique with the next/previous
pointers of std::list, which is more or less what I did with plf::list, but
that requires a specialization.
> I don't think there are abstract explanations for all the parameters
> which the other containers are able to influence either, at least not to
> the depth you're talking about. Prove me wrong?
>
> Proving a "for all" theorem over prose text is hard, of course.
> Let's turn this around:
>
> If you find an example of an unspecified parameter in the existing
> standard text,
> please tell me about it, because that needs to become an LWG issue
> urgently.
> (For example, std::deque also has a block size internally, but it is never
> exposed in the interface.)
>
Okay, so the precondition for inclusion in the tech spec is that it is
exposed in the interface. Good to know and fair enough.
> > Not really. I think the C++ abstract machine is based on some fairly old
> understandings which don't hold true anymore ie. that time complexity is
> strongly relevant in some capacity (latency, overall performance) which it
> isn't outside of large N or large sizeof(N). That being the case, it's not
> something I should overly concern myself with, as the computers it was
> designed for no longer exist.
>
> The O(N or whatever) complexities given in the standard library
> descriptions
> concern observable behavior of the algorithms, e.g. the number of
> invocations
> of the copy constructor or the number of invocations of a comparison.
>
> You can observe by simply counting the number of calls in your container
> element type T.
>
My point is that the reasoning as to why the standard focuses on these
things seems to be that they have an impact on performance in some way.
If they don't, it's hard to make a case for relevance. Why would you care
about the number of calls, if the number of calls has no discernable impact
on runtime behaviour? Or is there some other overarching reason to record
about the number of calls?
>From my point of view, if the relevance is performance-related then their
relevance is greatly diminished compared to other factors. It seems to be a
gap in vision to not see that the relevance of these things is as
hardware-dependent as almost anything else. The greater the gap between
memory and processor speeds becomes, the less relevance time complexity has
on performance.
> If you believe the C++ abstract machine description is no longer fit for
> its purpose,
> you're very welcome to propose an update to EWG/CWG. Note that the entire
> state-of-the-art of compiler optimizations critically depends on the as-if
> rule formulated in [intro.abstract], so please tread carefully.
> (Allowing to specify something like P1315 secure_clear is in the same
> realm.)
>
My point is not-so-much that it's not useful, just that the necessary gap
between it and realworld machines means that not everything which is
pertinent to allowing a given construct to exist, in a given way, is going
to be expressible within the language of the abstract machine. If cache
locality were to be part of the language of the abstract - that would also
be a mistake. Because it's a realworld constraint which may change in 30
years. See my point above about time complexity. That *was* very relevant
to performance, once.
But that means I can't justify or express everything within the scope of
that language.
All I'm saying is that the "technical specification" doesn't say that
> std::colony's iterators have relational operators, so independent
> implementations won't provide them, so portable user programs can't use
> them. If that's not your desired end-state, the "technical specification"
> needs to say something about relational operators, and probably specify
> their semantics (because you can't use the RandomAccessIterator
> specification framework, obviously).
>
Okay cool, will do.
Thanks again,
Matt
Received on 2020-05-06 19:18:06