C++ Logo

sg14

Advanced search

Re: [SG14] New colony proposal - feedback requested

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Sat, 2 May 2020 21:39:01 +0200
On 02/05/2020 07.05, Matthew Bentley wrote:
> First of all, thanks Jens for the thorough overview - please consider any points I don't comment on below as accepted criticism and as things I will action upon:
>
> On Fri, 1 May 2020 at 22:21, Jens Maurer <Jens.Maurer_at_[hidden] <mailto:Jens.Maurer_at_[hidden]>> wrote:
>
> - Please suggest stable labels and a location in the standard
> where the additional container should go.
>
>
> When you say "stable labels", do you mean numbered headings as in "26.3.8.5 deque specialized algorithms".

All sections in the standard have labels, e.g. [basic.def.odr].
Please make a suggestion for each of the (sub)sections you add.

Also, please give a pointer where in the (numerical) order
of sections you want your new sections about colony to
appear.

> - The "Skipfield" template parameter is not specified.
> Which restrictions apply? (For example, can it be void,
> an array of unknown bound, an abstract base class, a function
> type, or a non-copyable class type?)
>
>
> I don't think it's necessary to leave it anything other than implementation-defined? The reference implementation must use unsigned integral types, but I don't want to obviate future potential improvements on implementations, that could, for example, use a floating-point type (somehow).

If this is not a type that the user can/should influence, then it
shouldn't be a template parameter for colony in the first place.

> - Would it be useful to give the default template argument
> for Skipfield a user-visible name? We also use the public
> name std::allocator for the default argument of Allocator etc.
>
>
> Since the type is meant to be implementation-defined, I'm unsure how to do this.

"implementation-defined"? What benefit does the user gain from
knowing what type it is?

> - "const size_type min/max_block_capacity"> drop "const"; it's ignored in a non-defining function declaration anyway.
>
>
> There's plenty of use of const in the existing standard, for containers, not sure what is different about what I'm doing. For example:
> "iterator insert(const_iterator position, size_type n, const T& x);"
> under 26.3.8.1 Class template deque overview.

const T& is "reference to const T", which is not a top-level const.
However, "const size_type" is a top-level const, and those
get ignored when forming function types. See [dcl.fct] p5.
The definition of "top-level cv-qualifier" is in
[basic.type.qualifier] p6.

> - The synopsis should show "assign" member functions near "operator=".
>
>
> There are no assign member functions. I was following the path of other unordered containers by not including them. Is there an obvious reason for including them? Implementation would be clear();reserve();insert().

The existing unordered_* containers are associative containers.
In contrast, "colony" seems very much like a deque, which does
have assign member functions (because the memory allocated for
the blocks might be reused).

> - " // Since this could be derived from --end() depending on implementation, and that could result in undefined behaviour in a colony with no insertions and unbounded iterators, whether this is noexcept is implementation-defined"
>
> "Since" is rationale that doesn't belong in normative text. Do you want to say
> noexcept( see below ) , or what?
>
> Why is "rbegin() const" not so marked? Same question for crbegin().
>
>
> I assumed the reader would be able to extrapolate from one to the others. Plus it's a large block of text. Maybe should go in a footnote and use a reference marker.

"extrapolate" makes no sense in a specification intended for an
ISO standard. If you want to say something, you need to say it
explicitly.

In particular for noexcept, we want to be rather conservative.
So, if you don't want to prescribe "noexcept" for all implementations,
this should never be noexcept.

> Why do we need this funny restriction anyway? Can the implementation
> specialize reverse_iterator<colony::iterator> to do the right thing and
> avoid the undefined behavior for rbegin() with no runtime cost and no
> insane code cost? If so, that should be required, not implementation-defined.
>
>
> The only way to avoid that behaviour is to store a permanent reference to the current back element in-code, which would need to be updated in many places, and that's costly. I don't like reverse iterators for performance reasons and I personally wouldn't ever sacrifice overall performance to obtain an undefined-behaviour free function for rbegin. I want to give implementors the same ability. Frankly, as an unordered-insert container, the relevance of reverse_iterators is sparse at-best.

So, maybe you don't want to provide reverse iterators
at all, since clients shouldn't really rely on iteration
order anyway.

> - In general, the standard doesn't do "get_" and "set_" prefixes for getting/setting values.
> Instead, we have
> name_for_value(x) to set a value
> name_for_value() to retrieve the value
>
>
> I don't think that's clear, but okay.

The story is a bit different if there's a lot of work involved
with the "set" part.

> If you need to return several values, use a struct with nice member names.
>
>
> ... Why. How is creating an arbitrary struct type, having the user instantiate one, and then have to get rid of it, any better than passing-by-reference. It's a waste of time for both the implementor and the user.

You can use structured bindings in the caller, which makes your call a lot
more compact plus gets you the right lifetimes for the variables:

auto [min,max] = my_colony.block_parameters();

Plus you have at least a chance of avoiding allocating
memory for min and max, which is a lot harder with
a reference.

I don't care for the (small) implementation effort.
See the range-based algorithms in clause 25, where this
pattern is rather pervasive.

> - similar for get_index_from_iterator; Why can't we use std::distance(begin(), it)
> instead?
> (We have "Complexity: Same as distance specialized algorithm, see below.", so there
> seems to be no benefit.)
>
>
> If you read the implementation

I don't care for the implementation. I care for the specification that aspires
to be in the standard.

> you'll find there's a lot of code you can get rid of when you have the assumption of counting from begin(), which makes for a faster function. Whether an implementor decides to act on that is up to them.

If that's so much of a win, can't you just run-time check whether the first
iterator in std::distance is begin() ? Or is the "begin" information lost
at that point?

> ctor p6:
>
> "If n is larger than min_block_capacity, the capacity of the first block created will be the smaller of n or max_block_capacity."
>
> If you want the min/max block capacity to have any normative impact,
> you need to establish an actual abstract model you can tie into.
> Otherwise, why does the capacity of some block matter at all?
>
>
> Because the length of the block affects potential cache locality. The abstract model is described in Design Decisions.

As I said, I'm only reading the specification. And this "block capacity" business
appears in the specification without an overarching model that would make sense to
me. That needs to be fixed (e.g. by adding a paragraph or two after the class synopsis),
or any mention of "block capacity" needs to go.

The C++ abstract machine has no way of expressing "cache locality", btw.

> - Why do we need MoveInsertable for "reserve" anyway? Aren't we just allocating
> a new memory block of the desired size, without any moves?
>
> That's an open question that I want addressed, so thanks for bringing it up - see 'questions for the committee' section.

Fine. Answer: Let's have "reserve" only handle memory pre-allocation.
If you want compaction or rearrangement of the elements, provide a different
function (such as the set(min,max) function you already have).

> "Complexity: Implementation-defined. One strong strategy involves using the standard library's sort function to sort an array of pointers to colony elements by the value of the elements they point to, and would have std::sort()'s complexity."
>
> This is, of course, idle chatter. If there is a reasonable implementation
> strategy that is O(n log n), we should simply require that complexity and be
> done with it.

> That also is an open question. See Questions for Committee section.

Any sorting near O(n^2) is not worth talking about, so you should simply
prescribe O(n log n) with a "remarks" paragraph that this might allocate
memory (which might cause std::bad_alloc to be thrown).

> - approximate_memory_use p19
> "Complexity: Implementation-defined. Should typically be constant."
>
> If there is a reasonable implementation strategy to make this constant, then make it so
> in the spec.
>
>
> Constant relies on the same code (maintaining sequential memory block numbers or alternatively having vector of memory blocks rather than linked-list) that allows operators >/</>=/<= to exist. If the committee decides that they don't want those, you would have to maintain those just to allow for constant implementation of this function, which I'm not in favour of. However I see benefit to retaining >/</>=/<= , so adopting a wait-and-see attitude at this point.

I'm not seeing any mention of relational operators in the specification section,
so they don't exist. Also, there is no related "question for the committee".

> - Why is reinitialize needed? Can't you just do my_colony = colony(min_block_size, max_block_size) ?
> Oh, we don't have that constructor, but it seems we should definitely have it.
>
>
> That constructor is not possible as it creates overload conflict with the fill constructor when using integers.

Well, if you make a struct for min/max for the get_parameters() function,
you can use that struct for the constructor as well, avoiding the
ambiguity.

> - get_iterator_from_pointer p1: "pointer is not invalid" : The name of the parameter is element_pointer
> (which is a bit verbose, though, just call it "p"). I think we need a precondition that the pointer
> points to an element of *this.
>
> - In general, I think we need to be more clear which operations invalidate iterators,
> and we should do so in an overall introductory paragraph of the normative wording.
>
>
> This is in the first appendix, but I'm happy to move it upwards if you can name the section you think it should be in.

It should be right after the class synopsis.

> and of course the lengthy explanation of the complexity doesn't belong here. Just say O(n)
>
>
> Where? You mean in the technical spec? Or in the paper in general?

As I said, I'm only reading the specification part of your paper. Any
explanations (which are good to have, in general) should go somewhere
above, maybe near "Design Decisions".

Jens

Received on 2020-05-02 14:42:09