ISOCPP std-proposals List: Re: [std-proposals] Relocation in C++

From: Edward Catmur <ecatmur_at_[hidden]>
Date: Sun, 18 Sep 2022 16:26:53 +0100

On Fri, 16 Sept 2022 at 11:05, Sébastien Bini <sebastien.bini_at_[hidden]>
wrote:

> Hi Edward,
>
> Thank you for those explanations. I'll get in touch with people from AFNOR
> when we have a draft ready (ongoing).
>
> There is still one technical aspect that we haven't talked about:
> structured binding relocation.
>
> I believe that, in order to support relocate-only types, we need to enable
> relocation from a structured binding: `auto [x, y] = foo(); sink(reloc y);`
> must work. This is motivated by:
>
> - The need to make APIs that support relocate-only types. How do you
> write an API to extract an item at an arbitrary position from a vector? I
> suggest the following: `std::pair<iterator, T>
> vector<T>::pilfer(const_iterator);` (returns next iterator and relocated
> vector element) as it is consistent with other vector APIs and complies
> with the core guidelines
> <https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rf-out>.
> Then, what can users do with the returned object, given that it lies in a pair
> and that it is forbidden to relocate only parts of an object? The return value
> is unusable for relocate-only types, unless we support structured binding
> relocation.
> - Almost all C++ developers that I know believe that a structured
> binding is a complete, separate object, and not an alias for some
> subobject. As such they would find it weird that they cannot relocate a
> structured binding.
>
Yes, all good points, and I agree this would be very useful.

> This is not a blocking point IMO, but will be very nice to have. vector's
> pilfer can be worked around by rotating the element to the end of the
> vector and calling some new vector::pilfer_back method that removes and
> returns the last element. There is no performance overhead, it is just
> cumbersome to write. Likewise we can just state that we cannot relocate a
> structured binding and move on with the proposal. But I'd like to give it
> more thought: structured binding relocation sounds like a very convenient
> facility, and after all, how hard could it be to just relocate one item of
> a pair? Well it's not simple :/
>
> Before we dive in, let's agree to reuse the terms used in
> https://en.cppreference.com/w/cpp/language/structured_binding . As such,
> when we write: `auto [x, y] = foo();` it creates a hidden object `e` of
> type `E` (which is the return type of `foo`), `x` and `y` are newly
> introduced identifiers that are just aliases to some parts of `e`.
>
> The best approach I can think of is that under some conditions, `x` and
> `y` are not aliases but individual complete objects, and no hidden object
> `e` is created. Hence we can write `reloc x` without violating the rules we
> added (which forbid relocation of subobjects). Here are the conditions:
>
> - there must be no ref qualifiers in the auto declaration (auto [x, y]
> = foo(); passes, auto const& [x, y] = foo(); doesn't)
> - `E` must not have a user-defined destructor, a user-defined
> relocation, move or copy constructor. This should give us the guarantee
> that we can safely split an object into individual objects. Thankfully for
> us, std::pair and std::tuple comply.
>
> That part is fine by me. The question is then how do we construct each
> individual object that corresponds to the new identifiers we introduced? We
> need to distinguish between the three binding protocols supported in C++17:
>
> - C array: the easiest of the three, each object is constructed
> element-wise using the best-suited constructor. That approach remains
> compatible with relocation.
>
Yes. Interestingly, you don't see array prvalues much, since they can't be
returned from functions, but they do exist as temporaries (`auto&& r =
std::type_identity_t<int[2]>{0, 1};` constructs a temporary array and binds
it to a reference).

>
> - Binding to data members: again, no real difficulty here, each object
> is constructed using the best-suited constructor from their corresponding
> data member. Again that approach remains compatible with relocation.
>
Yes, this is fine.

>
> - Tuple-like binding: this is the hardest of all, and probably the one
> we need to support the most. The main problem with tuple-like binding is
> that the language doesn't treat std::array, std::pair and std::tuple as special.
> They are just types like any other, hidden behind the tuple_size,
> tuple_element and std::get APIs. We can construct each object using the
> member get (or std::get) functions, and that will work fine for copy
> constructor and move constructor, but not for relocation. First, the get
> functions will return references and not prvalues, so the relocation
> constructor will not be selected. Second, the get functions, even if they
> returned by value, would not be allowed to relocate the subobject they
> return. Last, we have no guarantee that tuple_size and the get functions
> together describe a partition of `E`. It is mandatory to enable construction of the
> new objects (`x` and `y`) by relocation, as again, the whole point of this
> is to support relocate-only types (so `x` or `y` may be relocate-only).
>
> The only solution that I can think of is an alternative API for the
> tuple-like binding protocol. If that new API is not provided then we
> fall back to C++17 structured bindings, which cannot be relocated. The API I
> have in mind is:
>
> - A static member function template get_member that returns a
> pointer to the I-th data member: `template <std::size_t I> auto
> E::get_member<I>()`
> - Or a free function template get_member that does the same: `template <class
> E, std::size_t I> auto get_member()`
>
> The return type of the get_member function must be:
> `std::tuple_element_t<I, E> (E::*)`. The number of get_member
> specializations must equal tuple_size<E>::value. For std::pair, that would be:
>
> template <class First, class Second>
> constexpr auto get_member<std::pair<First, Second>, 0>()
> { return &std::pair<First, Second>::first; }
> template <class First, class Second>
> constexpr auto get_member<std::pair<First, Second>, 1>()
> { return &std::pair<First, Second>::second; }
>
> Let's consider the expression: `E e = foo(); auto [x, y] = reloc e;`. If E
> provides the new API then the new objects (`x` and `y`) can be constructed
> by relocation. The language needs to track, if any relocation constructor
> is used, which subobjects of the source (`e`) got destructively relocated
> into a new object of the "structured binding". Then the destructor of the
> source object is not called directly. Instead the language will call the
> destructor on all the subobjects of the source that were not relocated (we
> need to keep in mind that relocation may not happen for every data member,
> and that some data members may be hidden from the tuple-like binding API).
> Hopefully, thanks to the pointer to data-member returned by the get_member
> functions, the language is able to track down which parts are relocated and
> which aren't.
>
> I thought of just taking the address of whatever std::get returns, instead
> of introducing a new API. However, I don't believe we have any guarantee
> that std::get returns a reference, let alone a reference to a subobject
> that lives within the tuple-like object (it could very well be a reference
> to some static variable...). get_member solves both those problems.
>
> I am not sure this is the best solution there is, I am just sharing my
> thoughts about the subject. If get_member seems viable, we can easily
> provide an implementation for pair and tuples. I wonder what can be done
> with std::array (we cannot take the pointer to data-members that are array
> elements, although the language considers array elements as subobjects...).
>

Yes, I agree this would be nice to have, but as well as the problems you
have noticed with `get_member`, there is also the problem that pointer to
data member of a nested (not base class) subobject cannot be formed, and
also that there could be tuple-like classes that form some elements "on
demand".

I have another idea, which is to use recursion in the manner of
operator->(). For example, let's say that structured binding for
tuple-like class types first looks for
`get<std::make_index_sequence<std::tuple_size<E>::value>>(reloc e)` (the
exact syntax isn't important, but it should be something clearly novel and
`e` should be a prvalue if the original object was). Then, whatever this
call returns (which must have the same `tuple_size` as `E`) is in turn
submitted for structured binding (as a prvalue).

Of course, library authors will then need to create a struct type with the
appropriate number of fields, but it won't be difficult for compiler
authors to write an intrinsic to do that, and once it's available for
std::tuple and std::array then everyone else can just recurse to those
types. Anyone who can't or doesn't want to recurse to the Library can use
preprocessor hackery, other code generation, or possibly metaclasses (once
those are available), to support any sensible number of elements.

It'd be ugly to convert `std::array<T, N>` to essentially `struct { T t0,
t1, ..., tN-1; };` but N should be small (at least until we get variadic
structured binding) and compiler authors can always hack in a better
solution for privileged Library classes as long as the general case can be
made to work for non-Standard library authors. A more ambitious solution
would be to allow returning arrays from functions.

For example, one could write:

template<std::size_t... I, class... T>
  requires std::same_as<std::index_sequence<I...>, std::index_sequence_for<T...>>
auto get<std::index_sequence<I...>>(my_tuple<T...> t) {
  // Relocate t into a union member so that its destructor is never run.
  union { my_tuple<T...> tt; } u = {.tt = reloc t};
  // Relocate each element out of the source into the returned std::tuple.
  return std::tuple<T...>(std::relocate(&get<I>(u.tt))...);
}


Received on 2022-09-18 15:27:06