ISOCPP std-proposals List: Re: [std-proposals] Relocation in C++

From: Edward Catmur <ecatmur_at_[hidden]>
Date: Tue, 16 Aug 2022 23:17:03 +0100

On Tue, 16 Aug 2022 at 14:25, Sébastien Bini <sebastien.bini_at_[hidden]>
wrote:

> On Mon, Aug 15, 2022 at 1:06 AM Edward Catmur <ecatmur_at_[hidden]>
> wrote:
>
>> On Sun, 14 Aug 2022 at 20:35, Edward Catmur <ecatmur_at_[hidden]>
>> wrote:
>>
>>> Yes, this is what I'm suggesting as try-with-init. The issue is finding
>>> a way to add the initialization into the; the position between the `try`
>>> and `{` of a function-try-block is currently free real estate. Adding it
>>> directly before the base-and-member-initialization list would be ambiguous
>>> both to the compiler and to the reader.
>>>
>>
>> Sorry, this should be the `try` and the `:` (introducing the
>> *ctor-initializer*), of course.
>>
>
> To contextualize, you are suggesting something of the sort:
>
> class T : B
> {
> D _d;
> public:
> T(T rhs) : try (auto guard = MakeGuard(rhs)) /* optionally subobject
> initialization: */ B{rhs}, _d{rhs._d}
> {
> /* reloc ctor body */
> }
> catch (std::exception const&)
> {
> /* catch body */
> }
> };
>

Actually, the `try` comes before the `:`:

T(T rhs) try (auto guard = MakeGuard(rhs)) : B{rhs}, ...

Generally speaking, a try-with-init ctor sounds dangerous. What happens if
> a subobject ctor throws, and the catch block absorbs the exception (e.g.
> does not call `throw` to propagate it). This means you would end up with a
> semi-constructed object, with some of its subobjects left uninitialized?
> It's even more hazardous if the constructor is a relocation constructor as
> some of the source subobjects will not be destructed.
> The exception must be propagated somehow, the caller must know that its
> object was not properly constructed.
>

That's fine; a function-try-block on a constructor cannot swallow the
exception: return is prohibited (
http://eel.is/c++draft/except#handle-12.sentence-1), and reaching the end
of the catch block automatically rethrows (
http://eel.is/c++draft/except#handle-12.sentence-1).

But I don't get why you would need a try-with-init approach: the newly
> introduced object (e.g. `guard`) should clean-up its resources in its
> destructor, which will be called automatically or by stack-unwinding if a
> subobject ctor throws. What you need is, IMO, to have the ability to
> initialize an extra object before the subobjects initialization.
>

try-with-init gives you the opportunity to do something more with the guard
object than just destruct it. For example, you can capture a raw pointer
and then delete it in the catch block.

> The syntax I proposed is not ambiguous to the compiler as the newly
> introduced names must declare their type (so they cannot be data members)
> and name (so it cannot be a delegating/main class ctor call):
>
> T(T rhs) : auto guard{MakeGuard(rhs)}, B{rhs}, _d{rhs._d} {}
>

Sure, but it would be quite easy to write this (or something that looks
like it) by accident, without intending for it to be a guard object
declaration. It's a lot nicer as a compiler author to know immediately
that you're parsing a try-with-init guard object definition, as opposed to
a misspelled base-or-member initializer.

I admit it may be confusing to the reader. But as long as we agree on the
> solution, we are free to come up with any, clearer syntax we want:
> T(T rhs) : [guard = MakeGuard(rhs)] B{rhs}, _d{rhs._d} {}
>

Absolutely. But try-with-init has the advantage that it doesn't even have
to be a guard object, since the release code can go directly in the catch
block.

class T
>>>> {
>>>> public:
>>>> // [...]
>>>> T& operator=(T rhs) = reloc; // same as { std::destroy_at(this);
>>>> return *std::relocate_at(&rhs, this); }
>>>> };
>>>>
>>>> This declares a prvalue assignment operator, and defines it with a
>>>> special implementation. We do not use = default; as users may expect
>>>> memberwise relocation which is not what's happening. This = reloc;
>>>> also allows us to make it a special function so it can share similarities
>>>> with the reloc constructor: the destructor of rhs is not called at caller
>>>> site, and rhs is not actually passed by value to the operator=() despite
>>>> its signature, but by reference. This allows us to make the same set of
>>>> optimizations as with the relocation constructor.
>>>>
>>>> The destroy + placement new approach is considered safe. Since rhs is a
>>>> prvalue (or behind the scene, a reference to a prvalue), its address cannot
>>>> be (by construction) this or any subobject of *this. As such
>>>> std::destroy_at(this); cannot destruct rhs as a side effect.
>>>>
>>>> If an exception leaks through this reloc-assignment operator then *this
>>>> will likely be in a destructed state (the exception was either thrown by
>>>> std::destroy_at(this), leaving the object in a somewhat destructed state,
>>>> or by std::relocate_at which will have failed to reconstruct *this
>>>> properly). That destructed state may have disastrous consequences if *this
>>>> is an object with automatic storage whose destructor will inevitably be
>>>> called at some point (probably even during the stack unwinding incurred by
>>>> the exception leak), which will result in calling a destructor on a
>>>> destructed object. For this reason, I suggest that declaring this
>>>> reloc-assign operator yields a compilation error if the destructor and the
>>>> relocation operations are not noexcept. Note that this forcefully renders
>>>> the reloc-assign operator noexcept. Also, std::relocate_at needs to be
>>>> valid so the type must have a copy, move or relocation ctor.
>>>>
>>>> I further suggest adding an extra declaration: T& operator=(T rhs) =
>>>> reloc(auto); which will silently ignore the reloc-assignment operator
>>>> declaration if the above requirements are not met (useful for template
>>>> code).
>>>>
>>>> Note that if a class does not provide a reloc-assignment operator,
>>>> instructions such as `x = reloc y;` are still possible. reloc merely
>>>> transforms y into a prvalue. Overload resolution will simply pick the
>>>> move-assignment operator or copy-assignment operator instead if provided.
>>>>
>>>
>>> I'm not too sure. The thing is, this isn't too difficult to write by
>>> hand; even the exception safety stuff can be done with static_assert and
>>> requires (for reloc(auto)).
>>>
>>
> You can write it by hand, but you cannot write by hand that the source
> object is not copied, nor that its destructor is not called. The = reloc
> provides that magic bit for us, as it does for the relocation ctor.
>

Hm, but you can write:

T& operator=(T rhs) { std::destroy_at(this); return *new (this) T(reloc
rhs); }

If we permit operator=(T), as a special member function, the same aliasing
power as the relocating constructor, then it invokes precisely one
destructor call and one relocating constructor call, which is perfect. If
it doesn't have aliasing on its prvalue rhs parameter (admittedly that
might be tricky from an ABI standpoint), then there's potentially two calls
to the relocating constructor, but they can be elided together.

But also, there's a safer way to implement relocating assignment, which is
>>> (as always) copy-and-swap:
>>>
>>> T& operator=(T rhs) { rhs.swap(*this); return *this; }
>>>
>>> We should be encouraging the use of copy-and-swap idiom (well, without
>>> the copy) rather than the unsafe destroy-and-relocate. The only problem is
>>> that there isn't a way to automatically generate a memberwise swap...
>>>
>>
> I agree that having =reloc using destroy-and-relocate may encourage users
> to use that idiom in places where it is not necessarily safe. Copy-and-swap
> is always safe although not as optimized in this precise use case...
>
> I believe it is of prime importance to have the reloc-assignment operator
> to do exactly as intended. That means in `y = reloc x;`: we want the
> resources of `x` to be (mem)moved to `y`, `y` being cleaned beforehand, and
> to skip the destructor call on `x`. copy-and-swap does not quite fit in
> here in my opinion.
>

It's not quite as tidy, but copy-and-swap has that ironclad safety
guarantee that's reassuring in library code. Ideally, library code should
be manifestly correct; and user code should be rule-of-zero so can use
memberwise relocating assignment (by default). So I am not sure that
destroy-and-rebuild will actually find much use in practice to make it
worth adding syntax for it. Getting some implementation experience and the
numbers on it could swing things.

"As intended" is tricky, because you can follow it with "but what if...".

> What feels unsafe about destroy-and-relocate is that we destruct an object
> within a member function. We can tweak things around and turn the
> reloc-assignment operator into a static function behind the scene:
> T& operator=(T src) = reloc; // same as: static T& assign_reloc(T& dst, T&
> src) { std::destroy_at(&dst); return *std::relocate_at(&src, &dst); }
>
> Here we no longer have the scary std::destroy_at(this), and we avoid
> self-destruction in a member function (which may be an UB?).
>

That's still destroying an object from within its member function, just not
directly. It's fine, though; `delete this` has always been legal, it's
just never been a sign of good code. One library I use regularly
(rapidjson) uses destroy-and-rebuild for (lvalue) assignment, and it works
fine - as long as you sort out exception safety.

I fear that if we fallback to copy-and-swap then determined users will rely
> on dirty hacks to use some unsafe, home-made destroy-and-relocate instead.
>

But if they're that determined, they'll ignore the guardrails on the
compiler-implemented version and write their own unsafe destroy-and-rebuild
anyway (since the builtin one won't compile if it's unsafe). Better to
nudge users towards a copy-and-swap implementation that always works.

Received on 2022-08-16 22:17:16