ISOCPP std-proposals List: Re: [std-proposals] Relocation in C++

From: Edward Catmur <ecatmur_at_[hidden]>
Date: Mon, 30 May 2022 10:26:40 -0600

On Mon, 30 May 2022 at 05:38, Sébastien Bini <sebastien.bini_at_[hidden]>
wrote:

> *About the synthesized relocation:*
>
> > Yes, that's exactly what I'm suggesting. Composition is necessary - here
> this means being able to combine a relocate-only and a move-destroy type as
> data members of a class, and have that aggregate remain usable.
> >
> > What problems do you foresee?
>
> I like that users don't have to explicitly call the destructor on the
> moved part, should they mean to write the initializer explicitly. But I
> feel mixed about this.
>
> Let's consider the following scenario:
>
> struct DB
> {
> DB(DB&&) = default;
> void start();
> void end();
> };
>
> struct Transaction
> {
> Transaction(Transaction&&) = default;
> explicit Transaction(DB& db); // calls DB::start
> ~Transaction() noexcept; // calls DB::end unless *this is in
> moved-from state
> };
>
> struct T
> {
> DB _db;
> Transaction _trans;
>
> operator reloc(T&& src) = default;
> };
>
> DB and Transaction are not relocatable as their operator reloc() is
> implicitly deleted. As such T's operator reloc will use synthesized
> relocation (move+destroy) from DB and Transaction. But it is important
> that, in operator reloc(), members are constructed in the order they are
> declared, and source members destructed in that reversed order.
>
> This means that operator reloc() will:
>
> 1. Initialize each base class and data member in declaration order
> (_db and then _trans).
> 2. Initialization is done either with trivial relocation, operator
> reloc(), move constructor, or is user-provided.
> 3. Right after the initialization is done, and just before the
> operator reloc() function body is entered, the necessary destructors are
> called. The destructor of all base classes and data-members that got
> initialized by move or user code is called on the source object in the
> reversed declaration order (src._trans and then src._db).
>
> Now we pray that src._trans destructor will have no side effect on src._db
> (normally it shouldn't as it knows it has been moved-from). But we are
> walking on thin ice here. If DB had an operator reloc(), then src._db would
> be considered destructed before src._trans destructor is called (which is
> unusual in C++). If it had any effect on src._db, then it would end up
> modifying a destructed object.
>

The defaulted operator reloc can't work here anyway, since Transaction
needs to be informed that DB has relocated.

So Transaction must already have a ctor Transaction(Transaction&&, DB&) and
the relocator of T must invoke that ctor:

T::operator reloc(T&& src) : _trans(std::move(src._trans), _db) {}

Or, if Transaction has its own operator reloc, then it must in addition
have a rebind(DB&) member function, and the relocator of T must invoke that
function:

T::operator reloc(T&& src) { _trans.rebind(_db); }

T is committed to do one of these, since it is responsible for maintaining
the relationship between DB and Transaction.

An alternative is that DB keeps a list of pending (bound) transactions and
automatically causes those to rebind when DB is moved. This is
significantly safer, and not excessively difficult to implement. Or,
possibly, DB can use pImpl idiom and Transaction can store a reference to
that heap-allocated DB::Impl object. In either of these cases, the
defaulted operator reloc will work fine.

*About the exception safety:*
>
> I admit there are no strong arguments against proper exception handling.
> Here is what I can come up with:
>
> Operator reloc() acts in three stages: (a) the base classes and data
> members initialisation, (b) destruction in reversed declaration order on
> any subobject of the source object that did not get true relocation
> (synthesized relocation or user provided initialization), (c) the function
> body.
>
> If an exception leaks through in stage (a) then:
>
> 1. in reversed declaration order, call the destructor of all
> initialized subobjects.
> 2. in reversed declaration order, call the destructor of the following
> source subobjects:
>
>
> - all subobjects that did not get relocated yet
> - the subobjects that would get destructed in stage (b)
> - if the initialization that threw did not happen through true
> relocation, then the matching subobject. (If the initialization happened by
> operator reloc() then we know that the source subobject is considered
> destructed.)
>
> We call the destructor on the new instance subobjects first as they were
> constructed more recently.
> If any extra exception leaks through during those destructor calls, then
> std::terminate is called (as it is done through stack-unwinding).
>
> If an exception leaks through in stage (b) then:
>
> - All subobjects of the new instance are destructed in reversed
> declaration order.
> - All the remaining destructors of the source subobject are called in
> the same order.
> - If any extra exception leaks through during those destructor calls,
> then std::terminate is called (as it is done through stack-unwinding).
>
> If an exception leaks through in stage (c) then:
>
> - All subobjects of the new instance are destructed in reversed
> declaration order.
> - If any extra exception leaks through during those destructor calls,
> then std::terminate is called (as it is done through stack-unwinding).
>
> Last, delegating constructors with operator reloc() should also be
> allowed: `operator reloc(T&& src) : T{std::move(src)} {}`.
>
> - In that case `src` is destructed right before the function body is
> entered.
> - If an exception leaks through the delegating constructor, then `src`
> destructor is called normally and the exception is propagated:
> - The new instance does not need to be destructed as the delegating
> constructor already took care of that.
> - If an extra exception leaks through the destructor of `src` then
> std::terminate is called.
>
> All this looks nice, but I can't help to think of what would happen when a
> destructor is called on an unmoved subobject of the source that has a
> side-effect on an already relocated subobject. (see below)
>
> *Exceptions and synthesized relocation:*
>
> What happens now if someone adds an "operator reloc() = default;" to DB
> (or worse, if this is done automatically should DB be trivial)? Let's
> rewrite T:
>
> struct Throw
> {
> operator reloc(Throw&&) { throw std::runtime_error{"oops"}; }
> };
>
> struct T
> {
> DB _db;
> Throw _thrower;
> Transaction _trans;
>
> operator reloc(T&& src) = default;
> };
>
> In T's operator reloc():
>
> 1. _db is initialized using its own operator reloc(). src._db is then
> considered destructed.
> 2. _thrower will thow. Using the rules described above, operator
> reloc() needs then to destruct this->_db and then src._trans, before
> propagating the exception.
> 3. this->_db.~DB() is called without any issue.
> 4. src._trans.~Transaction() is called while src._trans is not in its
> moved-from state, so it will act on an object that theoretically reached
> its end of life (call src._db.end() while src._db is destructed). Ouch.
>
> This is even more painful when we know that T default move constructor
> does the right thing: when the exception is thrown, this->_db is
> destructed. src continues to live as normal, even though src._db is in a
> moved-from state. Given that DB can handle a call to DB::end() in a
> moved-from state, no problem will arise.
>
> We could argue that T should not be relocatable and that it's all bad user
> code. But still I wonder how much synthesized relocation and exceptions may
> break things...
>
> On the other hand, we could state that synthesized relocation does not
> happen, i.e. defaulting operator reloc() does not fallback on move+destruct
> on unrelocatable subobjects. operator reloc() would then be implicitly
> deleted if any base class or data-member has a (potentially implicitly)
> deleted operator reloc().
>
> Then users will be forced to write their own operator reloc() which has at
> least the benefit of questioning them on whether their class should be
> relocatable to begin with, and if yes, how to do it properly.
>

This is the same issue as before: T::T(T&&) = default is incorrect, since
Transaction must be informed that DB has relocated, unless (as above) DB is
aware of Transaction or is pImpl. It is unreasonable to expect the default
operator reloc to work in a circumstance where the default move constructor
does not.

Assuming that either of the above safety improvements is made, such that
T::T(T&&) works, we must conclude that the exception cleanup destroys
src._trans before dest._db.

If neither of the above safety improvements obtains (perhaps Transaction is
a third-party bolt-on) then the move constructor T::T(T&&) must unbind
Transaction before invoking DB::DB(DB&&), probably via constructor
deferral. If all the subobject types are movable, then T::operator
reloc(T&&) can also defer to a constructor, as you define above. However,
if some data member is immovable, we will need some other mechanism to
unbind Transaction before relocating DB, which is why I suggest adding an
init-statement to occur before the ctor-initializer in an operator reloc
function-try-block.

Received on 2022-05-30 16:26:52