std-discussion

DestructorConstructor - a class destructor with parameter

From: Maciej Polanski <maciejpolanski75_at_[hidden]>
Date: Wed, 10 Feb 2021 22:52:07 +0100
Hello folks,

I would like to present a (revolutionary?) concept of a
"DestructorConstructor": a class destructor taking a parameter, an
uninitialized object. Such a destructor and constructor in one, while
destroying its own object, moves the data into the new (parameter)
object at the cost of a "C"-style "memcpy".
I've been thinking about this concept for some time. I have trouble
completing it, but I think it is valuable enough to present to the
community... even if only to be shown I am a moron :)


//*** Introduction

A simple use case; I think it says more than any elaborate description:

//------
class C {
     int *someData;
public:
     C() { someData = new int[2]; }
     ~C() { delete[] someData; }

     ~C(C* uninitializedTarget) { uninitializedTarget->someData = someData; } // <- DestructorConstructor here
};

(...something...)

alignas(C) unsigned char a[sizeof(C)]; // Old buffer
alignas(C) unsigned char b[sizeof(C)]; // New buffer

C *cA = new (a) C;
C* cB = reinterpret_cast<C*>(b);

cA->~C(cB); // <- DestructorConstructor used here

// cA was destroyed, so only cB needs cleaning now
cB->~C();

//------

So the class data are moved from cA to cB at the cost of a pointer copy
only, without stuffing cA's member pointer with zero and then making a
spare destructor call, as would happen if move semantics were involved.


//*** Needed by vector containers

Such an advanced change needs good reasoning, so let me present how I
came to the conclusion that we need this.

Some time ago I watched a presentation from CppCon - unfortunately, I
can't find it now - that showed that, for some class of applications, a
significant problem was (presumably) the time spent reallocating the
buffer of a std::vector while it grows. It was about some simulation
where it was impossible to predict which of the vectors would grow in
size, so approaches like "std::vector::reserve" couldn't be employed,
and vector sizes sometimes became enormous. The solution used back then
was reusing vectors, and this sped up the simulation roughly 2x - my
guess is this was due to preserving the already-allocated space.

But the problem of reallocating huge memory regions was in general
solved a long time ago by "memory mapping". I had even wondered before
about creating an "mmap'ed vector" that would use the Linux system calls
"mmap" and "mremap" to create a vector that grows in constant time,
without the need to move objects. Or a "treadmill deque" that would use
a linear region of memory, just shifting its contents with "mremap". On
an OS that does not offer such capabilities, the implementation could
simply forward to std::vector. So such an "mmap'ed vector" could help in
cases where std::vector growth is a significant and unpredictable problem.

Some of you may be terrified now by the idea of gluing mmap syscalls in
an "mmap'ed vector" together with "new"-allocated memory in other parts
of the program. And even putting that into the standard! But this is how
it already works on Linux. As many of you may not know - I found it to
my surprise some time ago - the standard "C" memory allocation that lies
below every "new" uses "mmap" to obtain memory from the system.
Initially the C library "mmap's" a huge region of memory for all small
allocations. But bigger requests - on glibc, above a threshold of 128 KiB
by default, though I have seen other figures quoted - are just
separately "mmap'ed" into their own regions of memory.

So a huge vector reallocation internally looks like this: given an
"mmap'ed" memory buffer, (1) "mmap" the next, bigger buffer, then (2)
move everything, then (3) "munmap" the old one.
But hypothetically it could be done the following way: given an
"mmap'ed" memory buffer, (1) "mremap" it to be bigger. Done - no move
needed, and most likely a constant-time operation.
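
To make the comparison concrete, below is a minimal, Linux-only sketch
of that second path (illustration only, not part of the proposal
itself): growing an anonymous mapping with "mremap" instead of
allocating a new buffer and moving every element by hand.

//------
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // mremap is a GNU/Linux extension
#endif
#include <sys/mman.h>
#include <cstddef>
#include <cstdio>

int main()
{
     std::size_t oldSize = 1 << 20;   // 1 MiB initial buffer
     std::size_t newSize = 1 << 22;   // grow it to 4 MiB

     void* buf = mmap(nullptr, oldSize, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
     if (buf == MAP_FAILED)
         return 1;

     // MREMAP_MAYMOVE lets the kernel relocate the mapping if it cannot
     // be extended in place; either way, no bytes are copied by user code.
     void* grown = mremap(buf, oldSize, newSize, MREMAP_MAYMOVE);
     if (grown == MAP_FAILED)
         return 1;

     std::printf("buffer %s\n", grown == buf ? "grew in place" : "was remapped");
     munmap(grown, newSize);
     return 0;
}
//------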

Here, however, lies a tremendous problem: C++, in its backwardness, is
still stuck in the world of the linear memory model (rooted probably in
Turing's work in the 1940s). Objects are absolutely tied to their memory
addresses, so calling a constructor for each new location is an absolute
must.

So I first have to address this issue and define the rules and
conditions that allow an object to change its address. And this is what
this whole proposal is mainly about.


//*** Concept of "TriviallyMovable"

The logic I propose is as follows:

1. Define the DestructorConstructor "~C(C* uninitializedTarget);" that
constructs an uninitialized object by moving data from the old one,
which is being destroyed. There is no need to stuff the pointers inside
the old object with zeros, as no additional destructor will ever be
called later - this destructor fulfills the requirement that the old
object be destroyed (and the new one created).

2. The default implementation "~C(C* uninitializedTarget) = default;"
should just copy all the data, as this also works for pointer fields,
including smart pointers. So it covers about 99% of existing classes,
except those with really exotic side effects. This approach should work,
as there is nothing bad in making a copy of a pointer if the original is
abandoned without a delete.

3. If a class has a default DestructorConstructor and all its member
objects have a default DestructorConstructor too, then the class should
have the trait "TriviallyMovable". Moving an object of such a class is
just making a copy of its memory and abandoning the old object as
destroyed. So it should be officially allowed to replace the calls to
the DestructorConstructor of such an object, and recursively of its
members, with one big "memcpy".

So. The last point has very interesting consequences for
"TriviallyMovable" classes:

1. Moving objects, and especially arrays of objects, as when
reallocating a std::vector, can be officially optimized to just "memcpy"
(see the sketch after this list)

2. This indirectly legalizes other forms of moving objects, like
"mremap". It also indirectly legalizes casting pointers to
binary-initialized memory

3. This could also open interesting optimization opportunities for the
compiler, as it may officially be allowed - with additional conditions,
of course - to memcpy objects, or to just create them directly in the
target location, skipping spare copies
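
The sketch below shows where this would plug in, using today's closest
analogue: real vector implementations already "memcpy" their buffers for
trivially copyable element types. A "TriviallyMovable" trait would
extend the same fast path to resource-owning classes whose move is
nothing more than a bitwise copy (grow_buffer is a made-up helper, not
an existing API).

//------
#include <cstdlib>
#include <cstring>
#include <new>
#include <type_traits>
#include <utility>

template <class T>
T* grow_buffer(T* old, std::size_t count, std::size_t newCapacity)
{
     T* fresh = static_cast<T*>(std::malloc(newCapacity * sizeof(T)));
     if (!fresh)
         throw std::bad_alloc{};

     if constexpr (std::is_trivially_copyable_v<T>) {
         // Fast path available today: one big memcpy, no per-element work.
         std::memcpy(fresh, old, count * sizeof(T));
     } else {
         // General path: move-construct each element, then destroy the
         // source - exactly the "zero stuffing plus spare destructor"
         // cost that TriviallyMovable classes would avoid.
         for (std::size_t i = 0; i < count; ++i) {
             ::new (static_cast<void*>(fresh + i)) T(std::move(old[i]));
             old[i].~T();
         }
     }
     std::free(old);
     return fresh;
}
//------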


//*** Concept of uninitialized types

To make this all work smoothly, one more concept should be introduced:
the uninitialized class. There should be a modifier to a type that
creates an object without an implicit call to the constructor at the
start, or to the destructor at the end of its life. Such an object shall
work as a normal variable, but the responsibility for maintaining its
lifetime is put on the programmer, not the compiler. I propose using the
tilde "~", the symbol of the destructor, to declare an uninitialized
class, but this is not binding.

Having uninitialized objects is not a must for the DestructorConstructor,
but it limits the use of reinterpret_cast, malloc and the like.

And do not treat this as something new in C++!

This is just a new intermediate layer between a normal declaration and
"malloc(sizeof(class))". In fact, it tightens the type checking for
previously "malloc'ed" buffers. As we all know, C++ is like ogres - it
has layers. Depending on needs, one can use a high level of abstraction
or go deeper. So the uninitialized type adds a layer where the lifetime
of the object is managed manually, but which is less hardcore to use
than raw "malloc'ed" memory.

Usage could look like this:

//------
class C {
     int *someData;
public:
     C() { someData = new int[2]; }
     ~C() { delete[] someData; }

     // Compared to the previous declaration, I added "~" as information
     // that a "not initialized" object is expected
     ~C(C~ *uninitializedTarget) { uninitializedTarget->someData = someData; }
};

(...something...)

C~ cLocal;               // No constructor called, as uninitialized type modifier "~" used
C *pcSrc = new C();
delete (&cLocal) pcSrc;  // DestructorConstructor: data copied, pcSrc deleted, cLocal constructed

C* pcTarget = new C~();  // No constructor called, just space reserved
cLocal.~C(pcTarget);     // DestructorConstructor: data copied, cLocal destructed, pcTarget constructed
delete pcTarget;
//------

Obviously, I should now explain how I came to the conclusion that this
type modifier is needed.

Until recently I was mostly a good old "C" programmer and, due to some
circumstances, needed to update my C++ (I left C++ somewhere before
lambdas were introduced). During this I noticed that - sometimes - the
use of C++ adds a noticeable cost to an application compared to a
hypothetical "C" implementation.

In "C" use of data stored in some container generally can be: copy some
struct to local variable, do something, copy out.

But in C++, in the optimistic scenario, we have (assuming use of the
move constructor):
1) Copy the data from the container to local variables
2) Stuff the pointers inside the container's variable with zeros (part
of move semantics)
3) Call the destructor on the object in the container (e.g. with
"pop_back()")
Then we can do something with the data, and then the process has to be
repeated on the way back out.

So, look at the last two points: we first stuff the old object's
pointers with zeros, then call the destructor. The destructor will test
that the pointers are still zeros, and so will do nothing.
These two actions cancel each other out, so they could be omitted. They
are not a problem for single objects, but if e.g. moving a vector's
buffer, we may have a million zeros to write and then a million
destructors to call - and all of that to do nothing! The snippet below
illustrates the sequence.
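
A minimal illustration of those three steps with a made-up move-only
handle class (Handle is just an example type, not from the proposal):

//------
#include <utility>
#include <vector>

struct Handle {
     int* p = nullptr;
     Handle() : p(new int(42)) {}
     Handle(Handle&& other) noexcept : p(other.p) { other.p = nullptr; } // step 2: zero the source
     Handle& operator=(Handle&& other) noexcept {
         delete p;
         p = other.p;
         other.p = nullptr;
         return *this;
     }
     ~Handle() { delete p; } // step 3: runs on the already-zeroed shell
};

int main()
{
     std::vector<Handle> v;
     v.emplace_back();

     Handle local = std::move(v.back()); // step 1: copy the pointer out (step 2 nulls the element)
     v.pop_back();                       // step 3: destructor of the nulled element does nothing
     // ... do something with local, then the same dance on the way back in ...
     return 0;
}
//------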

And it can be even worse if "swap" is used, where it can be:
1) Create an empty object and stuff its pointers with zeros
2) Copy those zeros to a temporary variable (part of the swap)
3) Copy the data from the container to the local variable (the only
operation that is really needed)
4) Copy the zeros from the temporary variable to the variable in the
container
5) Call the destructor on the object in the container (the destructor
does nothing, as the pointers are zeros)
That's really an art of juggling with zeros!


//*** Use of DestructorConstructor with uninitialized types

So, having the DestructorConstructor and uninitialized types, use of a
container could be implemented in the following hypothetical way:

//------
class C {
     int iData;
     int *pData;
public:
     C() : iData(1) { pData = new int(2); }
     ~C() { delete pData; }
     ~C(C~ *target) { target->iData = iData; target->pData = pData; }

     void DoSomething();
};

C~* BetterVector::AllocateBack()
{
     return &buffer[++last];
}

void BetterVector::DeallocateBack(C~* target)
{
     buffer[last--].~C(target);
}

BetterVector bvSrc, bvTgt;

(...something...)

C~ cLocal;                                   // Uninitialized type - no empty constructor call
bvSrc.DeallocateBack(&cLocal);               // No "zero stuffing" and no spare destructor calls
std::unique_ptr<C~, Deleter> upDel(&cLocal); // Optional, only if exception safety required

cLocal.DoSomething();

upDel.release();
cLocal.~C(bvTgt.AllocateBack());             // No "zero stuffing" and no spare destructor calls
                                             // No spare cLocal destruction after leaving block
//------

So the above is a real "zero-cost abstraction", as no spare instructions
are generated (compared to the equivalent "C" code).

To answer the question "Do we need one more way of moving classes?", it
can be valuable to understand how - in my opinion - we got to the
current, rather suboptimal standard.

Initially there was OOP (Object Oriented Programming), with the paradigm
that all operations on some data set should be done through methods
assigned to that data set, forming the structure known as a "class".
Then came Generic Programming (GP), with the idea that all operations on
data should be done through algorithms/containers totally disconnected
from any concrete data set, known as "templates".

As anyone can see, these two approaches are absolutely contradictory.

So, to store OOP classes in GP containers, the following approaches were
applied:

1. The first attempt was to use the copy constructor/operator inside
containers. That's a very good entry-level abstraction for classes
holding mostly plain old data (POD) or not performance-critical in any
domain. But for classes with pointers to huge dynamically allocated
memory, handling times could be tragically bad.

2. A more advanced abstraction is move semantics. Move semantics seems
to be a reasonable solution for most cases. And at the same time it is,
in my opinion, one of the most advanced concepts in software
engineering. Which I think is a problem: being so advanced, everybody
just wanted it to be the final solution. But it only fixed the problem
from being "terribly bad" to "acceptable cost". It still keeps the
"juggling with zeros" and the spare destructor calls at quite a high
level.

3. So, as a third layer, here comes the proposed "DestructorConstructor":
a lower-layer abstraction that, at the expense of being more error-prone
than the other approaches, should be a really zero-cost abstraction. To
be used only in efficiency-critical and well-tested parts of code, like
algorithms/containers that need to move huge numbers of objects.

So that's why I think the "DestructorConstructor" can be a good
complement to the existing ways of moving objects.


//*** DestructorConstructor properties

The DestructorConstructor moves an object into another one of exactly
the same type, so no polymorphism can be involved.
This is a good illustration of the fundamental conflict described above.
Polymorphic classes have to be handled by pointers, so the address is
sacred and copying is never needed. Copying is needed when using
containers, so there is no place for virtual objects and changes of an
object's address should be allowed.

To keep backward compatibility and to prevent unintentionally losing
side effects, I think the DestructorConstructor should be deleted by
default and created only explicitly.


//*** Alternative solution

It should be mentioned here that there exists an alternative
modification to the language that would allow efficient movement of
objects inside containers, like "memcpy" or "mmap'ing": simply allow
changes of the address of an object inside a container!
Having a constructor called for each location that an object wants to
occupy is currently a rather basic rule of the language. But let's look
at how - I think - this rule came into existence historically.

The first OOP framework I met was Turbo Vision, a TUI (text-based user
interface) library. All objects there were connected into a monstrous
tree. The App held pointers to the Menu, StatusBar and Desktop, the
Desktop kept pointers to each of the Windows, the Windows to Frames,
Views and Controls, and so on.
Within such an implementation, the address of an object was obviously
sacred - any change would make the object inaccessible, break the whole
chain of pointers and most likely crash the app.

However, things look different with containers, which by definition are
collections of items equal for some use. Even if a container is sorted,
then some items are just more equal than others, but there is still no
need to keep individual references. In fact, this is mostly impossible,
as most iterators have a very limited validity lifespan.

So, for containers, strictly tying object identity to a memory address
does not make much sense. This rule is a heritage from trees of
polymorphic classes.
So an alternative way is to simply allow programmers to declare the
trait "TriviallyMovable" on their "memcpy"-compatible classes. Then, for
specific containers, only classes presenting the "TriviallyMovable"
trait would be accepted, which would allow "memcpy" operations inside
(see the sketch below).
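
A hedged sketch of that alternative with today's machinery: the class
author opts in to a (hypothetical, user-defined) is_trivially_movable
trait, and a container statically accepts only such classes, which lets
it relocate its buffer with "memcpy". Widget and RelocatingVector are
made-up names.

//------
#include <type_traits>

template <class T>
struct is_trivially_movable : std::false_type {};

struct Widget {
     int* data = nullptr;  // owned, but a bitwise copy is a valid "move"
};

// Explicit opt-in by the class author.
template <>
struct is_trivially_movable<Widget> : std::true_type {};

template <class T>
class RelocatingVector {
     static_assert(is_trivially_movable<T>::value,
                   "RelocatingVector accepts only TriviallyMovable types");
     // ... during growth, the implementation may memcpy the whole buffer
     // and treat the old objects as already destroyed ...
};
//------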


//*** Other use - DepositingDestructor

But I think that destructors with a parameter, like the
DestructorConstructor, are a more general solution than just allowing
copying in containers. For example, I can imagine a concept of a
"DepositingDestructor":

//------
// class to receive data from an object being deleted
class cDataSink {
public:
     void TakeOverData(cSomeData*);
};

// class that can be deleted with a real zero-additional-cost transfer
// of data:
class C {
     cSomeData *pData;

public:
     ~C(cDataSink* sink) { sink->TakeOverData(pData); } // <- DepositingDestructor here
     ~C() { delete pData; }                             // Normal destructor
};
//------

The example above is not final, and I am not even sure how to solve all
the problems (what about inheritance? what if not all data fields were
deposited somewhere?). But it is just meant to illustrate where all this
could go further.
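
For comparison, today's closest workaround is sketched below (reusing
the names from the snippet above, with C_today and DepositTo as made-up
names): an explicit member function hands the data to the sink and nulls
the pointer, so the normal destructor that still runs afterwards has
nothing left to do - exactly the extra cost the DepositingDestructor
would remove.

//------
struct cSomeData { /* ... */ };

class cDataSink {
public:
     void TakeOverData(cSomeData*) { /* a real sink would take ownership here */ }
};

class C_today {
     cSomeData* pData = nullptr;

public:
     void DepositTo(cDataSink* sink)
     {
         sink->TakeOverData(pData);
         pData = nullptr;          // must be nulled so ~C_today() does nothing
     }
     ~C_today() { delete pData; }  // still called, but a no-op after DepositTo
};
//------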


//*** DestructorConstructor should replace Move semantics if possible

For example, if using expired (movable) variable to initialize other
(like function parameter), compiler may have a choice to use move
semantics or DestructorConstructor. So DestructorConstructor should be
preferred as having lower cost. It could be also used as optimization in
other cases - provided that compiler may control type of destructor used.

However, this absolutely does not put move semantics out of business. In
most cases the standard destructor has to be called, so there is a need
to stuff the old object's pointers with zeros. For example, in the
following case move semantics has to be involved:

//------
void foobar(C cSrc)
{
     C cTgt;

     if (something)
         cTgt = std::move(cSrc);

     SomethingSomething(cTgt); // Unimportant use of cTgt

     // Standard cSrc destructor has to be called at the end,
     // as it's not clear if cSrc was moved out or not
}
//------

But in the following case, a smart compiler could replace the move with
the DestructorConstructor:

//------
void foobar(C cSrc)
{
     C cTgt;

     cTgt = std::move(cSrc); // DestructorConstructor can be used here

     SomethingSomething(cTgt); // Unimportant use of cTgt

     // If the DestructorConstructor was used, the compiler does not
     // generate a destructor call for cSrc at all
}
//------


//*** Other uses for uninitialized types

The first obvious additional use of uninitialized types is to replace
the vague mallocs used to reserve space for arrays:
//------
C *array = static_cast<C*>(malloc(10 * sizeof(C))); // Current "magical" method
C *array = new C~[10];                               // Safer method, allowing e.g. type checking
//------
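
(For reference, the closest thing that exists today between raw malloc
and a full new-expression is raw operator new, which at least gets the
alignment right; a small sketch, assuming class C from above:)

//------
#include <new>

C *array = static_cast<C*>(::operator new(10 * sizeof(C), std::align_val_t{alignof(C)}));
// ... later: ::new (array + i) C; for each element actually used ...
// ... and eventually: ::operator delete(array, std::align_val_t{alignof(C)});
//------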

The second proposed use of uninitialized types is for instantiating a
template. AFAIK there has been a proposal for "*_noinit" versions of the
calls that create smart-pointer arrays (if I recall correctly, these
ended up as the "*_for_overwrite" functions in C++20). Use of an
uninitialized type could give the "*_noinit" effect while using the
normal version of the function: give the template an "uninitialized
type", so no constructor will be called.
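
For reference, this is what those calls look like today:

//------
#include <memory>

// C++20: allocates the array but default-initializes it, i.e. the ints
// are left uninitialized instead of being zeroed.
auto p = std::make_unique_for_overwrite<int[]>(1000);
//------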

Third, uninitialized types could also allow exposing a raw "return
object". Functions could then build the return value in place,
gradually, without needing to construct it directly in the return
statement. This may limit the number of constructors needed to cover all
uses of a class:
//------
C foobar(int first, char last)
{
     // implicit retval is:
     // C~ retval;                        // Most likely a reference, due to the way objects are returned

     new (&retval) C;                     // Constructs C in place in the return object
     if (condition)
         retval.DoSomething(first, last); // Currently a separate constructor taking those data would be needed
     return retval;                       // Means "do nothing": return the object in its current state
}
//------
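
For comparison, the closest we get today is sketched below (reusing
condition and DoSomething from the snippet above): construct first,
mutate afterwards, and rely on NRVO to build retval directly in the
caller's storage. The difference is that some constructor still has to
run up front.

//------
C foobar_today(int first, char last)
{
     C retval;                            // a constructor must run here
     if (condition)
         retval.DoSomething(first, last);
     return retval;                       // NRVO: usually no copy or move
}
//------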


//*** End

So, dear reader, if you got here: thank you for your patience! I am
aware that this proposal must have many problems inside, and you
probably noticed some while reading my email. And I am not even sure I
can address them all, as some issues are really beyond me.
But I think this DestructorConstructor approach could open up
interesting possibilities for C++, so I decided to share it, at least to
give people something to think about. Just keep in mind that the main
point here is not just to have one more member function, but to allow
the use of memcpy and similar functionality where appropriate.

Thanks,
Maciej Polanski
