std-proposals

Re: [std-proposals] is_trivially_copyable_in_reality

From: Frederick Virchanza Gotham <cauldwell.thomas_at_[hidden]>
Date: Sat, 24 Jan 2026 19:13:45 +0000
On Saturday, January 24, 2026, Arthur O'Dwyer wrote:

> You seem to be reinventing the idea of a "copy constructor."

I do see your line of reasoning here but let me get to the optimisation
part.

Let's say we have a 3D graphic class something like:

    struct Sprite : Entity {
        long double GetVolume(void) override;
        long long points[1024][1024];
    };

You'll agree here that the data part of this class is far bigger than the
potentially "not memcpyable" part of the class (i.e. 8 megabytes vs. 8
bytes, assuming an 8-byte long long and an 8-byte vtable pointer).
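
To put numbers on that comparison, here is a minimal sketch. The Entity base class is not shown in this thread, so the one below is purely a hypothetical stand-in, and the helper names data_bytes and overhead_bytes are made up for illustration. It only measures sizes and never creates a Sprite.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical base class; the thread does not show the real Entity.
struct Entity {
    virtual long double GetVolume() = 0;
    virtual ~Entity() = default;
};

struct Sprite : Entity {
    long double GetVolume() override { return 0.0L; }
    long long points[1024][1024];
};

// Bytes occupied by the plain-data member.
constexpr std::size_t data_bytes() { return sizeof(Sprite::points); }

// Bytes occupied by everything else (vtable pointer plus any padding).
constexpr std::size_t overhead_bytes() { return sizeof(Sprite) - sizeof(Sprite::points); }

// A class with virtual functions is never trivially copyable in standard
// C++, which is why memcpy is not currently allowed for it.
static_assert(!std::is_trivially_copyable_v<Sprite>,
              "polymorphic types are not trivially copyable");
```

On a typical 64-bit target I would expect data_bytes() to be 8388608 and overhead_bytes() to be 8, but the exact overhead is up to the ABI.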

Now let's say we have a container of 6 million of these sprites, and we
want to copy the whole container. Treating p and q as byte pointers to the
old storage and to the uninitialised new storage, we can copy element by
element as follows:

    for ( unsigned long i = 0; i != 6000000; ++i )
        memcpy( q + i * sizeof(T), p + i * sizeof(T), sizeof(T) );

The above code is essentially calling a copy-constructor in a loop. But
instead of invoking the copy constructor 6 million times, we can do a "bulk
copy" as follows:

        memcpy( q, p, sizeof(T)*6000000 );

Obviously the latter is a lot better (i.e. one memcpy instead of 6 million
memcpys). But even on arm64e, we can compare the two code snippets. On
arm64e, the first snippet becomes:

    for (unsigned long i = 0; i != 6000000; ++i)
    {
        memcpy( q + i * sizeof(T), p + i * sizeof(T), sizeof(T) );
        copy_lifetime( q + i * sizeof(T), p + i * sizeof(T) );
    }

And the second snippet becomes:

    memcpy( q, p, sizeof(T)*6000000 );
    for ( unsigned long i = 0; i != 6000000; ++i )
        copy_lifetime( q + i * sizeof(T), p + i * sizeof(T) );

Now you might argue that there isn't much benefit to the second snippet as
it must still iterate over each element individually -- but the thing is
that when it does so, all it does is set one measly pointer. That's a lot
less CPU drudgery than individually copying each Sprite one by one.
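
The two loop shapes can be tried out today with a trivially copyable stand-in, sketched below. Everything here is an assumption made for illustration: Blob is not Sprite, its tag field merely models the one address-dependent pointer, and copy_lifetime_model is a made-up substitute for the proposed copy_lifetime. The pointers are typed, so the arithmetic is q + i rather than q + i * sizeof(T).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Stand-in element: `tag` models the single pointer-sized field that must
// be recomputed for each new address; `data` models the bulk payload.
struct Blob {
    const void* tag;
    long long data[64];
};
static_assert(std::is_trivially_copyable_v<Blob>, "memcpy must be legal here");

// Hypothetical model of the proposed copy_lifetime: it only fixes up the
// address-dependent field of an element whose bytes were already copied.
inline void copy_lifetime_model(Blob* dst, const Blob* /*src*/) {
    dst->tag = dst;
}

// Shape of the first snippet: one memcpy and one fix-up per element.
inline void copy_one_by_one(Blob* q, const Blob* p, std::size_t n) {
    assert(n == 0 || (q != nullptr && p != nullptr));
    for (std::size_t i = 0; i != n; ++i) {
        std::memcpy(q + i, p + i, sizeof(Blob));
        copy_lifetime_model(q + i, p + i);
    }
}

// Shape of the second snippet: one bulk memcpy, then a cheap fix-up pass.
inline void copy_in_bulk(Blob* q, const Blob* p, std::size_t n) {
    assert(n == 0 || (q != nullptr && p != nullptr));
    if (n != 0)
        std::memcpy(q, p, n * sizeof(Blob));
    for (std::size_t i = 0; i != n; ++i)
        copy_lifetime_model(q + i, p + i);
}
```

Both functions leave the destination in the same state; the only difference is how many memcpy calls are made to get there.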

So the optimised copy-constructor for 'vector' would become something like:

    template<typename T>
    // note: the 'true' indicates a guaranteed complete object
    requires is_trivially_copy_constructible<T, true>::value
    vector(vector const &rhs)
    {
        count = rhs.count;
        p = (T*)new char unsigned[ count * sizeof(T) ];
        memcpy( p, rhs.p, count * sizeof(T) );
        for ( unsigned long i = 0; i != count; ++i )
            copy_lifetime( p + i, rhs.p + i );
    }

That "for loop" will be optimised away to a no-op on every machine except
for arm64e.
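
For anyone who wants to compile something along these lines today, here is a rough sketch restricted to types that are already trivially copyable, since that is all standard C++ currently allows for memcpy. The class tiny_vector is a made-up illustration and not a proposed std::vector implementation, and copy_lifetime is written as an empty function to model a target where it does nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical stand-in for the proposed copy_lifetime; empty here, to
// model a target with no address-dependent object state.
template<typename T>
inline void copy_lifetime(T* /*dst*/, const T* /*src*/) noexcept {}

// Minimal illustrative container (assumed name, not part of any proposal).
template<typename T>
class tiny_vector {
    static_assert(std::is_trivially_copyable_v<T>,
                  "standard C++ only permits memcpy for trivially copyable types");
    static_assert(alignof(T) <= __STDCPP_DEFAULT_NEW_ALIGNMENT__,
                  "over-aligned types would need aligned allocation");
    T* p = nullptr;
    std::size_t count = 0;

public:
    tiny_vector(std::size_t n, const T& value)
        : p(static_cast<T*>(::operator new(n * sizeof(T)))), count(n) {
        for (std::size_t i = 0; i != n; ++i)
            ::new (static_cast<void*>(p + i)) T(value);
    }

    tiny_vector(const tiny_vector& rhs)
        : p(static_cast<T*>(::operator new(rhs.count * sizeof(T)))), count(rhs.count) {
        if (count != 0)
            std::memcpy(p, rhs.p, count * sizeof(T));  // one bulk copy
        for (std::size_t i = 0; i != count; ++i)
            copy_lifetime(p + i, rhs.p + i);           // per-element fix-up
    }

    tiny_vector& operator=(const tiny_vector&) = delete;

    // Trivially copyable types have trivial destructors, so releasing the
    // storage is all that is needed.
    ~tiny_vector() { ::operator delete(p); }

    std::size_t size() const { return count; }
    const T& operator[](std::size_t i) const { assert(i < count); return p[i]; }
};
```

Because copy_lifetime is empty and inline here, I would expect a compiler to drop the second loop entirely, though that is an optimiser decision and not a guarantee.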

This won't break old code, even though the old code doesn't call
copy_lifetime. This is because the old code will have:

    is_trivially_copy_constructible<T>

instead of:

    is_trivially_copy_constructible<T, true>

and so the old code won't try to use memcpy for polymorphic objects (i.e.
the old code will call the copy-constructor in a loop).
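
The dispatch that old code performs today can be sketched as follows. The names copy_into_raw, CopyPath, Plain and Poly are invented for this example. I have used std::is_trivially_copyable as the test because that is the condition the standard attaches to memcpy; real library code may well test a different combination of traits. The two-argument form of is_trivially_copy_constructible does not exist yet, so it is not used here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <type_traits>

enum class CopyPath { bulk_memcpy, constructor_loop };

// Copies n objects from src into uninitialised storage at dst and reports
// which strategy was chosen at compile time.
template<typename T>
CopyPath copy_into_raw(T* dst, const T* src, std::size_t n) {
    if constexpr (std::is_trivially_copyable_v<T>) {
        if (n != 0)
            std::memcpy(dst, src, n * sizeof(T));
        return CopyPath::bulk_memcpy;
    } else {
        // Polymorphic (and other non-trivial) types land here.
        for (std::size_t i = 0; i != n; ++i)
            ::new (static_cast<void*>(dst + i)) T(src[i]);
        return CopyPath::constructor_loop;
    }
}

// Example types, both hypothetical.
struct Plain { int x; };

struct Poly {
    virtual ~Poly() = default;
    virtual int id() const { return x; }
    int x = 0;
};
```

Under the proposal, the idea is that only code which opts in with the extra 'true' argument would ever take the memcpy branch for something like Poly.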

Received on 2026-01-24 19:13:47