ISOCPP std-proposals List: [std-proposals] Proposal for skipping initialization in vector and complex constructors

From: André Offringa <offringa_at_[hidden]>
Date: Sun, 10 Nov 2024 11:04:34 +0100

Dear all,

I'd like to float the idea of making it possible to construct a
std::complex or a std::vector without initializing its values.
Concretely, I would propose adding something like the following.

A tag structure:

struct skip_initialization_t { explicit skip_initialization_t() = default; };
inline constexpr skip_initialization_t skip_initialization{};

And constructor overloads for std::vector and std::complex that take
this tag as a parameter and do not initialize their data values.

For vector, the overload:

// Construct a vector with n elements without initializing them
vector(size_t n, skip_initialization_t,
       const allocator_type& allocator = allocator_type())

...and optionally "for consistency" but less often used:

// Set the size of the vector to n elements; new elements are left
// uninitialized
void resize(size_t n, skip_initialization_t)

// Add n uninitialized values at the given position
// (maybe too obscure...)
iterator insert(const_iterator position, size_t n, skip_initialization_t)

For complex, the overload:

// Leave real and imaginary values uninitialized
constexpr complex(skip_initialization_t)
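
To make the tag mechanism concrete, here is a minimal sketch of how such
a tagged constructor could look. It uses a stand-in class, toy_complex,
rather than std::complex itself; the names toy_complex and make_filled
are invented for this illustration and are not part of the proposal. The
tag is modelled on existing standard tags such as std::in_place_t.

```cpp
// Tag type, modelled on existing standard tags such as std::in_place_t.
struct skip_initialization_t { explicit skip_initialization_t() = default; };
inline constexpr skip_initialization_t skip_initialization{};

// Hypothetical stand-in for std::complex, for illustration only.
template <typename T>
class toy_complex {
public:
    // Regular constructor: both parts default to zero, as in std::complex.
    constexpr toy_complex(T re = T(), T im = T()) : re_(re), im_(im) {}

    // Tagged constructor: no member initializers, so for arithmetic T the
    // members hold indeterminate values until they are assigned.
    explicit toy_complex(skip_initialization_t) {}

    constexpr T real() const { return re_; }
    constexpr T imag() const { return im_; }

private:
    T re_;
    T im_;
};

// Construct without initialization, then overwrite before any read.
inline toy_complex<double> make_filled(double re, double im) {
    toy_complex<double> z(skip_initialization);
    z = toy_complex<double>(re, im);
    return z;
}
```

The important property is that the values are never read before they
are written; reading them earlier would be just as invalid as reading
an uninitialized double.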

Some motivation:

My background is the development of scientific numerical software for
radio telescopes, which deal with petabytes of data, often
complex-valued. We regularly come across situations where we need to
allocate some data and fill it only later. If we use std::vector for
this, say for double values, the data is first zero-initialized.
Although on the full scale of things this is a "few percent" level
cost, it is not insignificant given our data sizes and the amount of
compute we do. We benchmark a lot, so we have a reasonable idea of its
cost. While it's maybe a bit niche, there are multiple
Stack Overflow-style questions from people asking how to do this, and
I'm sure that if it were available it would see more use.

A very common use-case where initialization can be skipped to save
performance (which is not limited to our domain) is this:

std::ifstream file("myfile.bin", std::ios::binary);
std::vector<char> buffer(buffer_size);
file.read(buffer.data(), buffer_size);

My proposal is thus that the second line can become:

std::vector<char> buffer(buffer_size, std::skip_initialization);

Currently, to get around the issue, we wrote our own vector class
(UVector) that skips initialization. It has exactly the same interface
as std::vector, except that it leaves the data array uninitialized
unless values are explicitly specified. We still prefer to use
std::vector and only use UVector when we explicitly need that
functionality: first because using UVector is obviously less safe, and
second because we sometimes need compatibility with std::vector for
external libraries. This is rather awkward, and a std::vector
constructor that skips initialization would be a small fix that solves
the issue. Moreover, I think a tagged constructor makes it explicit
that the user wanted this unsafe behaviour at the place where the
vector is constructed, instead of at the place where it is declared
(as is now the case for us with UVector).

For std::complex, it's even more fundamental, as there's basically no
good work-around to skip initialization. We've used approaches where we
first allocated an uninitialized double array and cast it to a
std::complex array, but this causes undefined behaviour because of the
type-punning (aliasing) rules (complex is special in that casting a
complex<double> to a double[2] is allowed, but the other way around is
not). It works in most situations, but we've also seen compilers (gcc
in this case) make (correct) optimizations that cause this to break. A
situation like:

double a[2];
std::complex<double>* b = reinterpret_cast<std::complex<double>*>(a);
*b = <some value>

sometimes causes the data behind b to stay uninitialized even after the
assignment, because the compiler apparently exploits the undefined
behaviour (pretty fancy behaviour of the compiler, but it makes it
impossible to get the intended result). As a result there are cases
where we don't use complex values at all and keep everything as double,
resulting in much more verbose code if we need to do operations on it
(+, -, *, /, abs, norm, exp, etc.). I've also written my own complex
class at some point, but using two complex classes is quite messy.
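
For reference, the direction of the cast that the standard does
guarantee (viewing complex storage as doubles) can be sketched as
follows. The helper name fill_interleaved is invented for this example.
The storage here is still zero-initialized by std::vector, so this only
illustrates the permitted cast and does not avoid the initialization
cost.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Fill a complex buffer through a double* view. For an array of
// std::complex<T>, the standard specifies that element 2*i of the T* view
// is the real part of element i and element 2*i + 1 is its imaginary part.
inline void fill_interleaved(std::vector<std::complex<double>>& v) {
    double* raw = reinterpret_cast<double*>(v.data());
    for (std::size_t i = 0; i < v.size(); ++i) {
        raw[2 * i]     = static_cast<double>(i);   // real part of v[i]
        raw[2 * i + 1] = -static_cast<double>(i);  // imaginary part of v[i]
    }
}
```

The reverse, treating a plain double array as an array of
std::complex<double>, has no such guarantee, which is the problem
described above.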

The two proposed constructors would make our lives a lot easier. There
are probably other std data types that could, in theory, use
skip_initialization construction, but the two I listed here are, I
think, by far the most important ones; I don't think I've seen any need
for more. I'm curious what people think.

Kind regards,
André Offringa

Received on 2024-11-10 10:04:37