Re: [std-proposals] Proposal for skipping initialization in vector and complex constructors

From: Jonathan Wakely <cxx_at_[hidden]>
Date: Sun, 10 Nov 2024 11:02:06 +0000
On Sun, 10 Nov 2024, 10:54 Jonathan Wakely, <cxx_at_[hidden]> wrote:

>
>
> On Sun, 10 Nov 2024, 10:06 André Offringa via Std-Proposals, <
> std-proposals_at_[hidden]> wrote:
>
>> Dear all,
>>
>> I'd like to float the idea to make it possible to construct a
>> std::complex and a std::vector uninitialized. Basically, I would propose
>> something like the following to exist.
>>
>> A tag structure:
>>
>> struct skip_initialization_t { explicit skip_initialization_t() = default; };
>> inline constexpr skip_initialization_t skip_initialization{};
>>
>> And constructor overloads for std::vector and std::complex that take
>> this tag as a parameter and do not initialize their data values.
>>
>> For vector, the overload:
>>
>> // Construct a vector with n elements without initializing them
>> vector(size_t n, skip_initialization_t,
>>        const allocator_type& allocator = allocator_type())
>>
>> ...and optionally "for consistency" but less often used:
>>
>> // Set the size of the vector to n elements; new elements are left
>> // uninitialized
>> void resize(size_t n, skip_initialization_t)
>>
>> // Add n uninitialized values at the given position
>> // (maybe too obscure...)
>> iterator insert(const_iterator position, size_t n, skip_initialization_t)
>>
>> For complex, the overload:
>>
>> // Leave real and imaginary values uninitialized
>> constexpr complex(skip_initialization_t)
>>
>> Some motivation:
>>
>> My background is that I am involved in the development of scientific
>> numerical software for radio telescopes, which deal with petabytes of
>> data, much of it complex-valued. We regularly come across situations
>> where we need to allocate some data and fill it only later. If we use
>> std::vector for this, say for double values, the data is first
>> zero-initialized. While on the full scale of things this is "a few
>> percent" level cost, given our data sizes and the amount of compute we
>> do, this is not insignificant. We benchmark a lot so we have a
>> reasonable idea of its cost. While it's maybe a bit niche, there are
>> multiple Stack Overflow-style questions from people asking how to do
>> this, and I'm sure that if it were available it would see more use.
>>
>> A very common use-case where initialization can be skipped to save
>> performance (which is not limited to our domain) is this:
>>
>> std::ifstream file("myfile.bin", std::ios::binary);
>> std::vector<char> buffer(buffer_size);  // zero-fills the buffer first
>> file.read(buffer.data(), buffer_size);
>>
>
> This is exactly the use case that string::resize_and_overwrite solves. I
> think there is already a proposal to add a similar function to vector.
>
> The committee has seen lots of proposals for a tag that says to leave a
> container uninitialised, and has always decided we don't want that because
> it's too error-prone and risky.
>

Actually I misremembered, there was support for a tag type, but
resize_and_overwrite was preferred for string. See the discussion in
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1072r10.html#alternatives
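
As a rough sketch of how that string facility covers the file-reading case quoted above: the P1072 design shipped in C++23 as std::basic_string::resize_and_overwrite. The helper name read_bytes is hypothetical (it does not appear in the thread), and the #else branch is only an assumed fallback for standard libraries without the feature; that fallback still zero-fills the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read up to n bytes from `in` into a string.
inline std::string read_bytes(std::istream& in, std::size_t n) {
    std::string buffer;
#if defined(__cpp_lib_string_resize_and_overwrite)
    // C++23: the callback gets writable storage for `count` chars and
    // returns how many of them it actually filled; no zero-fill happens.
    buffer.resize_and_overwrite(n, [&in](char* data, std::size_t count) {
        in.read(data, static_cast<std::streamsize>(count));
        return static_cast<std::size_t>(in.gcount());
    });
#else
    // Assumed fallback for older libraries: resize() zero-fills first.
    buffer.resize(n);
    in.read(&buffer[0], static_cast<std::streamsize>(n));
    buffer.resize(static_cast<std::size_t>(in.gcount()));
#endif
    assert(buffer.size() <= n);  // never longer than what was requested
    return buffer;
}
```

Called with a stream that holds fewer than n bytes, the returned string is simply shorter, because the callback reports the number of bytes actually read.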




>
>> My proposal is thus that the second line can become:
>>
>> std::vector<char> buffer(buffer_size, std::skip_initialization);
>>
>> Currently, to get around the issue, we wrote our own vector class
>> (UVector) that skips initialization. It has exactly the same interface
>> as std::vector, except that, unless values are explicitly specified, it
>> leaves the data array uninitialized. We still prefer to use std::vector
>> and only use UVector when we explicitly need that functionality: first,
>> because using UVector is obviously less safe, and second, because we
>> sometimes need compatibility with std::vector for external libraries.
>> This is rather awkward, and a std::vector constructor that skips
>> initialization would be a small fix that solves the issue. Moreover, I
>> think a tagged constructor makes it explicit that the user wanted this
>> unsafe behaviour at the place where the vector is constructed, instead
>> of at the place where it is declared (as is now the case for us with
>> UVector).
>>
>> For std::complex, it's even more fundamental, as there's basically no
>> good work-around to skip initialization. We've used approaches where we
>> first allocated an uninitialised double array and cast it to a
>> std::complex array, but this causes undefined behaviour because of the
>> type punning rules (complex is special in that accessing a
>> complex<double> as a double[2] is allowed, but the other way around is
>> not). It works in most situations, but we've also seen compilers (gcc in
>> this case) make (correct) optimizations that cause this to break. A
>> situation like:
>>
>> double a[2];
>> std::complex<double>* b = reinterpret_cast<std::complex<double>*>(a);
>> *b = <some value>;  // undefined behaviour: a holds doubles, not a complex
>>
>> sometimes causes the value behind b to stay uninitialized even after the
>> assignment, because the compiler apparently exploits the undefined
>> behaviour (pretty fancy behaviour of the compiler, but it makes it
>> impossible to get the intended result). As a result there are cases
>> where we don't use complex values at all and keep everything as double,
>> resulting in much more verbose code when we need to do operations on it
>> (+, -, *, /, abs, norm, exp, etc.). I've also written my own complex
>> class at some point, but using two complex classes is quite messy.
>>
>> The two proposed constructors would make our lives a lot easier. There
>> are probably other std data types that, theoretically, could use
>> skip_initialization construction, but the two I listed here are, I
>> think, by far the most important ones -- I don't think I've seen any
>> need for more. I'm curious what people think.
>>
>> Kind regards,
>> André Offringa
>> --
>> Std-Proposals mailing list
>> Std-Proposals_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>>
>
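
For the vector half of the quoted proposal, a library-level workaround is possible today without a new constructor: an allocator adaptor that turns value-initialization into default-initialization. The sketch below is only an assumption about how UVector-like behaviour could be approximated; it is not the UVector class mentioned above, and the names default_init_allocator and uninit_vector are illustrative. It also shows why this does not help for std::complex: its default constructor sets both parts to zero, so default-initialization still writes zeros.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Illustrative allocator adaptor: construct() with no arguments performs
// default-initialization (no zeroing for trivial types such as char/double).
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;

public:
    template <typename U>
    struct rebind {
        using other =
            default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;  // reuse the underlying allocator's constructors

    // Called for vector(n) and resize(n): leave trivial elements untouched.
    template <typename U>
    void construct(U* p) noexcept(std::is_nothrow_default_constructible<U>::value) {
        ::new (static_cast<void*>(p)) U;
    }

    // All other constructions are forwarded unchanged.
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args) {
        traits::construct(static_cast<A&>(*this), p, std::forward<Args>(args)...);
    }
};

// Illustrative alias; note this is a different type from std::vector<T>.
template <typename T>
using uninit_vector = std::vector<T, default_init_allocator<T>>;
```

Because uninit_vector<T> is a distinct type from std::vector<T>, it has the same compatibility problem with external libraries that is described for UVector, which is part of the case for a tagged constructor on std::vector itself.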

Received on 2024-11-10 11:03:29