Date: Sun, 10 Nov 2024 10:54:44 +0000
On Sun, 10 Nov 2024, 10:06 André Offringa via Std-Proposals, <
std-proposals_at_[hidden]> wrote:
> Dear all,
>
> I'd like to float the idea to make it possible to construct a
> std::complex and a std::vector uninitialized. Basically, I would propose
> something like the following to exist.
>
> A tag structure:
>
> struct skip_initialization_t {};
> constexpr skip_initialization_t skip_initialization;
>
> And the constructor overloads for std::vector and std::complex that have
> this tag as parameter, that do not initialize their data values.
>
> For vector, the overload:
>
> // Construct a vector with n elements without initializing them
> vector(size_t n, skip_initialization_t,
>        const allocator_type& allocator = Alloc())
>
> ...and optionally "for consistency" but less often used:
>
> // Set the size of the vector to n elements; new elements are left
> // uninitialized
> void resize(size_t n, skip_initialization_t)
>
> // Add n uninitialized values at the given position
> // (maybe too obscure...)
> iterator insert(const_iterator position, size_t n, skip_initialization_t)
>
> For complex, the overload:
>
> // Leave real and imaginary values uninitialized
> constexpr complex(skip_initialization_t)
>
> Some motivation:
>
> My background is that I am involved in the development of scientific
> numerical software for radio telescopes, which deal with petabytes of
> data, often complex valued ones. We regularly come across situations
> where we need to allocate some data and fill it only later. If we use
> std::vector for this, say for double values, the data is first
> zero-initialized. While on the full scale of things this is "a few
> percent" level cost, given our data sizes and the amount of compute we
> do, this is not insignificant. We benchmark a lot so we have a
> reasonable idea of its cost. While it's maybe a bit niche, there are
> multiple stack-overflow-like questions from people asking how to do this
> -- and I'm sure that if it were available it would see more use.
>
> A very common use-case where initialization can be skipped to save
> performance (which is not limited to our domain) is this:
>
> std::ifstream file("myfile.bin");
> std::vector<char> buffer(buffer_size);
> file.read(&buffer[0], buffer_size);
>
This is exactly the use case that string::resize_and_overwrite solves. I
think there is already a proposal to add a similar function to vector.
The committee has seen lots of proposals for a tag that says to leave a
container uninitialised, and has always decided we don't want that because
it's too error-prone and risky.
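
For reference, a minimal sketch of how the file-reading example above could
look with std::string::resize_and_overwrite (C++23). The helper name
read_prefix is made up for illustration, and the #else branch is only a
fallback for standard libraries that lack the feature (it zero-fills the
buffer first, which is the cost being discussed).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: reads up to max_bytes from the file at path.
// The returned string holds exactly the bytes that were read.
std::string read_prefix(const char* path, std::size_t max_bytes) {
    std::ifstream file(path, std::ios::binary);
    std::string buffer;
#if defined(__cpp_lib_string_resize_and_overwrite)
    // The callback receives uninitialized storage of n chars and returns
    // how many of them it actually wrote; that becomes the new size.
    buffer.resize_and_overwrite(max_bytes, [&file](char* data, std::size_t n) {
        file.read(data, static_cast<std::streamsize>(n));
        return static_cast<std::size_t>(file.gcount());
    });
#else
    // Fallback: zero-fills first, then shrinks to the bytes actually read.
    buffer.resize(max_bytes);
    file.read(&buffer[0], static_cast<std::streamsize>(max_bytes));
    buffer.resize(static_cast<std::size_t>(file.gcount()));
#endif
    return buffer;
}
```

Because the callback reports how many bytes were written, the string never
exposes elements that were left uninitialized, which is the main safety
difference from a tag that leaves all n elements readable.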
> My proposal is thus that the second line can become:
>
> std::vector<char> buffer(buffer_size, std::skip_initialization);
>
> Currently, to get around the issue, we wrote our own vector (UVector)
> class that skips initialization. It has the exact same interface as
> std::vector, except that, unless values are explicitly specified, it leaves
> the data array uninitialized. We still prefer to use std::vector, and
> only use UVector when we explicitly need that functionality, for one
> because using UVector is obviously less safe, and second we sometimes
> need compatibility with std::vector for external libraries. This is thus
> rather awkward, and a std::vector constructor that would skip
> initialization would be a small fix that solves the issue. Moreover, I
> think using a tagged constructor makes it explicit that the user wanted
> this unsafe behaviour, at the place it is constructed, instead of at the
> place where the vector is declared (as is now the case for us with
> UVector).
>
> For std::complex, it's even more fundamental as there's basically no
> good work-around to skip initialization. We've used approaches where we
> first allocated an uninitialised double array and cast it to a
> std::complex array, but this causes undefined behaviour because of type
> punning rules (complex is special in that casting a complex<double> to a
> double[2] is allowed, but the other way around is not allowed). It works
> in most situations, but we've also seen compilers (gcc in this case)
> make (correct) optimizations that cause this to break, a situation like:
>
> double a[2];
> std::complex<double>* b = reinterpret_cast<std::complex<double>*>(a);
> *b = <some value>
>
> sometimes causes b to stay uninitialized even after the assignment,
> because the compiler apparently deduces undefined behaviour (pretty
> fancy behaviour of the compiler! -- but makes it impossible to get the
> intended behaviour). As a result there are cases where we don't use
> complex values at all, and keep everything as double, resulting in much
> more verbose code if we need to do operations on it (+, -, *, /, abs,
> norm, exp, etc.). I've also written my own complex class at some point,
> but using two complex classes is quite messy.
>
> The two proposed constructors would make our lives a lot easier. There
> are probably other std data types that, theoretically, could use
> skip_initialization construction, but the two I listed here are I think
> by far the most important ones -- I don't think I've seen any need for
> more. I'm curious what people think.
>
> Kind regards,
> André Offringa
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>
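
As a possible alternative to a separate UVector class, here is a sketch of
an allocator adaptor that turns value-initialization into
default-initialization, so std::vector<double, ...>(n) skips the zero fill.
The name default_init_allocator is an assumption for illustration, not an
existing standard facility. The resulting type is still not std::vector<double>
with the default allocator, so it does not solve the compatibility problem
with external libraries, and it does not help for std::complex, whose
default constructor still sets both parts to zero.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical adaptor: wraps allocator A and default-initializes elements
// when the container asks for construction without arguments.
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;

public:
    template <typename U>
    struct rebind {
        using other =
            default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;  // inherit the wrapped allocator's constructors

    // No-argument construction: `new (p) U` leaves trivial types
    // such as double with an indeterminate value instead of zero.
    template <typename U>
    void construct(U* p) noexcept(std::is_nothrow_default_constructible<U>::value) {
        ::new (static_cast<void*>(p)) U;
    }

    // Construction with arguments is forwarded to the wrapped allocator.
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args) {
        traits::construct(static_cast<A&>(*this), p, std::forward<Args>(args)...);
    }
};

// Convenience alias (also an illustrative name).
template <typename T>
using uninit_vector = std::vector<T, default_init_allocator<T>>;
```

Usage would be uninit_vector<double> v(n); followed by writing every
element before reading it. Reading an element that was never written is
undefined behaviour, exactly as with the proposed tag.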
Received on 2024-11-10 10:56:06