Date: Mon, 22 Jun 2026 09:24:56 +0000
On Monday, June 22nd, 2026 at 11:01 AM, Corentin Jabot via SG16 <sg16_at_[hidden]> wrote:
>> Anyway, my greater issue with endian views is that they do the endianness conversion in the wrong place. You get a mathematically meaningless uint32_t value when performing a byteswap on, say, a Unicode code point; the only purpose is to dump the bytes of that uint32_t to memory. The byteswap should be taking place in a serialization view that produces a range of bytes and can either encode to little endian or big endian.
>>
>> The little dance we force users to go through for, say, serializing std::float32_t is just not ergonomic: bit-cast the range to uint32_t, use a second view to transform the endianness, use a third view to generate the byte array, and a fourth view to join the bytes into a single range again. This sounds pretty terrible to me.
>>
>> The examples in the paper are contrived, in order to obtain a nicer-looking before/after comparison table.
>>
>>> constexpr vector<uint32_t> utf16be_to_utf32be(
>>> const vector<uint16_t>& utf16be_data)
>>
>> Why would I be holding a vector of big-endian uint16_t data in the first place? There is nothing I can do with this vector except transform its endianness so it becomes useful, or shoot myself in the foot by forgetting about the endianness of the data inside. I should either be holding a byte vector where the data is encoded in big-endian, or a vector of uint16_t or char16_t with native endianness.
>>
>> - If I started with a byte vector, it would be obvious in the comparison table that the paper's proposed feature is doing little to help the user.
>> - If I started with native endian data, I wouldn't need the paper's feature at all.
>
> I agree with that.
Same; as far as I can tell, the main problem people have with endianness is loading data into structures under the assumption of some particular endianness, and then retroactively patching it up when it did not match what was expected. Having an endian-swapping view over a mis-loaded input array is solving the wrong problem: we should have a "bytes to uint16" view that does endian-correct loading and/or storing. Same for 32-bit etc.
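To illustrate what endian-correct loading means here, a minimal sketch follows. It is an eager function rather than a lazy view, and the names byte_order and load_u16 are hypothetical, not taken from the paper or from any existing library. The point is only that the value is assembled from bytes with shifts, so no byteswap and no check of the host's endianness is ever needed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tag for the byte order of the *encoded* data.
enum class byte_order { little, big };

// Hypothetical helper: decode pairs of bytes into native uint16_t values.
// A trailing odd byte, if any, is ignored in this sketch.
inline std::vector<std::uint16_t> load_u16(const std::vector<std::uint8_t>& bytes,
                                           byte_order order)
{
    std::vector<std::uint16_t> out;
    out.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const unsigned b0 = bytes[i];
        const unsigned b1 = bytes[i + 1];
        // Build the value arithmetically; the result is correct on any host.
        const unsigned value = (order == byte_order::big) ? ((b0 << 8) | b1)
                                                          : ((b1 << 8) | b0);
        out.push_back(static_cast<std::uint16_t>(value));
    }
    return out;
}
```

With this shape, the caller holds either raw bytes or native-endian code units, never a vector of "big-endian uint16_t".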
I have my own serialization library written around that exact setup. It allows writing simple readers and writers for almost any format, and does not at any point need to check endianness. It does this as a view, similar to what this endian view is proposing, except that it requires the user to indicate which types to load and when, as it is meant for existing file formats that can have unexpected padding, weird alignment, or ossified fields.
This heterogeneous loading isn't necessary for loading and storing strings, and I would expect this library to be unsuitable for standardization due to the size of its design space. To make strings and Unicode more usable in C++, we do need a way to extract homogeneous streams of data from a byte view and, vice versa, to write a byte stream from a homogeneous input stream, in accordance with UTF-16/32 LE/BE.
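For the storing direction, a minimal sketch of writing a homogeneous sequence of 32-bit code units as bytes could look like the following. Again this is an eager function rather than a view, and byte_order and store_u32 are hypothetical names used only for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical tag for the byte order of the *encoded* output.
enum class byte_order { little, big };

// Hypothetical helper: encode native uint32_t values as a flat byte sequence.
inline std::vector<std::uint8_t> store_u32(const std::vector<std::uint32_t>& values,
                                           byte_order order)
{
    std::vector<std::uint8_t> out;
    out.reserve(values.size() * 4);
    for (std::uint32_t v : values) {
        for (int i = 0; i < 4; ++i) {
            // Big endian emits the most significant byte first,
            // little endian emits the least significant byte first.
            const int shift = (order == byte_order::big) ? 24 - 8 * i : 8 * i;
            out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFFu));
        }
    }
    return out;
}
```

The output is already a single flat byte range, so no separate byteswap, split, and join steps are needed.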
Would be good to see the paper that proposes those views.
Received on 2026-06-22 09:25:09
