Date: Fri, 26 Feb 2021 12:41:13 -0500
On 26/02/2021 10.08, Edward Catmur wrote:
> On Fri, Feb 26, 2021 at 1:13 PM Matthew Woehlke wrote:
>> So... in other words, yes, you're taking the "deal with one byte at a
>> time" approach. *Maybe* that *interface* is a candidate for
>> standardization, *iff* it is able to operate on values in-place. The
>> reason I say that is because it will (ahem: "conditionally", for all the
>> pedants that replied) byte-swap under the hood, as any other
>> implementation is almost certainly going to perform much more poorly.
>
> By in-place, you mean *modifying* an integral *lvalue*?
Maybe? I want to say "yes", but I don't think you mean the question the
way I'm reading it.
I mean like my previous example:
    fread(&h, sizeof(foo_header), 1, fin);
    h.width = byteswap(h.width);
> That sounds highly bug-prone; what if you lose track of whether a
> particular variable has been converted from format to host endian?
Possible, though I think you overstate the danger. Normally with this
pattern, swapping and I/O are immediately adjacent, and there is only
ever a brief window where the bytes are in memory in disk/wire
endianness rather than host endianness.
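To make the "swap right next to the I/O" pattern concrete, here is a minimal sketch. It assumes the file's byte order is the opposite of the host's, so the swap is unconditional. The `height` field, the `byteswap32` helper and the `read_foo_header` wrapper are hypothetical names for illustration; only `foo_header`, `width` and `fin` come from the example above.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical on-disk header; only `width` appears in the original example.
struct foo_header {
    std::uint32_t width;
    std::uint32_t height;
};

// Portable 32-bit byte swap written with shifts and masks
// (a stand-in for whatever `byteswap` would actually be).
constexpr std::uint32_t byteswap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Reads one header and converts every field before returning, so callers
// never see the header in file byte order.
// Assumption: the file's endianness is the opposite of the host's.
bool read_foo_header(std::FILE* fin, foo_header& h) {
    if (std::fread(&h, sizeof(foo_header), 1, fin) != 1)
        return false;  // short read: h is not usable
    h.width = byteswap32(h.width);
    h.height = byteswap32(h.height);
    return true;
}
```

Keeping the read and the swaps inside one function is one way to keep the window in which the fields hold non-host byte order small.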
> Also, a pure function has better performance, not worse, because it leaves
> its result in a prvalue.
Hmm, on second thought, I guess it should be safe to use `endian_load`
"in place", since the write won't happen until the function returns, and
the value will live in a register until then, despite ultimately being
written to the same memory that's being read.
For some reason, I was thinking about the write and the read happening
concurrently, but on further reflection, that seems to have been a bit
of temporary insanity on my part.
>> Also, if you aren't using
>> actual byte-swap intrinsics under the hood, you are most likely leaving
>> performance on the table. (Granted, your approach is more portable, but
>> a vendor implementation would be expected to use intrinsics.)
>
> Also, what compiler is incapable of recognizing a hand-rolled swap
> and optimizing it to the bswap instruction?
Clang: https://godbolt.org/z/56qqaM
Maybe GCC also; I didn't check. I also didn't attempt to decipher the
assembly to see what's going on, but I'm getting *different* assembly
between the "hand-rolled swap" and using __builtin_bswap, which would
seem to suggest clang doesn't see them as "the same".
For reference, here's the hand-rolled version:
    for (auto i = 0; i < length; i += 2)
    {
        auto tmp = data[i];
        data[i] = data[i + 1];
        data[i + 1] = tmp;
    }
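For anyone who wants to compare the two forms side by side, here is a sketch of the loop above next to a per-element 16-bit swap. The function names are hypothetical, `__builtin_bswap16` is a GCC/Clang builtin (hence the guard), and this makes no claim about what assembly any particular compiler version produces for either form.

```cpp
#include <cstddef>
#include <cstdint>

// Hand-rolled form: exchange each pair of adjacent bytes.
// Unlike the loop above, this stops before a trailing odd byte.
inline void swap_pairs_bytewise(unsigned char* data, std::size_t length) {
    for (std::size_t i = 0; i + 1 < length; i += 2) {
        unsigned char tmp = data[i];
        data[i] = data[i + 1];
        data[i + 1] = tmp;
    }
}

// Swap of a single 16-bit value: the builtin where available,
// otherwise a portable shift-based fallback.
inline std::uint16_t swap16(std::uint16_t v) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap16(v);
#else
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
#endif
}

// Same effect as the byte-wise loop, expressed on 16-bit elements.
inline void swap_pairs_wordwise(std::uint16_t* data, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        data[i] = swap16(data[i]);
}
```

Both functions perform the same transformation on the underlying bytes, so they should be suitable for an apples-to-apples comparison of the generated code.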
>> Overloads taking `span` might be nice, also.
>
> How can you convert a span of bytes without knowing the layout of the
> fields within it? Say you have a range of 10 bytes that is {int64, uint32,
> 4x int8, int16}?
Same as the non-span overloads; there aren't overloads for complex
types, only for `[u]intN_t`. So, you wouldn't use it on
complex-structured data, but it would be useful if your data is a large
block of `[u]intN_t` (which is not uncommon with images or audio, especially
uncompressed formats).
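As a sketch of what such a bulk overload might look like: the version below is written against any range of unsigned fixed-width integers rather than `std::span` specifically, so that it also builds before C++20; a `std::span<std::uint16_t>` lvalue could be passed the same way. The names `swap_bytes` and `swap_bytes_each` are hypothetical, and signed types are left out for brevity.

```cpp
#include <cstdint>

// Scalar overloads, one per supported width (hypothetical names).
inline std::uint16_t swap_bytes(std::uint16_t v) {
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}
inline std::uint32_t swap_bytes(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Bulk form: swaps every element of a homogeneous range in place.
// Works with built-in arrays, std::vector, std::array, or std::span,
// as long as the element type has a swap_bytes overload.
template <class Range>
void swap_bytes_each(Range& r) {
    for (auto& x : r)
        x = swap_bytes(x);
}
```

This only makes sense for homogeneous data such as a buffer of 16-bit audio samples; a mixed-layout record would still need per-field handling.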
-- Matthew
Received on 2021-02-26 11:41:16