std-discussion: Re: What even happened to <net> and byte swapping?

From: Matthew Woehlke <mwoehlke.floss_at_[hidden]>
Date: Fri, 26 Feb 2021 12:41:13 -0500

On 26/02/2021 10.08, Edward Catmur wrote:
> On Fri, Feb 26, 2021 at 1:13 PM Matthew Woehlke wrote:
>> So... in other words, yes, you're taking the "deal with one byte at a
>> time" approach. *Maybe* that *interface* is a candidate for
>> standardization, *iff* it is able to operate on values in-place. The
>> reason I say that is because it will (ahem: "conditionally", for all the
>> pedants that replied) byte-swap under the hood, as any other
>> implementation is almost certainly going to perform much more poorly.
>
> By in-place, you mean *modifying* an integral *lvalue*?

Maybe? I want to say "yes", but I don't think you mean the question the
way I'm reading it.

I mean like my previous example:

   fread(&h, sizeof(foo_header), 1, fin);
   h.width = byteswap(h.width);

> That sounds highly bug-prone; what if you lose track of whether a
> particular variable has been converted from format to host endian?

Possible, though I think you overstate the danger. Normally with this
pattern, swapping and I/O are immediately adjacent, and there is only
ever a brief window where the bytes are in memory in disk/wire
endianness rather than host endianness.
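
To make that pattern concrete, here is a minimal sketch. The names
`foo_header`, `byteswap32`, `host_is_little_endian` and `read_foo_header`
are made up for illustration, and I'm assuming the file stores its fields
big-endian. The point is only that the swap sits right next to the read,
so nothing outside the function ever sees file-order values.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical on-disk header; assumed to be stored big-endian in the file.
struct foo_header {
    std::uint32_t width;
    std::uint32_t height;
};

// Portable 32-bit byte reversal (illustrative helper, not a standard API).
inline std::uint32_t byteswap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// True if the host stores the least significant byte first.
inline bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    return *reinterpret_cast<const unsigned char*>(&probe) == 1;
}

// Read the header and convert it to host order before returning,
// so the window in which `h` holds file-order bytes ends here.
inline bool read_foo_header(std::FILE* fin, foo_header& h) {
    assert(fin != nullptr);
    if (std::fread(&h, sizeof h, 1, fin) != 1) return false;
    if (host_is_little_endian()) {  // the "conditional" part of the swap
        h.width = byteswap32(h.width);
        h.height = byteswap32(h.height);
    }
    return true;
}
```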

> Also, a pure function has better performance, not worse, because it leaves
> its result in a prvalue.

Hmm, on second thought, I guess it should be safe to use `endian_load`
"in place", since the write won't happen until the function returns, and
the value will live in a register until then, despite ultimately being
written to the same memory that's being read.

For some reason, I was thinking about the write and the read happening
concurrently, but on further reflection, that seems to have been a bit
of temporary insanity on my part.
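
Roughly what I have in mind, as a sketch only: `endian_load_be32` and
`to_host_in_place` are hypothetical names standing in for whatever the
real `endian_load` interface looks like, and I'm assuming a big-endian
source format. The function copies the bytes out and returns a value, so
the store back to the same object cannot happen before the reads finish.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for `endian_load`: compose a big-endian 32-bit
// value from four bytes, independent of the host's byte order.
inline std::uint32_t endian_load_be32(const void* src) {
    assert(src != nullptr);
    unsigned char b[4];
    std::memcpy(b, src, sizeof b);  // all reads of the source complete here
    return (std::uint32_t{b[0]} << 24) | (std::uint32_t{b[1]} << 16) |
           (std::uint32_t{b[2]} << 8) | std::uint32_t{b[3]};
}

// "In place" use: the result is a prvalue, so the write to `field`
// happens only after the function has returned.
inline void to_host_in_place(std::uint32_t& field) {
    field = endian_load_be32(&field);
}
```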

>> Also, if you aren't using
>> actual byte-swap intrinsics under the hood, you are most likely leaving
>> performance on the table. (Granted, your approach is more portable, but
>> a vendor implementation would be expected to use intrinsics.)
>
> Also, what compiler is incapable of recognizing a hand-rolled swap
> and optimizing it to the bswap instruction?
Clang: https://godbolt.org/z/56qqaM

Maybe GCC also; I didn't check. I also didn't attempt to decipher the
assembly to see what's going on, but I'm getting *different* assembly
between the "hand-rolled swap" and using __builtin_bswap, which would
seem to suggest clang doesn't see them as "the same".

For reference, here's the hand-rolled version:

     // Swap each adjacent pair of bytes; assumes length is even.
     for (auto i = 0; i < length; i += 2)
     {
       auto tmp = data[i];
       data[i] = data[i + 1];
       data[i + 1] = tmp;
     }
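
For anyone who wants to repeat the comparison, here is a self-contained
sketch of the two variants side by side. The function names are mine, and
I've assumed 16-bit elements since the loop above swaps byte pairs;
`__builtin_bswap16` is the GCC/Clang builtin, with a shift-based fallback
for other compilers. Whether a given compiler emits the same code for
both is exactly the thing to check in the assembly output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hand-rolled: swap each adjacent byte pair, as in the loop above.
// Assumes `length` is even.
inline void swap_pairs_bytes(unsigned char* data, std::size_t length) {
    assert(length % 2 == 0);
    for (std::size_t i = 0; i < length; i += 2) {
        const unsigned char tmp = data[i];
        data[i] = data[i + 1];
        data[i + 1] = tmp;
    }
}

// Reverse the two bytes of a single 16-bit value.
inline std::uint16_t swap16(std::uint16_t v) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap16(v);  // compiler builtin on GCC and Clang
#else
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
#endif
}

// Element-wise variant: same effect, expressed per 16-bit value.
inline void swap_pairs_u16(std::uint16_t* data, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) data[i] = swap16(data[i]);
}
```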

>> Overloads taking `span` might be nice, also.
>
> How can you convert a span of bytes without knowing the layout of the
> fields within it? Say you have a range of 10 bytes that is {int64, uint32,
> 4x int8, int16}?

Same as the non-span overloads; there aren't overloads for complex
types, only for `[u]intN_t`. So, you wouldn't use it on
complex-structured data, but it would be useful if your data is a large
block of `[u]intN_t` (which is not uncommon with images or audio, especially
uncompressed formats).
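
A sketch of what such a bulk overload might look like, with hypothetical
names (`byteswap_value`, `byteswap_block`). I've written it against a
pointer and count so it compiles without C++20; a `span<T>` overload
would presumably just forward `s.data()` and `s.size()` to it. It is
restricted to unsigned integer element types, matching the "only for
`[u]intN_t`" idea above (signed types would go through their unsigned
counterparts).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Reverse the bytes of one unsigned integer value (illustrative helper).
template <typename T>
T byteswap_value(T v) {
    static_assert(std::is_integral<T>::value && std::is_unsigned<T>::value,
                  "only unsigned integer types are supported in this sketch");
    T out = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        out = static_cast<T>((out << 8) | (v & 0xFF));  // append low byte of v
        v = static_cast<T>(v >> 8);                     // move to the next byte
    }
    return out;
}

// Swap every element of a homogeneous block in place, e.g. raw 16-bit
// audio samples or image pixels.
template <typename T>
void byteswap_block(T* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) data[i] = byteswap_value(data[i]);
}
```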

-- 
Matthew

Received on 2021-02-26 11:41:16