On Fri, Feb 26, 2021 at 5:41 PM Matthew Woehlke <mwoehlke.floss@gmail.com> wrote:
> By in-place, you mean *modifying* an integral *lvalue*?

Maybe? I want to say "yes", but I don't think you mean the question the
way I'm reading it.

I mean like my previous example:

   fread(&h, sizeof(foo_header), 1, fin);
   h.width = byteswap(h.width);

Ah, OK. I'd call that a pure-function approach; in-place would be `void byteswap(Integral auto&)` (and so forth).

> That sounds highly bug-prone; what if you lose track of whether a
> particular variable has been converted from format to host endian?
Possible, though I think you overstate the danger. Normally with this
pattern, swapping and I/O are immediately adjacent, and there is only
ever a brief window where the bytes are in memory in disk/wire
endianness rather than host endianness.

I've seen those bugs happen. It's similar to units bugs (is that altitude in feet or meters?) and similarly the only real fix is to move endianness information into the type system.
 
> Also, what compiler is incapable of recognizing a hand-rolled swap
> and optimizing it to the bswap instruction?
Clang: https://godbolt.org/z/56qqaM

Those versions aren't comparable; to the optimizer, iterating over char 2-by-2 is not the same as iterating over short. See https://godbolt.org/z/GvcG9M where the intrinsic and hand-rolled swap are treated identically.
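
For reference, by "hand-rolled swap" I mean something like the usual shift-and-mask idiom on the integer itself. This is a sketch of that idiom, not necessarily the exact code behind the link, and the name `bswap32_handrolled` is made up here. In my experience current GCC and Clang recognize this pattern at -O2 on x86-64, but that is worth checking for your own target.

```cpp
#include <cstdint>

// Hand-rolled 32-bit byte swap operating on the whole integer,
// as opposed to shuffling individual chars through memory.
constexpr std::uint32_t bswap32_handrolled(std::uint32_t x) noexcept {
    return (x >> 24)                   // byte 3 -> byte 0
         | ((x >> 8) & 0x0000FF00u)    // byte 2 -> byte 1
         | ((x << 8) & 0x00FF0000u)    // byte 1 -> byte 2
         | (x << 24);                  // byte 0 -> byte 3
}

static_assert(bswap32_handrolled(0x11223344u) == 0x44332211u);
```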
 
Same as the non-span overloads; there aren't overloads for complex
types, only for `[u]intN_t`. So, you wouldn't use it on
complex-structured data, but it would be useful if your data is a large
block of `[u]intN_t` (which is not uncommon with images or audio, especially
uncompressed formats).

Isn't this a one-liner?
std::for_each(std::execution::par_unseq, std::begin(data), std::end(data), [](auto& el) { el = bswap(el); }); 
I can't see a vendor writing anything more optimized.