std-discussion: Re: What even happened to <net> and byte swapping?

From: Matthew Woehlke <mwoehlke.floss_at_[hidden]>
Date: Fri, 26 Feb 2021 08:13:12 -0500

On 26/02/2021 06.32, Gennaro Prota wrote:
> On Thu, Feb 25, 2021 at 4:02 PM Matthew Woehlke wrote:
>> On 24/02/2021 11.12, Gennaro Prota wrote:
>>> I have worked on many applications that needed to handle different
>>> endiannesses, but never had to swap bytes around, either in C++ or in
>>> C#.
>>
>> So, let's say you have an image file that is big-endian encoded on a
>> little-endian system. You need to know the dimensions. How do you get
>> those as numbers the CPU can actually use? Do you just read everything a
>> byte at a time?
>
> The point is dealing with the endianness imposed by the external format,
> but not with the endianness of the machine. You can do that by simply
> forming the value from its representation, via bitwise operations. If
> you use something like the templates I linked to:
>
> h.width = endian_load< endianness_of_the_external_format,
> std::uint32_t /*for instance*/ >( address ) ;

So... in other words, yes, you're taking the "deal with one byte at a
time" approach. *Maybe* that *interface* is a candidate for
standardization, *iff* it is able to operate on values in place. I say
that because it will (ahem: "conditionally", for all the pedants who
replied) byte-swap under the hood, since any other implementation is
almost certainly going to perform much worse.
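For concreteness, here is a minimal sketch of that byte-at-a-time approach for a 4-byte unsigned field. The names load_be_u32 and load_le_u32 are hypothetical and are not the templates Gennaro linked. Each function builds the value arithmetically from the external byte order, so the host's byte order never enters into it:

```cpp
#include <cstdint>

// Hypothetical helpers in the spirit of the endian_load<> call quoted
// above. Each assembles a 32-bit value from four bytes using shifts, so
// the result is the same whatever the host's byte order.

// Bytes stored most-significant first (big-endian external format).
constexpr std::uint32_t load_be_u32(const unsigned char* p) {
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

// Bytes stored least-significant first (little-endian external format).
constexpr std::uint32_t load_le_u32(const unsigned char* p) {
    return std::uint32_t{p[0]} | (std::uint32_t{p[1]} << 8) |
           (std::uint32_t{p[2]} << 16) | (std::uint32_t{p[3]} << 24);
}
```

With optimization enabled, GCC and Clang often recognize this shift-and-or pattern and emit a single load, plus a byte-swap instruction when the formats differ. That is a compiler optimization, not a guarantee.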

> Similarly, if the width changes, you can write it back into the array
> with endian_store<>(). And you don't need to swap anything before
> writing the array to disk, either, whatever the machine endianness.

You're quibbling over semantics. Your implementation still byte-swaps,
in that it takes input bytes and (effectively) copies them in a
different order. Your approach also currently appears to require a
second copy of the data, which is potentially inefficient if the data is
a mix of endian-dependent and endian-agnostic fields, though I expect
that limitation could be relaxed. Also, if you aren't using actual
byte-swap intrinsics under the hood, you are most likely leaving
performance on the table. (Granted, your approach is more portable, but
a vendor implementation would be expected to use intrinsics.)

Note that I mention "byte swapping" because that's what the operation
actually *does* (except when it's a no-op), not because I'm suggesting
an API that *unconditionally* swaps. The original proposal was to
standardize the ntohX / htonX functions, and that is clearly a better
approach, although "best" would be more like hto{le,be} and {le,be}toh
(with no 'X' needed because this is C++ and we have overloads).

Overloads taking `span` might also be nice.

-- 
Matthew

Received on 2021-02-26 07:13:15