sg12

Re: [ub] Type punning to avoid copying

From: Jeffrey Yasskin <jyasskin_at_[hidden]>
Date: Fri, 26 Jul 2013 18:07:13 -0700
On Fri, Jul 26, 2013 at 12:02 PM, Nevin Liber <nevin_at_[hidden]> wrote:
> On 26 July 2013 13:35, Jeffrey Yasskin <jyasskin_at_[hidden]> wrote:
>>
>>
>> Ah, that's too bad because it doesn't serve the purpose Ion was asking
>> about. He wanted a memory buffer with known contents to become the
>> object representation of an object of a particular type.
>
>
> +1.
>
> This is a real issue for anyone that has to interpret binary data from a
> hardware device, a network connection or a file.

FWIW, you do need to deal with endianness issues in the real world,
and simply overlaying a struct on raw bytes *won't deal with that*,
even if we define away the aliasing and object-representation issues.
A compelling argument to define some extra behavior should start from
code that would give the right answer on common platforms if it were
naively translated to assembly.
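
To make the endianness point concrete, here is a minimal sketch. It
assumes a hypothetical wire format that stores integers big-endian; the
function names are made up for illustration. The first function copies
bytes verbatim, so its result depends on the host; the other two
assemble the value byte by byte and give the same answer everywhere.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Naive read: copy the bytes verbatim into an integer. This sidesteps
// the aliasing question, but the value depends on the host's byte order.
inline std::uint16_t load_u16_native(const unsigned char* p) {
    assert(p != nullptr);
    std::uint16_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Explicit big-endian decode (assumed wire order): host-independent.
inline std::uint16_t load_u16_be(const unsigned char* p) {
    assert(p != nullptr);
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

inline std::uint32_t load_u32_be(const unsigned char* p) {
    assert(p != nullptr);
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
            static_cast<std::uint32_t>(p[3]);
}
```

For the bytes 0x12 0x34, load_u16_be yields 0x1234 on any host, while
load_u16_native yields 0x3412 on a little-endian machine.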

By default, I'd assume that code that needs to parse an external
format should involve a serialization library, not the raw language.

> There needs to be an easy
> to program way of accomplishing this; one shouldn't have to consult the C++
> Committee or be an expert on undefined behavior to guess if the code is
> correct, and one shouldn't have to manually look at the data byte by byte to
> convert it. (Besides the copying cost, one may not need all the fields in a
> message, yet it is incredibly error prone to convert just the fields you do
> need.)

My current answer is "use memcpy on the whole record". Could you post
a program using that answer that gives you extra copies after
optimization with a recent clang or gcc? I think such a program is
possible, but you may need to use multiple TUs.

(We do need to have the standard endorse using memcpy to set the
object representation of a trivially-copyable struct, but I don't hear
any disagreement about wanting that. I'll mail Ville about an EWG
issue.)
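
A minimal sketch of the "memcpy on the whole record" approach follows.
The Record layout and the read_record helper are assumptions for
illustration, and the buffer is assumed to hold the struct in the
host's own layout (padding and byte order included).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical record type, used only for illustration.
struct Record {
    std::uint16_t a;
    std::uint32_t b;
};

static_assert(std::is_trivially_copyable<Record>::value,
              "memcpy is only appropriate for trivially copyable types");

// Copies sizeof(Record) bytes from the buffer into a fresh Record.
// The bytes are taken as-is, so this only recovers the intended values
// when the buffer was produced with the same layout and byte order.
inline Record read_record(const unsigned char* buf, std::size_t len) {
    assert(buf != nullptr && len >= sizeof(Record));
    Record r;
    std::memcpy(&r, buf, sizeof r);
    return r;
}
```

Whether the copy is elided is up to the optimizer; the sketch makes no
claim about that, it only shows the shape of the code in question.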

> People use the union hack because it is easy to program, given that it is
> mainly declarative. Declare a struct that corresponds to the layout of the
> data, cast the buffer pointer to the struct pointer, and it usually just
> works. If it isn't that easy, people will just keep using the union hack.

That's ... not the union hack. The union hack is the thing that gcc
(http://gcc.gnu.org/onlinedocs/gcc-4.8.1/gcc/Optimize-Options.html#index-fstrict_002daliasing-881)
and probably C99 endorse, where you write one field of a union and read
a different field. Simply casting a char[] to a different type is
something else, and I hear more objections to simply allowing that.
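
To illustrate the distinction, a sketch of the union hack proper. The
FloatBits type and bits_via_union are hypothetical names, and the code
assumes a 32-bit float. Reading the member that was not last written is
documented by GCC under -fstrict-aliasing, but ISO C++ does not
guarantee it, so treat this as a compiler-specific idiom.

```cpp
#include <cassert>
#include <cstdint>

// The union hack: two members sharing the same storage.
union FloatBits {
    float f;
    std::uint32_t u;
};

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

inline std::uint32_t bits_via_union(float x) {
    FloatBits fb;
    fb.f = x;     // write one member...
    return fb.u;  // ...read a different one (GCC-documented behavior)
}

// The other pattern involves no union at all: a pointer into a byte
// buffer is cast to a struct pointer and dereferenced, e.g.
//   const Record* r = reinterpret_cast<const Record*>(buf);
// That is the case that draws more objections.
```

On a platform with IEEE 754 floats, bits_via_union(1.0f) gives
0x3F800000 when built with GCC or clang.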

However, despite nitpicking all of your points, I do agree that we
really want to have a clear answer about how to do this, and it'd be
nice if we could make most of the code people are already writing Just
Work™.

> This is one of the reasons people pick C and C++ in the first place.
>
> One (unexplored) thought: maybe we need some magic classes to encapsulate
> this, just like std::atomic hides details about atomic access.

A sufficiently-powerful reflection interface (note, detailed
discussion of this belongs on a different SG's mailing list ;) could
probably let us define:

struct Record {
  uint16_t a;
  uint32_t b;
};
uint16_t a = from_memory<Record, big_endian>(memory).a();

But there are several steps in the reflection process before getting there.

Jeffrey

Received on 2013-07-27 03:07:35