C++ Logo

sg12

Advanced search

Re: [ub] Type punning to avoid copying (was: unions and undefined behavior)

From: Jeffrey Yasskin <jyasskin_at_[hidden]>
Date: Thu, 25 Jul 2013 08:04:37 -0700
Sorry, hit send too soon.

On Thu, Jul 25, 2013 at 8:00 AM, Jeffrey Yasskin <jyasskin_at_[hidden]> wrote:
> On Wed, Jul 24, 2013 at 5:05 PM, Nevin Liber <nevin_at_[hidden]> wrote:
>> On 24 July 2013 09:20, Nevin Liber <nevin_at_[hidden]> wrote:
>>>
>>> On 24 July 2013 06:58, Gabriel Dos Reis <gdr_at_[hidden]> wrote:
>>>>
>>>> The "union hack" is treacherous and needs far more investigation
>>>> than it appears.
>>>
>>>
>>> So treacherous that people have been managing to use it for the last 30
>>> years or so?
>>>
>>>>
>>>> Are we going to mandate that the reintepretation of
>>>> the bits cannot possibly yield trap representation?
>>>
>>>
>>> I wouldn't. This is for non-portable machine-specific code. If you blow
>>> a trap representation or an alignment access, you are on your own.
>>>
>>> If I wanted to do this via copying, I'd just use Java.
>>
>>
>> Assuming no alignment, packing, endian or malformed packet issues, here is
>> an example of what people want to do:
>>
>> enum class Protocol : char
>> {
>> UDP = 17,
>> // ...
>> };
>>
>> struct IPHeader
>> {
>> //...
>> Protocol protocol;
>> // ...
>> };
>>
>> struct UDPHeader : IPHeader
>> {
>> //...
>> uint16_t length;
>> //...
>> char data[1];
>> };
>>
>> union Header
>> {
>> IPHeader ip;
>> UDPHeader udp;
>> //...
>> };
>>
>> // Deliver payload to f without copying
>> Payload(dynarray<char> const& packet, std::function<void(char const*, char
>> const*)> const& f)
>> {
>> // Want to overlay Header on packet.data()
>> auto& header = *static_cast<Header const*>(static_cast<void
>> const*>(packet.data()));
>>
>> switch (header.ip.protocol)
>
> The above line, which accesses the stored value of an object whose
> dynamic type is char[] through a glvalue of type Protocol, is
> undefined according to C++14[basic.lval]p10. (The fact that Protocol
> has an underlying type of 'char' doesn't appear to help.
>
> The access through 'header' would be undefined except for "an
> aggregate or union type that includes one of the aforementioned types
> among its elements or non-static data members", since 'char' is a
> member of UDPHeader.
>
>> {
>> case Protocol::UDP:
>> f(&header.udp.data[0], &header.udp.data[header.udp.length - 8]);
>
> And this line is undefined because of accessing a char through a uint16_t.
>
>> break;
>> //...
>> }
>>
>> }
>>
>> Q1: How many places has undefined behavior been invoked in the above?
>> Q2: What is the correct way to write this code?
>
> struct UDPHeader : IPHeader
> {
> //...
> uint16_t length;
> //...
> // char data[1]; // Don't need this anymore.
> };
>
> // Deliver payload to f without copying
> Payload(dynarray<char> const& packet, std::function<void(char const*,
> char const*)> const& f)
> {
> // Want to load Header from packet.data()
> Header header;
> memcpy(&header, packet.data(), sizeof(Header));
>
> switch (header.ip.protocol)
> {
> case Protocol::UDP:
> const char* data = packet.data() + sizeof(header.udp);
> f(data, data + header.udp.length - 8);
> break;
> //...
> }
> }
>
> Now, we'd like the compiler to use 'header' without actually copying
> the data. What would get in the way of that optimization? Well, if the
> compiler can't tell that payload.data() is immutable through all uses
> of 'header', it might think it needs to save the original contents. We
> could enable that optimization again by adding a language feature that
> asserts to the compiler that some data is immutable in a region.


If we don't have that language feature, we can still use packet.data()
without copying it, we just have to use byte offsets explicitly:

template<typename T>
T load(const char* memory) {
  T result;
  memcpy(&result, memory, sizeof(result));
  // Could insert byte-swapping logic here, since endianness is
actually a concern.
  return result;
}

// Deliver payload to f without copying
Payload(dynarray<char> const& packet, std::function<void(char const*,
char const*)> const& f)
{
    // Want to load Header from packet.data()
    Header header;
    memcpy(&header, packet.data(), sizeof(Header));

    switch (load<Protocol>(packet.data(), offsetof(IPHeader, protocol)))
    {
        case Protocol::UDP:
            const char* data = packet.data() + sizeof(header.udp);
            f(data, data + load<uint16_t>(UDPHeader, length) - 8);
            break;
        //...
    }
}

Jeffrey

Received on 2013-07-25 17:04:58