C++ Logo

sg12

Advanced search

Re: [ub] Type punning to avoid copying (was: unions and undefined behavior)

From: Jeffrey Yasskin <jyasskin_at_[hidden]>
Date: Thu, 25 Jul 2013 08:00:04 -0700
On Wed, Jul 24, 2013 at 5:05 PM, Nevin Liber <nevin_at_[hidden]> wrote:
> On 24 July 2013 09:20, Nevin Liber <nevin_at_[hidden]> wrote:
>>
>> On 24 July 2013 06:58, Gabriel Dos Reis <gdr_at_[hidden]> wrote:
>>>
>>> The "union hack" is treacherous and needs far more investigation
>>> than it appears.
>>
>>
>> So treacherous that people have been managing to use it for the last 30
>> years or so?
>>
>>>
>>> Are we going to mandate that the reintepretation of
>>> the bits cannot possibly yield trap representation?
>>
>>
>> I wouldn't. This is for non-portable machine-specific code. If you blow
>> a trap representation or an alignment access, you are on your own.
>>
>> If I wanted to do this via copying, I'd just use Java.
>
>
> Assuming no alignment, packing, endian or malformed packet issues, here is
> an example of what people want to do:
>
> enum class Protocol : char
> {
> UDP = 17,
> // ...
> };
>
> struct IPHeader
> {
> //...
> Protocol protocol;
> // ...
> };
>
> struct UDPHeader : IPHeader
> {
> //...
> uint16_t length;
> //...
> char data[1];
> };
>
> union Header
> {
> IPHeader ip;
> UDPHeader udp;
> //...
> };
>
> // Deliver payload to f without copying
> Payload(dynarray<char> const& packet, std::function<void(char const*, char
> const*)> const& f)
> {
> // Want to overlay Header on packet.data()
> auto& header = *static_cast<Header const*>(static_cast<void
> const*>(packet.data()));
>
> switch (header.ip.protocol)

The above line, which accesses the stored value of an object whose
dynamic type is char[] through a glvalue of type Protocol, is
undefined according to C++14[basic.lval]p10. (The fact that Protocol
has an underlying type of 'char' doesn't appear to help.

The access through 'header' would be undefined except for "an
aggregate or union type that includes one of the aforementioned types
among its elements or non-static data members", since 'char' is a
member of UDPHeader.

> {
> case Protocol::UDP:
> f(&header.udp.data[0], &header.udp.data[header.udp.length - 8]);

And this line is undefined because of accessing a char through a uint16_t.

> break;
> //...
> }
>
> }
>
> Q1: How many places has undefined behavior been invoked in the above?
> Q2: What is the correct way to write this code?

struct UDPHeader : IPHeader
{
    //...
    uint16_t length;
    //...
    // char data[1]; // Don't need this anymore.
};

// Deliver payload to f without copying
Payload(dynarray<char> const& packet, std::function<void(char const*,
char const*)> const& f)
{
    // Want to load Header from packet.data()
    Header header;
    memcpy(&header, packet.data(), sizeof(Header));

    switch (header.ip.protocol)
    {
        case Protocol::UDP:
            const char* data = packet.data() + sizeof(header.udp);
            f(data, data + header.udp.length - 8);
            break;
        //...
    }
}

Now, we'd like the compiler to use 'header' without actually copying
the data. What would get in the way of that optimization? Well, if the
compiler can't tell that payload.data() is immutable through all uses
of 'header', it might think it needs to save the original contents. We
could enable that optimization again by adding a language feature that
asserts to the compiler that some data is immutable in a region.

If we don't have that language feature, we can still use packet.data()
without copying it, we just have to use byte offsets explicitly:

template<typename T>
T load(const char* memory) {
  T result;


Jeffrey

Received on 2013-07-25 17:00:26