Re: Allowing access to object representations

From: Timur Doumler <cpp_at_[hidden]>
Date: Wed, 21 Aug 2019 10:34:06 +0200
Wow, I didn’t know this.

So instead of doing this (= your program, except I fixed x to &x)
int x = 12345;
auto p = reinterpret_cast<unsigned char*>(&x);
for (int i = 0; i < sizeof(x); i++) {
    std::cout << p[i] << '\n';
}

I need to do this to avoid UB?

int x = 12345;
unsigned char buf[sizeof(x)];
std::memcpy(&buf, &x, sizeof(x));
for (auto c : buf)
    std::cout << c << '\n';

Of course, both programs compile to the exact same code with -O3.
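
For reference, here is a minimal, self-contained sketch of the memcpy variant. The helper name object_bytes_hex is hypothetical (it does not appear in this thread), and formatting the bytes as hex is an assumption made for readability: streaming an unsigned char to std::cout, as the snippets above do, prints characters rather than numeric values.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical helper (not from the thread): returns the object
// representation of `value` as space-separated two-digit hex bytes.
template <typename T>
std::string object_bytes_hex(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy requires a trivially copyable type");

    // Copy the bytes into a real unsigned char array, so the indexing
    // below happens on the array and never on a pointer derived from &value.
    unsigned char buf[sizeof(T)];
    std::memcpy(buf, &value, sizeof(T));

    std::string out;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        char hex[3];  // two hex digits plus the terminating NUL
        std::snprintf(hex, sizeof hex, "%02x", static_cast<unsigned>(buf[i]));
        if (i != 0) out += ' ';
        out += hex;
    }
    return out;
}
```

Calling object_bytes_hex(x) with int x = 12345 should give "39 30 00 00" on a little-endian platform with a 4-byte int; other platforms will differ in byte order or length.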

This is interesting and I was not aware of this. If a char array can provide storage for an object, and a pointer to any element in the array can alias any other type, it is quite surprising that the first program is UB. Is it just because, as you say, you’re not allowed to do pointer arithmetic on a pointer obtained that way, or is there another reason?

Cheers,
Timur

> On 21 Aug 2019, at 10:22, sdkrystian via Std-Proposals <std-proposals_at_[hidden]> wrote:
>
> Using reinterpret_cast, you can access the first element, but that's about it (pointer arithmetic is UB).
>
>
> -------- Original message --------
> From: Timur Doumler via Std-Proposals <std-proposals_at_[hidden]>
> Date: 8/21/19 02:23 (GMT-05:00)
> To: Brian Bi <bbi5291_at_[hidden]>
> Cc: Timur Doumler <cpp_at_[hidden]>, std-proposals_at_[hidden]
> Subject: Re: [std-proposals] Allowing access to object representations
>
> That's interesting, thanks for the explanation!
>
> So how would I, in C++17, print the bytes that make up the object representation of int x, without causing UB?
>
> It ought to be possible without memcpying, because char* can alias any other type, including int, but now I am not sure anymore how to correctly write the code that does what your snippet does.
>
> On 21 Aug 2019, at 01:38, Brian Bi <bbi5291_at_[hidden]> wrote:
>
>> On Tue, Aug 20, 2019 at 8:24 AM Timur Doumler <cpp_at_[hidden]> wrote:
>> Hi Brian,
>>
>> > On 19 Aug 2019, at 23:23, Brian Bi via Std-Proposals <std-proposals_at_[hidden]> wrote:
>> > Exactly - that means it's still undefined. As I said in one of my earlier messages, it is undesirable to change the standard in a way that breaks lots of code which people will then not rewrite, as this erodes the legitimacy of the standard (and leaves users uncertain about what might get trampled by their compiler optimizers the next time they update.) Yet that's exactly what happened in C++17, and we should fix that.
>>
>> Could you please explain exactly what change in C++17 you are referring to here?
>>
>> The change to the object/memory model made by P0137. At the very least, this breaks any code similar to the following:
>>
>> int x = 12345;
>> auto p = reinterpret_cast<unsigned char*>(x);
>> for (int i = 0; i < sizeof(x); i++) {
>>     std::cout << p[i] << '\n';
>> }
>>
>> In Core Issue 1314, CWG said that "the current wording":
>>
>> The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T).
>>
>> made it "sufficiently clear" that such code was well-defined (August 2011). The reinterpret_cast produces a pointer to the first byte of the object representation of x. The question of whether the pointer arithmetic is valid was raised again in Core Issue 1701: pointer arithmetic requires an array, and a "sequence" is not unambiguously an array. Core Issue 1701 is still unresolved. However, I feel confident saying that in August 2011 the code above was "supposed" to be well-defined, though a small number of people disagree. CWG simply had not realized at that point that the wording was defective; they realized it in 2013, when issue 1701 was raised.
>>
>> But thanks to P0137, the above code is no longer well-defined even if you want to stretch the reading of the wording, because the result of the reinterpret_cast no longer points to the first byte of the object representation; it just points to the original int object. The pointer arithmetic cannot possibly do the right thing, and according to the plain wording, neither can the lvalue-to-rvalue conversions.
>>
>>
>> Thanks,
>> Timur
>>
>>
>> --
>> Brian Bi


Received on 2019-08-21 03:36:13