Every major compiler is currently doing the right thing with this code:

int x = 100000;
char ch = *reinterpret_cast<char*>(&x);

namely, ch receives the value of the *first byte* of x (which differs depending on endianness), rather than somehow holding the value of the whole int object.
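
As a minimal sketch of the behaviour described above: the read through the cast pointer can be compared with a std::memcpy of the first byte, which copies the object representation. The helper names are hypothetical and only for illustration. That the two reads agree is what current compilers do in practice, not something this sketch claims the wording guarantees.

```cpp
#include <cassert>   // only needed for the usage check below
#include <cstring>

// Hypothetical helper: reads the first byte of an int through a char*,
// exactly as in the snippet above.
inline char first_byte_via_cast(int& x) {
    return *reinterpret_cast<char*>(&x);
}

// Hypothetical helper: reads the same byte with std::memcpy, which copies
// one byte of the object representation of x.
inline char first_byte_via_memcpy(const int& x) {
    char c;
    std::memcpy(&c, &x, 1);
    return c;
}

// True when both ways of reading the first byte give the same char.
inline bool reads_agree(int x) {
    return first_byte_via_cast(x) == first_byte_via_memcpy(x);
}
```

For example, assert(reads_agree(100000)); is expected to pass on both little-endian and big-endian targets, although the byte value itself differs between them.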

Wording has been cited here which can be interpreted as requiring the latter behaviour, rather than the former. I believe we are simply looking at a wording defect. I cannot think of any situation where such behaviour (ch represents “the whole object” rather than the first byte) would be useful.

I believe we should clarify the wording so that it becomes unambiguous that the resulting char pointer points to a *byte* of x. We should also clarify that this is not UB, even if 100000 does not fit into a char, as this “fitting the whole object into a char” is not at all what should be going on. We should also fix the wording to allow access to the other bytes of x, not just the first one, without causing UB (by allowing pointer arithmetic over the other addresses inside the char array that provides storage for x).

This would give us a portable way of looking at the bytes of an object (which ought to be possible anyway – this is why we have the “char and std::byte can alias any type” rule in the first place).
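
To illustrate the kind of byte inspection meant here, the following is a sketch only, with hypothetical helper names. bytes_of walks the object with an unsigned char pointer, which is the pattern the clarified wording would need to make unambiguously well-defined. bytes_of_memcpy produces the same bytes via std::memcpy for comparison.

```cpp
#include <cassert>      // only needed for the usage check below
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper: collects the bytes of `value` by pointer arithmetic
// on an unsigned char pointer obtained with reinterpret_cast.
// Assumption: this relies on the behaviour compilers implement today.
template <typename T>
std::vector<unsigned char> bytes_of(const T& value) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    return std::vector<unsigned char>(p, p + sizeof(T));
}

// Hypothetical reference version: copies the object representation with
// std::memcpy instead of walking it through a pointer.
template <typename T>
std::vector<unsigned char> bytes_of_memcpy(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "sketch is restricted to trivially copyable types");
    std::vector<unsigned char> out(sizeof(T));
    std::memcpy(out.data(), &value, sizeof(T));
    return out;
}
```

With int x = 100000; both bytes_of(x) and bytes_of_memcpy(x) are expected to yield the same sizeof(int) bytes, in an order that depends on endianness.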

I think if we had such a fix, then together with Richard’s P0593 this would be a comprehensive plug for this gaping hole in the language.

What do y’all think?

Cheers,
Timur

On 22 Aug 2019, at 09:17, Jake Arkinstall via Std-Proposals <std-proposals@lists.isocpp.org> wrote:

On Thu, 22 Aug 2019, 02:06 sdkrystian via Std-Proposals, <std-proposals@lists.isocpp.org> wrote:
> It does. But the char pointer points to the first byte. Which is platform dependent.

Don't think so: "the pointer value is unchanged" means exactly what it says, it still points to the int object. It is not specified anywhere that it will point to any char object.

> Which is a char.

No, it is never specified that it points to any char object. It points to an int object. 

You have a pointer to some int. That int is multiple chars wide. You create a pointer to char by reinterpret_casting the int pointer. It might truly be a pointer to int, but by this point there is no way for any code accepting that char pointer to know whether it truly points to an int, a char, or an elephant; for all intents and purposes, it can be treated as a pointer to char. Dereferencing the char pointer gives you one char representing the value of the first (lowest-addressed) byte of the integer.

What in the above is incorrect?

I'm asking specifically because what you're saying directly contradicts an example provided at the bottom of https://en.cppreference.com/w/cpp/language/reinterpret_cast, where a char pointer to an int is dereferenced to determine the endianness of the system.
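
A sketch of that kind of endianness check, written in the spirit of the cppreference example but not copied from it (the function names are hypothetical). It assumes sizeof(int) > 1 and a plain little-endian or big-endian target.

```cpp
#include <cassert>   // only needed for the usage check below
#include <cstddef>

static_assert(sizeof(int) > 1, "sketch assumes int is wider than one byte");

// Hypothetical helper: true when the lowest-addressed byte of an int holds
// its least significant bits, i.e. the target is little-endian.
inline bool first_byte_is_lsb() {
    int probe = 1;
    return *reinterpret_cast<char*>(&probe) == 1;
}

// Hypothetical helper: true when the highest-addressed byte holds the least
// significant bits, i.e. the target is big-endian. This uses pointer
// arithmetic past the first byte, the access the thread argues should be
// clearly allowed.
inline bool last_byte_is_lsb() {
    int probe = 1;
    return *(reinterpret_cast<char*>(&probe) + sizeof(int) - 1) == 1;
}
```

On a conventional target exactly one of the two functions returns true.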



--
Std-Proposals mailing list
Std-Proposals@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals