
Re: Accessing an object with a char pointer.

From: Brian Bi <bbi5291_at_[hidden]>
Date: Fri, 15 May 2020 17:40:33 -0400
On Fri, May 15, 2020 at 5:20 PM Hyman Rosen via Std-Discussion <
std-discussion_at_[hidden]> wrote:

> On Fri, May 15, 2020 at 9:30 AM J Decker via Std-Discussion <
> std-discussion_at_[hidden]> wrote:
>
>> On Wed, May 13, 2020 at 7:03 AM yo mizu via Std-Discussion <
>> std-discussion_at_[hidden]> wrote:
>> The issue really stems from big-endian platforms... in memory (starting
>> at address 100 decimal):
>>
>> 100 - 00 00 01 A4
>>
>> the address of the low-order 'char' within that integer is +3 from the
>> start of the object.
>> This means that casting an int* to a char* may cause a shift in the
>> pointer value; and worse, the conversion back from char* to int* may not
>> know how many chars to unwind to get back to the start (or maybe it goes
>> through a short* in between?).
>>
>> It's not so painful in a little-endian world
>>
>> 100 - A4 01 00 00
>>
>> 100 is all the same for char*, short*, int*, ...
>>
>
> The C++ object model is nonsense, and you can't reason from nonsense.
>
> The correct way to think about this, the way it's been done from the
> early days of C, is that an object occupies a region of storage in
> memory, a pointer to the object is the lowest address of that region
> of storage, reinterpret_cast of that pointer to a different pointer
> type is a no-op at runtime, and indirecting through such a pointer
> treats the sizeof(TO_TYPE) lowest bytes of the storage as if it held
> an object of TO_TYPE.
>
> In my vision, memory is a bag of bits, and you can look at that bag
> through the lens of any type and get whatever object of that type that
> is represented by those bits. There is no such thing as "strict aliasing".
> Any write to memory invalidates all cached values (e.g., in registers)
> unless the compiler can prove that the write wouldn't affect them.
>
> The optimizationist compiler community has committed itself to willfully
> disobeying what programmers write in their code, in patterns that go back
> half a century. Instead of optimization being the transformation of code
> into semantically equivalent forms that improve some metric, it has become
> the task of finding more and more clever ways to break user intentions,
> blithely saying "oh, you couldn't possibly have meant that" and discarding
> swathes of code.
>
>
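(As an aside, the quoted byte-layout example is easy to check. Here is a
minimal sketch, assuming a 4-byte `int` holding 420, i.e. 0x1A4; reading an
object's representation through `unsigned char*` is one of the accesses the
aliasing rules explicitly permit:

#include <cstddef>
#include <cstdio>

int main() {
    int n = 420;  // 0x000001A4, as in the quoted example
    // char, unsigned char, and std::byte may be used to examine the
    // object representation of any object.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&n);
    for (std::size_t i = 0; i < sizeof n; ++i)
        std::printf("%02X ", static_cast<unsigned>(p[i]));
    std::printf("\n");
    // Typically prints "A4 01 00 00" on a little-endian machine and
    // "00 00 01 A4" on a big-endian one.
}

On either endianness the object starts at the same address; what changes is
which byte of the value you find first. Ironically, the pointer arithmetic
on `p` above is itself part of what the C++17 wording calls into question,
which is rather the point of this thread.)
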
I think I would take a middle ground here. The strict aliasing rule, which
we have had ever since C++98, is fine. The compiler gets to make more
aggressive optimizations, and if you want to reinterpret an `int` as a
`float` or whatever, you just have to go through some function that copies
the bytes into a `float`, which the compiler can optimize out anyway.
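
Something along these lines, for example (just a sketch; the name
`bits_as_float` is made up, and C++20 later added `std::bit_cast` to do
exactly this):

#include <cstring>

// Reinterpret an int's bits as a float by copying the object
// representation instead of aliasing through a pointer. Compilers
// routinely optimize the memcpy down to a single register move.
float bits_as_float(int i) {
    static_assert(sizeof(float) == sizeof(int),
                  "assumes int and float have the same size");
    float f;
    std::memcpy(&f, &i, sizeof f);
    return f;
}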

But what happened in C++17 is that, by the letter of the standard, the
language became effectively unusable in many ways. For example, it seems
that `memset` and `memcpy` are now magic library functions that would be UB
if they were user-defined, and most uses of `offsetof` are not allowed
anymore. When this kind of situation happens, the result is that the C++
community ignores the standard. Some, like me, assume that the standard
will eventually be fixed to resolve this issue. Others may feel that the
standard has lost its legitimacy.
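
To make the `memcpy` point concrete, here is the kind of user-written byte
copy I have in mind (just a sketch, with made-up names):

#include <cstddef>

// A hand-rolled byte copy. The library's std::memcpy is effectively
// blessed by the standard; by a strict reading of C++17 this loop is
// not, even though it performs the same stores.
void byte_copy(void* dst, const void* src, std::size_t n) {
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < n; ++i)
        d[i] = s[i];
}

The objection, as I understand it, is that the arithmetic on `d` and `s` is
not arithmetic within any actual array of `unsigned char`, and nothing says
that the byte stores update the value of whatever object occupies the
destination; the library functions simply get to work by fiat.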

The change that happened in C++17 did not even happen for a good reason. It
was caused by P0137 <https://wg21.link/P0137>, whose real objective was not
to break the C++ object model this badly; it just did so accidentally. WG21
can, and should, fix this problem, and ideally as soon as possible.

Unfortunately, I have encountered resistance to this in the past. One
person's point of view was that despite the brokenness of the current
situation, allowing pointer arithmetic within objects as if they were `char`
arrays would be even worse. Obviously, I strongly disagree with that view.
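
For concreteness, the sort of half-century-old pattern at stake is the
classic container_of idiom, which steps from a member back to its enclosing
object with `offsetof` and `char*` arithmetic. A sketch, with made-up types:

#include <cstddef>

struct Node { Node* next; };

struct Widget {
    int  id;
    Node hook;  // intrusive list node embedded in the larger object
};

// Recover the enclosing Widget from a pointer to its embedded Node.
// The char* arithmetic here is exactly the "pointer arithmetic within
// an object as if it were a char array" under discussion.
Widget* widget_from_hook(Node* n) {
    return reinterpret_cast<Widget*>(
        reinterpret_cast<char*>(n) - offsetof(Widget, hook));
}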

-- 
*Brian Bi*

Received on 2020-05-15 16:43:48