sg12

Re: [isocpp-ext] Total order for pointer comparisons (again)

From: Alisdair Meredith <alisdairm_at_[hidden]>
Date: Tue, 28 Nov 2023 17:07:44 +0700
There is also the issue that `offsetof` returns an interesting value, but cannot
be used to access the data member at that offset by any valid pointer arithmetic
from the address of the complete object itself.
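
As a rough illustration of the point above, here is a minimal sketch. The type `Sample` and the helper `read_value_via_offset` are hypothetical names invented for this example. The sketch avoids pointer arithmetic from the address of the object itself by first copying the object representation into a genuine `unsigned char` array, and only then applying the `offsetof` value inside that array:

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstring>   // std::memcpy

// Hypothetical standard-layout type, used only for illustration.
struct Sample {
    char tag;
    int  value;
};

// Reads Sample::value through its offsetof position. The offset is applied
// to a real unsigned char array holding a copy of the object representation,
// not to a pointer derived from &s.
int read_value_via_offset(const Sample& s) {
    unsigned char bytes[sizeof(Sample)];
    std::memcpy(bytes, &s, sizeof(Sample));

    int out;
    std::memcpy(&out, bytes + offsetof(Sample, value), sizeof(int));
    return out;
}
```

For `Sample s{'x', 42};` the call `read_value_via_offset(s)` returns 42. The more direct form, `reinterpret_cast<const unsigned char*>(&s) + offsetof(Sample, value)`, is the one whose validity is in question here.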

I believe that is a separate issue worth solving, but rightly the topic of another
paper.

AlisdairM

> On 28 Nov 2023, at 16:02, Timur Doumler via Ext <ext_at_[hidden]> wrote:
>
> Hi Alisdair and everyone,
>
>> On 24 Nov 2023, at 23:35, Alisdair Meredith via Ext <ext_at_[hidden]> wrote:
>>
>> I have been looking into writing a paper that suggests making a “flat address space” conditionally supported. In a flat address space, there is a total order for addresses, and pointers would compare only their addresses, without regard to provenance in the abstract machine. This would satisfy naive user expectations, removing another “embarrassment” from the language, while allowing for a compiler switch to re-enable the pointer provenance model and the (often surprising) optimization opportunities it grants.
>
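
For context on what a total order over addresses looks like in today's language, here is a small sketch. The function names `ordered_before` and `address_before` are hypothetical. The built-in `<` on pointers into unrelated objects gives an unspecified result, whereas `std::less` on pointer types is specified to yield a strict total order. The second helper assumes the optional type `std::uintptr_t` is available; the pointer-to-integer mapping is implementation-defined and is not guaranteed to agree with `std::less`.

```cpp
#include <cstdint>      // std::uintptr_t (optional, but widely provided)
#include <functional>   // std::less

// True if p precedes q in the implementation-defined strict total order
// that std::less provides for pointer types.
bool ordered_before(const int* p, const int* q) {
    return std::less<const int*>{}(p, q);
}

// Compares the integer images of the two pointers. The mapping is
// implementation-defined; on a flat address space it is typically the
// numeric address, but that is an assumption, not a guarantee.
bool address_before(const int* p, const int* q) {
    return reinterpret_cast<std::uintptr_t>(p) <
           reinterpret_cast<std::uintptr_t>(q);
}
```

For two distinct objects `a` and `b`, exactly one of `ordered_before(&a, &b)` and `ordered_before(&b, &a)` is true, which is the property the built-in operator does not promise for unrelated objects.
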
> I hope this is not too much of a digression from the current discussion, but I just wanted to point out that the idea of a "flat address space" might also help us solve a different annoying problem. In C++ today, against all user expectations, you cannot access the underlying bytes of an object representation because doing so is UB:
>
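> #include <cstdio>
>
> void print_hex(int n) {
>     // Reinterpret the int's storage as bytes: the access pattern under discussion.
>     unsigned char* a = (unsigned char*)(&n);
>     for (std::size_t i = 0; i < sizeof(int); ++i)
>         std::printf("%02x ", a[i]);
> }
>
> int main() {
>     print_hex(123456);
> }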
>
> In C, this is fine. In C++, it is widely assumed to be fine as well, but it is not, for two reasons: 1) even though char* can alias int*, pointers since C++17 are defined as pointing to an object rather than merely holding a memory address, so the pointer `a` still points to the int object and not to its first byte, and the value of that object is not representable by char, which is UB; and 2) you cannot do any pointer arithmetic on that pointer because there is no array there, so that is UB as well.
>
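
For comparison, a commonly used way to inspect an object representation without relying on the cast above is to copy the bytes out with `std::memcpy`. The sketch below is only an illustration; the helper name `to_hex_bytes` is hypothetical, and it returns a string rather than printing so the result is easy to check. The byte order in the output depends on the platform's endianness.

```cpp
#include <cstddef>   // std::size_t
#include <cstdio>    // std::snprintf
#include <cstring>   // std::memcpy
#include <string>

// Returns the bytes of n as space-separated two-digit hex values,
// in the order they are stored in memory.
std::string to_hex_bytes(int n) {
    // Copy the object representation into a real unsigned char array,
    // so indexing and iteration happen on an actual array object.
    unsigned char bytes[sizeof(int)];
    std::memcpy(bytes, &n, sizeof(int));

    std::string out;
    char buf[3];  // two hex digits plus the terminating null
    for (std::size_t i = 0; i < sizeof(int); ++i) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(bytes[i]));
        out += buf;
        if (i + 1 < sizeof(int)) out += ' ';
    }
    return out;
}
```

This sidesteps the problem by working on a copy; it does not give access to the bytes of the original object in place.
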
> I tried to tackle this problem in P1839, by somehow introducing a notional char array underlying each object and containing its object representation. But after several rounds of Core review where everyone was telling me that this approach doesn't work (see section 8: Known issues in the paper), I have no idea how to possibly make it work with our current memory model so I abandoned the paper.
>
> Now, if we had a "flat address space", it seems to me that this could be a possible solution to the problem, because (if I understand this approach correctly) we could just say that there is a giant char array underlying all the memory, and so we could access object representations by essentially looping over the subrange of this array that corresponds to the object in question, with no UB.
>
> Apologies if I misunderstood and this is not how this would work – I would be very grateful for any clarification.
>
> Cheers,
> Timur
> _______________________________________________
> Ext mailing list
> Ext_at_[hidden]
> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/ext
> Link to this post: http://lists.isocpp.org/ext/2023/11/22265.php

Received on 2023-11-28 10:08:01