C++ Logo

sg12

Advanced search

Re: [ub] Justification for < not being a total order on pointers?

From: Xavier Leroy <Xavier.Leroy_at_[hidden]>
Date: Tue, 27 Aug 2013 10:00:27 +0200
> [Semantics of pointer comparisons]
> Xavier Leroy, the author of CompCert, and I have had long discussions
> about this about a year ago. He is on the list, so he will add
> clarifications or corrections when he gets a chance. Last time (last
> Spring) we talked again about this, he said he has made some adjuments,
> but I don't recall the details.

Yes, I'll chime in. Just a bit of context for the members of this
list: as part of the formal verification of the CompCert compiler, I
and others have been giving formal semantics to a large subset of ISO
C, and (with Tahina Ramananandro and Gabriel Dos Reis) to some aspects
of C++ too.

>From the standpoint of formal semantics, it is easier to account for
undefined behaviors (when everything can happen, incl. crashing) than
for defined-but-unspecified behaviors (when several but not all
outcomes are possible). And of course accounting for
defined-and-uniquely-specified behaviors is easier too.

The example that Gabriel was alluding to is equality tests between a
pointer "one past" the end of an object and a pointer to the beginning
of another object. In C and C++, if I'm not mistaken, this is defined
behavior but can return true or false at the whim of the
implementation. This nondeterminism was difficult to account for in
CompCert's semantics, so in the end we treat it as undefined behavior,
while still defining comparisons involving pointers "one past" as long
as the pointers are within the same object.

Coming back to the proposal of making "<" a total (but unspecified
order on pointers):

- As a semanticist, I think it would be challenging to formalize, for
  the general reasons given above.

- As a programmer, I see how this could be very useful to implement
  various generic data structures, or to choose a "canonical"
  representative between several semantically-equivalent data structures.

- As a compiler writer, I can see a few useful optimizations that this
  change would invalidate. Here is an example:

      int x[2];
      int * p;
      ...
      if (p <= x + 1) {...} else {...}

  With the current semantics, "p <= x + 1" is only defined if p points
  within x, so it is equivalent to "p != x + 2", which gives useful
  aliasing information in the two branches of the conditional.
  This is a contrived example, though.

- As an amateur of garbage-collected languages, I see a problem with
  relocating garbage collection. (C++ doesn't require a GC, but it is
  my understanding that the C++ committee is trying to make C++
  compatible with the use of a GC.) A relocating GC can move objects
  around, changing their absolute addresses. So, the unspecified
  total order on pointers could change during the execution of the
  program: before a GC, we have "p < q"; then q is moved, and then
  "p > q". This would make the total order unusable to
  e.g. order binary search trees...

Regards,

- Xavier Leroy

Received on 2013-08-27 10:13:11