Date: Tue, 27 Aug 2013 10:00:27 +0200
> [Semantics of pointer comparisons]
> Xavier Leroy, the author of CompCert, and I have had long discussions
> about this about a year ago. He is on the list, so he will add
> clarifications or corrections when he gets a chance. Last time (last
> Spring) we talked again about this, he said he has made some adjuments,
> but I don't recall the details.
Yes, I'll chime in. Just a bit of context for the members of this
list: as part of the formal verification of the CompCert compiler, I
and others have been giving formal semantics to a large subset of ISO
C, and (with Tahina Ramananandro and Gabriel Dos Reis) to some aspects
of C++ too.
>From the standpoint of formal semantics, it is easier to account for
undefined behaviors (when everything can happen, incl. crashing) than
for defined-but-unspecified behaviors (when several but not all
outcomes are possible). And of course accounting for
defined-and-uniquely-specified behaviors is easier too.
The example that Gabriel was alluding to is equality tests between a
pointer "one past" the end of an object and a pointer to the beginning
of another object. In C and C++, if I'm not mistaken, this is defined
behavior but can return true or false at the whim of the
implementation. This nondeterminism was difficult to account for in
CompCert's semantics, so in the end we treat it as undefined behavior,
while still defining comparisons involving pointers "one past" as long
as the pointers are within the same object.
Coming back to the proposal of making "<" a total (but unspecified
order on pointers):
- As a semanticist, I think it would be challenging to formalize, for
the general reasons given above.
- As a programmer, I see how this could be very useful to implement
various generic data structures, or to choose a "canonical"
representative between several semantically-equivalent data structures.
- As a compiler writer, I can see a few useful optimizations that this
change would invalidate. Here is an example:
int x[2];
int * p;
...
if (p <= x + 1) {...} else {...}
With the current semantics, "p <= x + 1" is only defined if p points
within x, so it is equivalent to "p != x + 2", which gives useful
aliasing information in the two branches of the conditional.
This is a contrived example, though.
- As an amateur of garbage-collected languages, I see a problem with
relocating garbage collection. (C++ doesn't require a GC, but it is
my understanding that the C++ committee is trying to make C++
compatible with the use of a GC.) A relocating GC can move objects
around, changing their absolute addresses. So, the unspecified
total order on pointers could change during the execution of the
program: before a GC, we have "p < q"; then q is moved, and then
"p > q". This would make the total order unusable to
e.g. order binary search trees...
Regards,
- Xavier Leroy
> Xavier Leroy, the author of CompCert, and I have had long discussions
> about this about a year ago. He is on the list, so he will add
> clarifications or corrections when he gets a chance. Last time (last
> Spring) we talked again about this, he said he has made some adjuments,
> but I don't recall the details.
Yes, I'll chime in. Just a bit of context for the members of this
list: as part of the formal verification of the CompCert compiler, I
and others have been giving formal semantics to a large subset of ISO
C, and (with Tahina Ramananandro and Gabriel Dos Reis) to some aspects
of C++ too.
>From the standpoint of formal semantics, it is easier to account for
undefined behaviors (when everything can happen, incl. crashing) than
for defined-but-unspecified behaviors (when several but not all
outcomes are possible). And of course accounting for
defined-and-uniquely-specified behaviors is easier too.
The example that Gabriel was alluding to is equality tests between a
pointer "one past" the end of an object and a pointer to the beginning
of another object. In C and C++, if I'm not mistaken, this is defined
behavior but can return true or false at the whim of the
implementation. This nondeterminism was difficult to account for in
CompCert's semantics, so in the end we treat it as undefined behavior,
while still defining comparisons involving pointers "one past" as long
as the pointers are within the same object.
Coming back to the proposal of making "<" a total (but unspecified
order on pointers):
- As a semanticist, I think it would be challenging to formalize, for
the general reasons given above.
- As a programmer, I see how this could be very useful to implement
various generic data structures, or to choose a "canonical"
representative between several semantically-equivalent data structures.
- As a compiler writer, I can see a few useful optimizations that this
change would invalidate. Here is an example:
int x[2];
int * p;
...
if (p <= x + 1) {...} else {...}
With the current semantics, "p <= x + 1" is only defined if p points
within x, so it is equivalent to "p != x + 2", which gives useful
aliasing information in the two branches of the conditional.
This is a contrived example, though.
- As an amateur of garbage-collected languages, I see a problem with
relocating garbage collection. (C++ doesn't require a GC, but it is
my understanding that the C++ committee is trying to make C++
compatible with the use of a GC.) A relocating GC can move objects
around, changing their absolute addresses. So, the unspecified
total order on pointers could change during the execution of the
program: before a GC, we have "p < q"; then q is moved, and then
"p > q". This would make the total order unusable to
e.g. order binary search trees...
Regards,
- Xavier Leroy
Received on 2013-08-27 10:13:11