sg12: Re: [ub] Justification for < not being a total order on pointers?

From: Gabriel Dos Reis <gdr_at_[hidden]>
Date: Mon, 26 Aug 2013 13:12:01 -0500

Chandler Carruth <chandlerc_at_[hidden]> writes:

| On Mon, Aug 26, 2013 at 9:27 AM, Gabriel Dos Reis <gdr_at_[hidden]>
| wrote:
|
| Nevin Liber <nevin_at_[hidden]> writes:
|
| | On 26 August 2013 11:00, Jeffrey Yasskin <jyasskin_at_[hidden]>
| wrote:
| |
| |
| | Could someone explain why we need to allow operator<(T*) to
| be a non-order?
| |
| |
| | It comes from C. I believe it comes from the days of segmented
| architectures.
|
|
| You're right. Historically, it came from segmented architectures.
|
| However, because it has been in the standard for that long, some
| implementations may have taken advantage of that semantics
| restriction
| even if they are not targeting segmented architecture.
|
|
| Also, there are now uses of segmented addressing outside of segmented
| architectures.
|
| The interesting thing is that it remains possible (and unsurprisingly
| possible) to implement a total ordering for all addresses.

In isolation, yes of course, that is possible. The real challenge is
whether we can do that, and get all the other goodies each fraction of
the C++ communities wants. Sometimes things are left "undefined" not
because of impossibility, but only because all fractions can't agree on
something they fall find useful.

For example, some fractions want operator< not to be total order so that
they can implement debug modes. Others want it to total so that they
can use some library constructions.

| It just
| means that this ordering may be marginally more complex than a single
| integer comparison on some architectures. I think this is a fine
| requirement to place on them. In all cases where the segment is known
| to be the same (the overwhelming majority), any reasonable
| implementation will produce the same code as today.
|
|
| So, it is not
| the case that because we don't have wide spread architecture
| anymore
| means that removing that restriction is a breeze. It will
| invalidate
| implementations that use compilation and program transformations
| based
| on that.
|
|
| I actually do not believe there is any such optimization or
| implementation today. I'm not aware of even theoretical optimizations
| that really benefit from this.

Different people tend to associate "optimization" with different things,
so I have been trying to be careful and use "program transformation" so
as not to trigger unwanted associations.

When operator< is not total, a compiler is not required to preserve
the value of "p < q" (when p and q are unrelated pointers) throughout
the entire compilation chain, let alone give a runtime value consistent
with any choice made at compile-time. For example, during compilation,
at a high level representation, the compiler may see that "p" and "q" come
from different objects and decides (within permitted behavior bounds)
that it can simplify that expression to "false". But, later phases of the
compilation (e.g. stack location, activation record layout) may decide
to map the objects in an order that makes the "runtime" evaluation of
that same expression "true", because there is no requirement on relative
position of addresses of unrelated objects. You may not call that an
"optimization" but, it is a valid *program transformation* today.

I know that CompCert

http://compcert.inria.fr/

used to essentially adopt this successive transformations and
storage allocation strategy.

Xavier Leroy, the author of CompCert, and I have had long discussions
about this about a year ago. He is on the list, so he will add
clarifications or corrections when he gets a chance. Last time (last
Spring) we talked again about this, he said he has made some adjuments,
but I don't recall the details.

I am not sure CompCert is unique in this position, though.

| The freedom that optimizing compilers very deeply need is that the
| ordering be totally unspecified.

Is that enough for what people want when they wish for operator< to be a
total order?

| Preserving the relative order of two
| object's addresses is a very high bar (much higher than *providing*
| some relative order) and would impede many real world optimizations.

Indeed.

| So long as we keep this, I have no knowledge of significant negative
| impacts on optimizing compilers. If you have other specific
| optimizations, please bring them up (clearly, I'm not familiar with
| all optimizing compilers! ;])

See above. Note also that I've been careful making a distinction
between "optimization" and "program transformation" -- the former appear
to be loaded (usually with emotions both ways), while the latter
describes a wide range of things compilers and static analysis tools do.

Consider

if (p < q)

if inliner and constant propagator see that "p" and "q" comes from
unrelated objects. The constant propagator can rightfully (today)
simplify the condition to "true" or "false", irrespective of what the
actual runtimes values of "p" and "q" are.

Conversely, a static analyser tool may decide that this is likely a
programmer mistake (since since the formal behaviour is undefined) and
decide to trap that at runtime.

Both are permitted today -- in addition to what CompCert used to do.

| If you make operator< a total order for all valid pointers (and
| including the usual one-past-the-end pointers), I suspect you
| might also
| as a by-product impose a stronger requirement on operator== on
| pointers.
|
|
| This is actually my only real concern, and could well be addressed by
| a careful and thorough paper on the subject.

-- Gaby

Received on 2013-08-26 20:12:18