std-discussion

Re: Pointer comparison with operators vs function objects

From: David Brown <david.brown_at_[hidden]>
Date: Tue, 16 Sep 2025 09:59:14 +0200
On 16/09/2025 03:25, Yongwei Wu wrote:
> I'll reply to this answer to add more details about my question.
>
> On Mon, 15 Sept 2025 at 21:48, David Brown via Std-Discussion
> <std-discussion_at_[hidden]> wrote:
>>
>> On 15/09/2025 14:48, Yongwei Wu via Std-Discussion wrote:
>>> As per my understanding of the C++ standard, operators like < should not
>>> be used to compare unrelated pointers, but function objects like
>>> std::less can. Is there any reason why we cannot make p1 < p2 just
>>> behave like std::less<T*>{}(p1, p2)? I really cannot think of any.
>>>
>>> Any insights on this?
>>>
>>
>> First ask yourself, is there any reason why you would want to compare
>> unrelated pointers for ordering? If there are no real-world use-cases
>> that can't be handled well enough using the existing C++ language, then
>> there is no reason to change.
>
> Very simple, it is far too easy to err on the side of UB, as most
> (yes, I believe, most) C++ programmers do not know that one should NOT
> compare arbitrary pointers with operator<.

I believe that most programmers would only want to compare pointers if
the results are going to make sense - so comparing pointers that are not
logically connected in some way is likely to be wrong in the code,
whether it is UB, IB, unspecified, or fully defined behaviour. But I
agree that most programmers don't think too much about the details here,
and also that when people make mistakes in their code, UB is at least
sometimes worse than other unexpected results. (It can, however,
sometimes give better debugging opportunities through sanitisers -
assuming the programmer knows they have an issue and is able to use
tools to find it.)

But as Nate pointed out, in C++, ordered relational comparisons for
unrelated pointers give unspecified results, not UB. (In C, it is UB,
and I also thought that was the case in C++. So I've learned something
today!)

>
>> Such comparisons can be helpful for implementations of standard
>> functions like memmove(). But those are part of the implementation - a
>> standard library implementation can use compiler-specific features or
>> implementation-specific knowledge to implement its functions
>> efficiently, relying on features that are not part of the C++ standards.
>
> Non-standard-library coders need similar mechanisms, and I certainly
> want it to be portable.
>

Casting to uintptr_t will give you that. The type is optional, but I
would be very surprised to hear of a real platform that does not support
it (for data pointers). Alternatively, you can also use std::less<>.
So the solutions for this exist, and are going to be as efficient as
possible on an optimising compiler. These will both give you
implementation-defined results (intended to be consistent with the
target hardware addresses), rather than unspecified results.
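
As a minimal sketch of the two options just mentioned (the function
names address_less and total_less are made up for illustration, and the
cast version assumes the platform provides std::uintptr_t):

```cpp
#include <cstdint>
#include <functional>

// Hypothetical helper: order two data pointers by their integer
// representation. The pointer-to-integer mapping is implementation-defined;
// on flat address spaces it normally matches the hardware address.
inline bool address_less(const void* a, const void* b) {
    return reinterpret_cast<std::uintptr_t>(a) <
           reinterpret_cast<std::uintptr_t>(b);
}

// Hypothetical helper: the same question asked through std::less<>, which
// the standard requires to yield a strict total order over pointers.
inline bool total_less(const void* a, const void* b) {
    return std::less<>{}(a, b);
}
```

Either helper can be called with pointers to unrelated objects. The
casts are used only for the comparison and the integers are never
converted back to pointers.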

>> There might be occasions when user-written code can have an interest in
>> orderings of pointers, such as for writing specialised memory management
>> routines or perhaps holding unrelated pointers in a searchable data
>> structure. I would expect that for those situations, you can simply
>> cast to uintptr_t and compare those without any overhead. (Use the
>> casts just for the comparisons - avoid casting back and forth between
>> uintptr_t and pointer types because the compiler might lose useful
>> information.) You will probably find that this is what is behind the
>> std::less implementation in many C++ libraries.
>
> I think of this problem when trying to detect things like the
> vector::insert self-reference, and I know of some code that needs to
> have similar in-range conditionals.
>

I can see wanting to make arbitrary pointer comparisons - but again,
this can already be done.
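
For the in-range kind of check, a rough sketch might look like the
following. The name points_into is hypothetical, and this is not taken
from any particular vector::insert implementation. Strictly speaking the
standard only promises that the std::less order is total and consistent
with the built-in operators, so treating "between first and last" as
"inside the array" is an assumption that holds on flat address spaces.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical helper: does p point at one of the `count` elements that
// start at `first`? std::less is used instead of the built-in operators so
// that the comparisons are well specified even when p is unrelated to
// the array.
template <typename T>
bool points_into(const T* p, const T* first, std::size_t count) {
    const std::less<const T*> lt;
    const T* last = first + count;           // one-past-the-end pointer
    return !lt(p, first) && lt(p, last);     // first <= p < last
}
```

An insert function could use such a check to decide whether the value
being inserted lives inside the container's own storage and must be
copied before elements are shifted.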

>> As for why comparing unrelated pointers (except for equality) is UB,
>> this might be because C and C++ can work on systems with complicated
>> address spaces. Modern x86, ARM, and other common "big" processors have
>> nice, simple flat address spaces. But some other targets have
>> complicated address spaces - the same underlying representation in a
>> pointer might refer to different physical addresses depending on its
>> type or how the pointer is used. (An example of a currently available
>> processor with multiple address spaces and with C++ support is the AVR
>> microcontroller family.) On many such systems, it would be possible to
>> define a comparison relationship - but why bother, if the results don't
>> actually mean anything?
>
> Good to know there are still such systems. I started programming in
> the segmented memory model age, but to me it was long gone now. So I
> did not see the need to keep such wording now.
>

I don't think there are many such systems in modern use, but they do
exist. (And the C++ support for the AVR and other small
microcontrollers can be somewhat variable and not fully standards
compliant.) There are some systems that use "capabilities" or other
kinds of "fat pointers" that hold additional information beyond just the
address - but I don't think these are mainstream at all.

> Another angle: Can we make operator< on pointers
> implementation-defined, as more often than not it has quite
> well-defined behaviour?

If the standard says something is UB, then compilers can freely provide
a documented implementation. The same applies to results that the
standards say are unspecified (as is the case here) - implementations
can give fixed results.

> UB is a dangerous thing, and it is better to
> have as little UB as possible.

I will agree that it is best to /execute/ as little UB as possible -
indeed, you don't want to attempt to run any UB at all. And I agree
that UB is dangerous. But I disagree that a language wants to have as
little as possible. It should avoid surprising or unexpected UB, but
striving to give defined behaviour to all code, regardless of how buggy
or nonsensical it might be, is a fool's errand IMHO. It requires far
too much from a compiled language - imagine trying to find some way to
determine that an arbitrary pointer contains a valid address and points
to a valid object before dereferencing it. You need a heavily managed
language running in an advanced virtual machine to make pointer
dereference fully defined and UB free - that's not the world of C++.

Still, in this particular case it is reasonable to argue that having
comparison of unrelated pointers be UB would be surprising, and I am
happy that it is unspecified behaviour rather than UB.

> (I also hate the fact that sometimes
> compilers utilize UB as a crazy licence to kill, and I do not want
> anything like that for something as simple as p1 < p2.)
>

Optimisations based on UB are controversial, but far from being
universally disliked.

In this particular case, even if "p < q" were UB for unrelated pointers,
I think it is unlikely that it would result in UB-based optimisations.
After all, such optimisations can only be applied when the compiler can
be sure that it really is UB at the time. So if you have, say, some
kind of container class for pointers that orders contents and are
writing an "insert" function, the compiler won't be able to prove that
the two compared pointers are unrelated - it must therefore assume that
they /are/ related and that the comparison is fully defined. No
UB-based optimisations can be done.

The same logic applies to unspecified results for the comparisons.

On the other hand, if the compiler knows without doubt that the pointers
are unrelated, it can do some optimisations even with unspecified
results. It can give different values to "p < q" at different times, or
assume that "p < q" and "p >= q" are both true or both false. So you
still want to use fully specified comparisons in your code.
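
To illustrate, here is a small sketch of ordering pointers to unrelated
objects with a fully specified comparison (sort_by_address is a made-up
name). Passing std::less explicitly matters because std::sort otherwise
falls back to the built-in "<".

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical helper: sort pointers that may refer to unrelated objects.
// std::less<T*> gives a strict total order, so the comparator meets the
// requirements of std::sort no matter where the pointers come from.
template <typename T>
void sort_by_address(std::vector<T*>& ptrs) {
    std::sort(ptrs.begin(), ptrs.end(), std::less<T*>{});
}
```

The resulting order is implementation-defined, but it is consistent from
one comparison to the next, which is what a sorted container or a binary
search needs.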

Received on 2025-09-16 07:59:22