std-proposals: Re: Mixed comparisons for smart pointers

From: Giuseppe D'Angelo <giuseppe.dangelo_at_[hidden]>
Date: Sun, 24 Jan 2021 19:25:20 +0100

Hi Arthur,

As usual, thank you for your detailed reviews.

On 22/01/2021 15:34, Arthur O'Dwyer wrote:
> On Fri, Jan 22, 2021 at 6:40 AM Giuseppe D'Angelo via Std-Proposals
> <std-proposals_at_[hidden] <mailto:std-proposals_at_[hidden]>>
> wrote:
>
> I've got a draft for a proposal that allows for comparisons between a
> smart pointer and a raw pointer.
>
> [but deliberately doesn't touch std::hash]
>
> Since your Tony Tables do mention std::hash, I'd like to see your paper
> specifically in one place list the guarantees that we already have about
> that stuff in C++20.
> For example:
>
> - Does `uptr.get() == p` already imply that `std::hash<>()(uptr) ==
> std::hash<>()(p)`?
> - Does `sptr.get() == p` imply that `std::hash<>()(sptr) ==
> std::hash<>()(p)`?

What we have for unique_ptr and shared_ptr is pretty strong, their
hashing is defined to be equal to hashing their pointers:

https://eel.is/c++draft/util.smartptr.hash

I don't want to get anywhere close to solving or discussing
heterogeneous (smart) pointer hashing; and the reason is, the
implications you wrote above are all false if one _hashes_ heterogeneously.

They're even false for raw pointers -- due to pointer adjustments:

> struct A { ~~~ };
> struct B { ~~~ };
> struct C : A, B { ~~~ };
>
> C c;
> B *p1 = &c;
> C *p2 = &c;
>
> assert(p1 == p2); // must be true
> assert(std::hash<B*>()(p1) == std::hash<C*>()(p2)); // can be false

https://gcc.godbolt.org/z/hWea5M

So one has already to be extra careful when doing heterogeneous hashing
of pointers (in all likelihood: just NOT doing it?). I'm afraid you must
be explicit with the hash<> specializations you meant above otherwise.

Of course, the implications above are all true if `p` is the same type
returned by the `get()` calls, by the definition of the hashing of the
smart pointers.

> - Does `uptr.get() == sptr.get()` imply that `std::hash<>()(uptr) ==
> std::hash<>()(sptr)`?

Not in general, due to possible pointer adjustments. Yes, if pointer
types are the same. However, for all "straightforward" usages of
unique_ptr and smart_ptr, I hope the antecedent of the implication is
NEVER true? (Different smart pointers, owning the same object?)

> - Does `wptr.lock() == sptr` imply that `std::hash<>()(wptr) ==
> std::hash<>()(sptr)`?

We can't hash weak_ptr.

[snip]
>
> I recommend that you adopt the same strategy in your Tony Table code,
> both before and after. Delegate to hash<T*> in every case.

There's already a comment: // == (*this)(ptr.get())
I.e. both implementations are completely equivalent from a functional
point of view, just ignoring the extra template instantiations and such.
I'll expand.

>
> As for the comparison operators that you /*are*/ proposing, personally
> I'm skeptical. Your Tony Table has three rows:
> (1) The first row is definitely the most convincing. However, nits: Your
> `isEqual` lambda should capture [&] instead of [input], since it's not
> doing anything tricky that would require deviating from the usual
> practice.

Sure, I was just trying to keep it simple and to the point (without
requiring the reader to ask themselves what was getting captured, what
was passed to the predicates, etc.).

> Also, avoid ADL; see
> https://quuxplusone.github.io/blog/2020/07/08/erase-if/#ah-so-should-i-maybe-treat-erase

Don't want to derail this thread , but I disagree with that blog post's
conclusions. For instance, Qt containers now have free erase(_if)
functions for them in Qt's namespace, complementing the
QContainer::removeAll(T)/removeIf(P) member functions, so ADL erase(_if)
is a real thing now:

> https://doc-snapshots.qt.io/qt6-dev/qlist.html#erase_if

And thus I'd personally favour calling them using ADL calls.

> . I would write something like
> std::erase_if(objects, [&](auto&& p) { return p.get() == input; });
> or else factor it out into a helper function so you don't have to repeat
> the body of the lambda multiple times:
> static auto isEqualTo(const Object *input) {
> return [=](auto&& p) { return p.get() == input; };
> }
> [...]
> std::erase_if(objects, isEqualTo(input));

This was actually deliberate :) in the sense that: *not* having mixed
comparisons may cause accidental code duplications. For instance, you
may write same kind of predicates in multiple places (say, wherever you
need to run some algorithm on a container of smart pointers).

In other words, it's not always obvious that the function (predicate,
etc.) you're looking for has already been written elsewhere and could be
centralized and reused. Just typing in the required one-liner lambda
(and moving on to solving another problem) becomes *very* tempting.
I'll try and be more clear in the table itself...

> (2) The second row, with `std::set<unique_ptr>`, is completely
> unconvincing, because the /point/ of unique_ptr is that /of course/
> you'll never find `objects.contains(ptr)` to be true; that would be
> possible only if `ptr` pointed to an element of the set already. So that
> row should be rewritten and/or expanded to make it clearer what you're
> trying to accomplish.

As I wrote in the paper:

> The case of an associative container using a unique_ptr as its key type is particularly annoying; one cannot practically *ever* look up in such a container using another unique_ptr, as that would imply having two unique_ptr objects owning the same object. Instead, the typical lookup is heterogeneous (by raw pointer); this proposal is one step towards making it more convenient to use, because it enables the usage of the standard std::less or std::equal_to. (Note, however, that this proposal does not address at all the issue of heterogeneous hashing for smart pointers.)

As of why would you have a unique_ptr as a key in an associative
container: why not? The use cases would be similar to the "Manager"
scenario, just using a different data structure. Maybe it needs to e.g.
map each object to some data.

> (3) The third row, with `smart_pointer_hasher`, is less convincing than
> I'd like, because you're not trying to solve the hashing problem at the
> same time. If I had a choice between "unordered_set with custom hasher
> and custom equality predicate," versus "unordered_set with custom hasher
> and /default/ equality predicate," I feel like I'd have a hard time
> deciding which one I preferred. Hashing and equality-comparison usually
> go together as a package, to the point where I sometimes write them in a
> single struct — operator()(const T&) for hashing, operator()(const T&,
> const T&) for equality.

That is absolutely true, and I'm *very marginally* interested in the
hashing scenario. I'm merely providing a building block; heterogeneous
hashing for (smart) pointers can still be added separately, and I don't
see adding mixed comparisons as a blocker for that.

> I am */very mildly/* concerned that you're asking for this to compile.
> But surprisingly-compiling-code is just a natural side-effect of any
> time we weaken the type system:
> std::shared_ptr<char> sptr = std::make_shared<char>('a');
> assert(sptr != "hello world");
>
> Your current proposed wording makes this ambiguous AFAICT, but (A)
> nobody should write this,

Uhm... actually, why that shouldn't compile?
std::equality_comparable_with<char *, const char[123]> is satisfied on
GCC (but I admit I didn't check if this should be the case as per
Standardese, the actual definitions of these concepts are super scary.)

and (B) this can always be fixed by heroic
> fiddliness in LWG, as is tradition:
> // https://godbolt.org/z/69nzMv
> std::shared_ptr<int> sptr = std::make_shared<int>(42);
> assert(sptr != NULL);

Eeerk...

> This proposal violates my personal sense of how value semantics ought to
> work, in the sense that you're asking to permit
> x == y
> even while disallowing both
> x = y;
> y = x;
> I can't think off the top of my head of any existing pairs of types
> where this is true.
> We have plenty of pairs where the assignment (implicit conversion) is
> permitted in one direction but not the other, e.g. string/string_view,
> int/optional<int>, nullptr_t/int*.

(I'll avoid the silly answers, like: array types...)

This is an interesting observation, but is it necessarily a problem?

First and foremost, unique_ptr is a type (directly involved in this
discussion) where the above holds -- copy assignments don't compile, yet
they're comparable for equality. Even turning the assignments into move
assignments doesn't help: you can have unique_ptr with incompatible (non
assignable) deleters, that therefore can't be assigned on each other,
but they still can be compared.

Second, more in general, the discussion here is that smart pointers
conflate two semantics: ownership of an object, and representing the
address of an(other) object. This last bit is shared with the idiomatic
usage of raw pointers.

In the assignments you wrote above, the one _towards_ a smart pointer:

smart_ptr = raw_ptr;

deals with transferring ownership from the raw pointer to the smart
pointer object. Since we don't want that to happen accidentally, we
disallow it -- we don't provide that particular assignment operator in
the first place, and we mark smart pointer constructors as `explicit` to
avoid any implicit conversion that would make the assignment work.

We also disallow implicit conversions from a smart pointer towards a raw
pointer; those would make the assignment from smart pointers to raw
pointers to compile too:

raw_ptr = smart_ptr;

While this wouldn't deal with ownership transfer but just to extract the
pointer, and would be in principle be more acceptable, it's still
forbidden to prevent "funny" accidents (such as `delete smart_ptr`).

However, the comparison operators don't deal with ownership. The
question we ask when comparing two (smart) pointers is if they're owning
the same object, or if they're pointing to the same object? To me it's
the latter -- signalling a very significant departure from the semantics
of assignment (no ownership involved). Things like fancy pointers into
unique_ptr, or the aliasing constructor of shared_ptr, make this a
reality: you can have two shared_ptr that own the very same object, but
point to different objects, and thus compare different.

So why neglecting the possibility that this comparison can happen, in
general, heterogeneously -- even if it's between types that you can't
assign on each other (because the assignment semantics are different
than the comparison semantics)?

> There was a proposal several years ago to make vector<int>/vector<float>
> such a pair, but it wasn't adopted. It would be a good idea for you to
> look up that discussion and link it from your paper. I believe the
> proposer was Marshall Clow.

I guess it was
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0805r2.html
?

> You refer to an LEWG reflector discussion started by Marc Mutz. Could
> you link to that thread in the paper, or at least give the committee
> /*some*/ hint about how to find that previous discussion? (Also,
> linking to this std-proposals thread would be cool too.)

Done and done.

>
> TLDR, I have nitpicks on your Tony Tables, and I personally wouldn't
> favor this p roposal because I think it is a half-solution to a very
> marginal problem; but I don't see any /*fatal*/ flaw in it.
>

I'll try to update the draft in the next few days.

Thank you again,

-- 
Giuseppe D'Angelo | giuseppe.dangelo_at_[hidden] | Senior Software Engineer
KDAB (France) S.A.S., a KDAB Group company
Tel. France +33 (0)4 90 84 08 53, http://www.kdab.com
KDAB - The Qt, C++ and OpenGL Experts

Received on 2021-01-24 12:25:28