C++ Logo

std-proposals

Advanced search

Re: [std-proposals] Proposal for std::(set / map / multiset / multimap)::partition_point

From: Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
Date: Thu, 13 Jun 2024 11:44:15 -0400
On Thu, Jun 13, 2024 at 11:13 AM Oleksandr Pikozh via Std-Proposals <
std-proposals_at_[hidden]> wrote:

> I was thinking about exactly the same thing long ago (a-la “they did great
> job when designing the std library, how could it be that this thing is
> missing?”). Certainly I needed that more than once for production code (of
> course, I don't remember the details now). Happily, since c++14 (and in
> boost), we can workaround that via transparent comparators. Still, it would
> be certainly an improvement (closing the gap) if someone writes the paper
> and it gets accepted. My intuition don't even believes this needs an
> explicit motivation (binary tree index for a composite key [kinda RDBMS
> terminology here, but the idea is true everywhere] should give a
> possibility to search not only by the full key but also by the subkeys a-la
> first-few-attributes-of-the-full-key – by itself, i.e. without building
> additional indices [creating extra maps]), though of course my mind
> understands that the motivation is always needed.
>
Hmm... This is actually adjacent to a pain point that libc++ has had with
the associative containers. Consider:

    std::multiset<std::string, std::less<>> ms;
    std::set<std::string, std::less<>> s;
    auto r1 = ms.equal_range("ABC"s); // might return 2 or more elements,
requires two O(lg N) steps
    auto r2 = ms.equal_range("ABC"sv); // might return 2 or more elements,
requires two O(lg N) steps
    auto r3 = s.equal_range("ABC"s); // always returns 0 or 1 elements,
requires one O(lg N) step plus O(1)
    auto r4 = s.equal_range("ABC"sv); // might(!!) return 2 or more
elements, requires two O(lg N) steps

Notice that r4 is measurably slower than r3.
This is because there's no way for the `set` itself to understand that the
domain of `string_view` is intentionally identical to the domain of
`string`. The STL has to assume that you might be abusing the transparent
`equal_range` to compute a range of multiple elements, simply because *the
transparent `equal_range` is the only verb we provide for that operation*.
In 20/20 hindsight, it would have made a lot more sense to provide two
separate verbs for
(1) "Give me the range of elements matching this predicate [expressed as
equivalence to this magic object], *which I guarantee [on pain of UB] will
all be consecutive*." (Call this "the slow operation.")
(2) "Give me the 0 or 1 elements that match this specific *value*, which I
guarantee [on pain of UB] *comes from the same domain* as your own
value_type." (Call this "the fast operation.")

It sounds like this `.partition_point` idea is very close to the idea of
(1). If we could express (1) cleanly with some new verb, then there is a
(tiny) chance that one day we could change the precondition on
`set.equal_range` to make it perform the fast operation (2) instead of the
slow operation (1). Which is what 90% of programmers probably always
thought it did, anyway.
(Even some libc++ contributors, at one point! ;))

I do agree that it would make sense to try to get this operation into some
third-party library (Abseil, Boost, ...) before writing a WG21 proposal.
Heck, they might already have it!
It would also allow you to survey how those non-STL implementations
currently treat `r4` — do they already treat it as the fast operation? If
so, do they currently lack a verb for the slow operation? Would the
addition of your verb actually get them all the way to the idyllic state I
describe?

I tend to believe that "Multiple third-party libraries already do exactly
this" is a great argument in favor of WG21 doing the same thing.
(Admittedly, I believe this *contra* the visible evidence of P1144, which a
dozen third-party libraries implement but WG21 shuns. ;))

–Arthur

Received on 2024-06-13 15:44:29