Re: [std-proposals] Loosen complexity guarantees for std::lower_bound

From: Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
Date: Mon, 22 Jan 2024 11:48:28 -0500
On Mon, Jan 22, 2024 at 11:03 AM Phil Endecott via Std-Proposals <
std-proposals_at_[hidden]> wrote:

> Iúri Chaer wrote:
>
> >> But in the case of std::list since C++11 we have constant-time
> >> size(), which std::lower_bound() doesn't exploit.
> >
> > Well, std::lower_bound() takes a pair of iterators, so we can't really
> rely
> > on size().
>
> Here's an idea, how about a std::lower_bound overload that takes
> an iterator and a size?
>

That is,
  std::lower_bound_n(first, n, value)
  std::lower_bound_n(first, n, value, comp)
That's a very interesting idea!
But that's solving the reverse of Iúri's problem: that overload would be
applicable when you *do* know the length of the range, whereas one-sided
binary search is useful when you *don't* know the length of the range.
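
For concreteness, a counted overload might look something like this (the
name lower_bound_n and this signature are Phil's suggestion as I read it;
the body is just the textbook halving search expressed with a count, so a
std::list caller who already knows size() never pays for a std::distance
pass):

    #include <iterator>

    template <class ForwardIt, class Size, class T, class Compare>
    ForwardIt lower_bound_n(ForwardIt first, Size n,
                            const T& value, Compare comp)
    {
        // Classic binary search over [first, first + n), expressed with a
        // count instead of a `last` iterator.
        while (n > 0) {
            Size half = n / 2;
            ForwardIt mid = std::next(first, half);
            if (comp(*mid, value)) {
                first = std::next(mid);   // answer lies to the right of mid
                n -= half + 1;
            } else {
                n = half;                 // answer is mid or to its left
            }
        }
        return first;
    }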

In fact, one-sided binary search is usable when you have an *infinite range*
— e.g. you could write
    std::ranges::one_sided_partition_point(std::views::iota(0), some_predicate);
(cf. the existing std::ranges::partition_point,
https://en.cppreference.com/w/cpp/algorithm/ranges/partition_point).
Is that enough of a killer app that it ought to be its own named
algorithm? Certainly it ought to be a data point discussed in the paper.
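
For what it's worth, here is roughly the shape I have in mind; the name,
the signature, and the final call to std::partition_point are purely
illustrative, and it assumes the range keeps going at least as far as the
probes do (true for a conceptually infinite range like the iota example
above):

    #include <algorithm>
    #include <cstddef>
    #include <iterator>

    // One-sided ("galloping") partition_point: no end iterator needed, so
    // it works on an unbounded range, provided the range is partitioned by
    // pred and the "false" region is non-empty.
    template <class ForwardIt, class Pred>
    ForwardIt one_sided_partition_point(ForwardIt first, Pred pred)
    {
        if (!pred(*first))
            return first;
        // Probe at distances 1, 2, 4, 8, ... until the predicate flips.
        std::ptrdiff_t step = 1;
        ForwardIt probe = std::next(first);
        while (pred(*probe)) {
            first = probe;                  // last position known "true"
            step *= 2;
            probe = std::next(first, step);
        }
        // The partition point lies in (first, probe]; finish with an
        // ordinary bounded search over that window.
        return std::partition_point(std::next(first), probe, pred);
    }

With that, the iota example above is just
one_sided_partition_point(std::views::iota(0).begin(), some_predicate).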


> Ugh, I feel a bit queasy with the idea of doing a binary search through a
> std::set without using its (private) left/right/parent pointers to do so.
> Slightly more queasy than finding the size of a std::list without using its
> size() member. Here's an idea, add a std::set::lower_bound overload that
> takes an iterator pair (or a single begin iterator); maybe call it a hint,
> like std::set::insert?
>

Off the top of my head, I think that might not be efficiently
implementable. IIRC, the library's red-black trees don't have *parent*
pointers; they have *next/prev* pointers (for iterator traversal). So a
tree operation like "least common ancestor" isn't efficiently
implementable, which I would expect means that `set.lower_bound(first,
last, value)` isn't efficiently implementable either. But it would be
interesting to be proved wrong!

Adding a new algorithm (or algorithms) should definitely be discussed in the
paper. But
I think it drifts from Iúri's original point, which is that
`std::lower_bound` is unaccountably slow over linked lists when it really
doesn't have to be. Library implementors could give it a 2x runtime
speedup, with no API break, just by switching to one-sided binary search.
We've done similar things in the past; e.g. switching `std::pop_heap` from
the naïve algorithm to Floyd's algorithm:
https://reviews.llvm.org/D118003
We were able to do that only because the Complexity clause
<https://eel.is/c++draft/pop.heap#5> got out of our way and let us do our
job. If it had been specified as "O(log N) calls to `swap`", we wouldn't
have been able to get that speedup.
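
To make that concrete, here is a sketch of the kind of thing an
implementation could do for forward iterators if the Complexity clause
allowed it (the same galloping idea as the partition_point sketch above,
applied to a bounded range; it is my illustration of the technique, not
Iúri's actual patch):

    #include <algorithm>
    #include <cstddef>
    #include <iterator>

    // lower_bound via one-sided binary search, for forward iterators.  The
    // probes gallop forward with doubling steps, so the number of
    // increments is proportional to the distance from `first` to the
    // answer rather than to the whole range, and no up-front
    // std::distance(first, last) pass is needed.
    template <class ForwardIt, class T>
    ForwardIt one_sided_lower_bound(ForwardIt first, ForwardIt last,
                                    const T& value)
    {
        if (first == last || !(*first < value))
            return first;
        std::ptrdiff_t step = 1;
        for (;;) {
            // Advance a probe by up to `step` positions, never past `last`.
            ForwardIt probe = first;
            std::ptrdiff_t len = 0;
            while (len < step && probe != last) {
                ++probe;
                ++len;
            }
            if (probe == last || !(*probe < value)) {
                // The answer lies in (first, probe]; finish with the
                // ordinary bounded algorithm on that window.  (A production
                // version would probably finish one-sidedly too, to avoid
                // the extra distance pass inside std::lower_bound.)
                return std::lower_bound(std::next(first), probe, value);
            }
            first = probe;   // *probe < value, so keep galloping from here
            step *= 2;
        }
    }

On a std::list, an element near the front is then found after touching only
a handful of nodes, instead of first walking the whole list just to compute
its length.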

Iúri has submitted a patch to libc++ which allegedly speeds up
`std::set_intersection` by using one-sided binary search:
https://github.com/llvm/llvm-project/pull/75230
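
Roughly, the idea (as I understand it; I haven't checked that this matches
what the patch actually does) is to skip ahead with a one-sided search
instead of incrementing one element at a time, e.g. reusing the
one_sided_lower_bound sketch from above:

    // Sketch only: set intersection that gallops over runs of non-matching
    // elements instead of stepping through them one by one.
    template <class ForwardIt1, class ForwardIt2, class OutputIt>
    OutputIt galloping_set_intersection(ForwardIt1 first1, ForwardIt1 last1,
                                        ForwardIt2 first2, ForwardIt2 last2,
                                        OutputIt out)
    {
        while (first1 != last1 && first2 != last2) {
            if (*first1 < *first2) {
                // Skip everything in range 1 smaller than *first2.
                first1 = one_sided_lower_bound(first1, last1, *first2);
            } else if (*first2 < *first1) {
                // Skip everything in range 2 smaller than *first1.
                first2 = one_sided_lower_bound(first2, last2, *first1);
            } else {
                *out++ = *first1;   // common element: copy and advance both
                ++first1;
                ++first2;
            }
        }
        return out;
    }
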
(Although I personally am skeptical of this patch. It seems to me that the
old code did `n` increments and `n` comparisons, and the new code does
between-`n`-and-`2n` increments and between-`log n`-and-`n` comparisons.
That's not a good tradeoff unless comparison is vastly more expensive than
increment; but by assumption, increment is pointer-chasing and thus
expensive. I think it might be useful only for Splunk's very specific
use-case where one set is gigantic and the other is tiny (100% x 1% = 1%),
and it's certainly harmless when the sets are almost equal (90% x 90% =
81%), but it seems like it might be harmful in the "C++ 101" case where the
sets are random (50% x 50% = 25%).)

Iúri claims — rightly AFAICT — that library implementors could similarly
speed up `std::lower_bound` (which already does binary search, so I'm much
less skeptical of *that* patch) if only it weren't for that pesky
Complexity clause.

We shouldn't let the tail wag the dog: we should value *actual runtime
performance* over adherence to formal Complexity clauses.

Anyway, I agree that Iúri should write a paper, and have offered my help if
desired. :)

my $.02,
–Arthur

Received on 2024-01-22 16:48:42