std-proposals: Re: Balanced division of iterator range without LegacyRandomAccessIterator trait

From: Andreas Ringlstetter <andreas.ringlstetter_at_[hidden]>
Date: Mon, 7 Sep 2020 21:51:54 +0200

On 07.09.2020 19:04, Jason McKesson wrote:
> But how exactly would that work at the level of iterators?
>
> The reason `set` has a `lower_bound` is because `set` can use unique
> aspects of its implementation to efficiently compute that, compared to
> trying to do a `lower_bound` on, say, a `std::forward_list`.
>
> But the unique aspect of its implementation is not specifically the
> nature of its iterators. Part of this is the fact that it knows that
> the entire range of its iterators is a balanced, binary tree. That is,
> a `set` knows that it is the entirety of a single, balanced binary
> tree. A `set` can therefore subidive itself along certain lines
> semi-efficiently.
>
> But a general pair of `set::iterator`s *cannot*, because they could be
> any contiguous subset of a binary tree. And you can't really subdivide
> that without doing a full tree-walk, which as I understand it is the
> thing you're trying to avoid.

Actually you can do it without a tree-walk.

The next ideal division point is the lowest common ancestor of the range
boundaries, and that can be found in O(min(log(n), m)) time for any two
arbitrary iterators from a tree with `n` elements total and `m` elements
in the input range. For an already ideally divided range (if the yielded
sub-ranges carry additional flags and their bounds are immutable, hence
the optional type cast for begin/end in the original post rather than
returning only the middle point), further sub-divisions are even just O(1).

Leaves you with a total of O(min(log(n), m)) for a non-member
`lower_bound`, the only difference is in a constant factor. The constant
factor is still a point in the case for the member function.

Trees are already pretty much the worst case scenario in terms of
algorithmic complexity here. Other structures are much easier to divide
efficiently.

> So such a subdivision scheme is not based on properties of the
> iterators, but on properties of the range that created them. Which
> means that it can only work on algorithms that take ranges.
>
> And how would such things operate on range adaptors and views, which
> tend to be defined with regard to iterators rather than the ranges
> that provide those iterators? Wouldn't this require certain
> adaptors/views to have to forward properties of the full range that it
> is adapting? After all, I should be able to use a transform range of a
> `std::set` and use it with all of the efficiency of a regular
> `std::set` range. That sounds like a pretty difficult thing to
> implement, on top of all of the other difficult things that a range
> needs to implement.

Some views are going to disable optimizations inevitably. And there is
no way around that. I hope nobody expected Ranges to be a silver bullet,
if 50 years of research in relational databases is any indicator for how
easy it is to shoot yourself in the foot performance wise with nested views.

Any view which messes with the sequencing (anything beyond trimming
start and end of a range by a fixed offset), is inevitably destroying
the properties required for an efficient access to the backing data
structure. Can be re-mapped for *some* view types, but not
unconditionally. Views which are attempting to transform / filter by
index properties are beyond recovery.

Views which are preserving the sequencing, and only operate in the value
domain, are, at least by interface, suitable. An algorithm consuming the
view, assuming that `unseq` traversal is permitted at the calling site,
can utilize the properties of the backing container.

I'm not sure how well the Ranges library can reflect that in the current
state, but it's not a show-stopper conceptually, so much is certain.

The prerequisite would be the ability to translate an arbitrary iterator
yielded on the bottom most level into the corresponding top level
iterator, as well as to unwrap a top level iterator for the purpose of
querying the container. If that isn't supported, then only the
(existing) fallback on linear walks remains as an option for that view.

Received on 2020-09-07 14:55:26