std-proposals: Re: Finding the index of an element (std::find_first

From: Jason McKesson <jmckesson_at_[hidden]>
Date: Fri, 2 Oct 2020 10:09:00 -0400

On Thu, Oct 1, 2020 at 8:20 PM Gokhan via Std-Proposals
<std-proposals_at_[hidden]> wrote:
>
> But in those situations, the initial range is undoubtedly an array (performance game programming doesn't do linked-lists), so `distance` is 100% OK.
>
> Yes, but that is not the point. The standard deliberately cherry-picks theoretical examples like that in an attempt to make the library code as generic as possible without sacrificing performance. However silly the example may be in practice, it is valid in that regard: a program can be written to demonstrate it, so it counts as a point in favour of the template.

Changes to the standard need to be justified on the basis of making
someone's life as a C++ programmer *materially better*. If all of the
examples where your suggestion is better than the alternatives are
purely theoretical, and the real-world cases where someone needs this
functionality can be covered by a trivial addition to their code, then
your idea has very weak motivation.
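For the common case, that "trivial addition" is one extra call after the
search. A minimal sketch, assuming a `std::vector<int>`; `index_of` is a
hypothetical helper name for illustration, not a proposed or standard API:

```cpp
#include <algorithm>  // std::find
#include <cassert>
#include <cstddef>    // std::size_t
#include <iterator>   // std::distance
#include <vector>

// Hypothetical helper: index of the first element equal to `value`,
// or v.size() if there is no such element (mirroring find's "end" result).
std::size_t index_of(const std::vector<int>& v, int value) {
    auto it = std::find(v.begin(), v.end(), value);                 // call 1: search
    return static_cast<std::size_t>(std::distance(v.begin(), it));  // call 2: offset
}
```

For example, `index_of({3, 1, 4, 1, 5}, 1)` yields 1, and searching for a
value that is not present yields `v.size()`.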

Thus far, your motivation comes down to two points:

1: Aesthetics. 1 function call is cleaner than 2.

2: Ranges whose size exceeds what their `difference_type` can represent.

#1 is purely an opinion, and #2 is something that happens so rarely in
the real world that it's basically a rounding error.

However, now that C++20 has standardized two's complement signed
integers, we do have the means to solve #2. Previously, if a positive
difference from `std::distance` would exceed the range of the
`difference_type`, you got undefined behavior. We could now say that
`std::distance` in these cases (and all the iterator differencing
functionality it is based on) behaves exactly as if you had computed
the difference as a `size_type` and then cast it to a
`difference_type`. This would allow you to recover values exceeding
the signed range by casting the result back to `size_type`, a round
trip that C++20 makes well-defined.
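The integer conversion such a rule would rely on can be shown in
isolation. This sketch does not change `std::distance`; it only
demonstrates the `size_type` to `difference_type` and back round trip,
assuming `std::size_t` and `std::ptrdiff_t` have the same width (true on
mainstream platforms, and checked with a `static_assert`). `round_trip`
is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>  // std::size_t, std::ptrdiff_t
#include <limits>

// Assumption: the unsigned size type and the signed difference type have the same width.
static_assert(sizeof(std::size_t) == sizeof(std::ptrdiff_t),
              "sketch assumes size_t and ptrdiff_t have the same width");

// Hypothetical helper: store an unsigned size in the signed difference type,
// then recover it. Since C++20 the unsigned-to-signed conversion is defined
// to wrap modulo 2^N, so values above PTRDIFF_MAX come out negative; the
// signed-to-unsigned conversion back has always been modular.
std::size_t round_trip(std::size_t n) {
    auto d = static_cast<std::ptrdiff_t>(n);  // may be negative for huge n
    return static_cast<std::size_t>(d);       // recovers the original n
}
```

For example, `PTRDIFF_MAX + 1` held in a `std::size_t` converts to the
most negative `std::ptrdiff_t`, and converting that back yields the
original value.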

This is more generally useful than a targeted solution specifically
for `std::find` and similar algorithms.

> The same theoretical reasoning applies as an argument against Matthew's point about the std::distance range covering the allocation limit in practice. The standard is written against an imaginary machine that *can* allocate the full max(size_t) range.
>
> std::distance also returns a signed integer type – containers are indexed by size_type (unsigned). Another point in favour of the suggested template.
>
> But, while all of that holds, it’s beside the more important practical argument that it is simply more concise to have 1 vs 2 function calls. If one of the goals of the standard committee is to convince people to stop writing C in C++, and express their intent clearly, then this is what you have to do: provide the absolute shortest form in the standard that Just Works(TM). The suggested template function is that.

The structure of that argument could be applied to virtually anything.
At some point, you have to look at a thing and ask whether the "1
function call" version is worth the effort of adding to the standard
compared to the "2 function call" version. Building higher level
functionality out of lower level functionality is perfectly normal.
Nobody expects the standard to provide a "do_my_work_for_me" function.

So if you're going this route, then you need to provide a convincing
argument that this code is sufficiently common to be worth adding a
specific function for it. Especially since you're not just talking
about one function but an entire family of them.

Also, I don't understand this: "convince people to stop writing C in
C++". Encouraging people to use indices when they could use iterators
is a big part of the "C in C++" style of coding. So how are you
encouraging people to stop doing that kind of coding when you're
providing tools to help them do that kind of coding?

> The ranges solution Dvir suggests in [P2164]
>
> auto [index, it] = ranges::find_if(range | views::enumerate, pred);
>
> is pretty good too, and I was expecting something like that to come out of the ranges proposals. It’s still a little off from the more concise, proposed version though, especially given that we have all the other count_X and find_X functions doing very similar work. I don’t see how adding this template would make anything worse, once the return type and naming is fixed.

What if the user needs both the iterator and the index? I mean, if
they're going to access the value at that index, you may as well do it
through the iterator and save yourself the explicit offset, right? Is
that not a better "expression of the programmer's intent" than having
to offset the beginning of that range again? And what if the range
isn't random-access? Then that offset is no longer an O(1) operation.
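A sketch of the iterator-first style on a non-random-access container,
assuming `std::list<int>`; `bump_first` is a hypothetical name. The
element is modified through the iterator that `std::find` returned, and
the index is computed only as a by-product:

```cpp
#include <algorithm>  // std::find
#include <cassert>
#include <cstddef>    // std::ptrdiff_t
#include <iterator>   // std::distance, std::next
#include <list>

// Hypothetical helper: increments the first element equal to `key` and
// returns its position, or -1 if there is none. Access goes through the
// iterator; the index is derived separately with std::distance, which is
// a linear walk here because std::list iterators are not random-access.
std::ptrdiff_t bump_first(std::list<int>& xs, int key) {
    auto it = std::find(xs.begin(), xs.end(), key);
    if (it == xs.end()) return -1;
    ++*it;                                 // modify through the iterator: no re-walk
    return std::distance(xs.begin(), it);  // O(n) on a list, O(1) on a vector
}
```

Going the other way, from an index back to the element with
`std::next(xs.begin(), index)`, would walk the list a second time.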

Basically, we don't need 3 ways to do a find. We need 1 way that has
the flexibility to do what's needed. If the user needs enumeration,
then they can be taught the way to do that: use `views::enumerate`.
That mechanism is *universal*, working on every algorithm and even
range-based `for` equally well and in a consistent way.
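A sketch of that uniformity, assuming a C++23 standard library that
provides `std::views::enumerate` (detected with the
`__cpp_lib_ranges_enumerate` feature-test macro); on older libraries it
falls back to a hand-written counter. The function names are
hypothetical. The same adaptor feeds both an algorithm and a range-based
`for`:

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>    // std::get
#include <vector>
#include <version>  // library feature-test macros
#if defined(__cpp_lib_ranges_enumerate)
#include <algorithm>  // std::ranges::find_if
#include <ranges>     // std::views::enumerate
#endif

// Hypothetical: index of the first even element, or -1 if there is none.
std::ptrdiff_t first_even_index(const std::vector<int>& v) {
#if defined(__cpp_lib_ranges_enumerate)
    // enumerate yields (index, element) tuples; find_if searches them directly.
    auto e = v | std::views::enumerate;
    auto it = std::ranges::find_if(e, [](const auto& p) { return std::get<1>(p) % 2 == 0; });
    return it == e.end() ? -1 : static_cast<std::ptrdiff_t>(std::get<0>(*it));
#else
    for (std::size_t i = 0; i < v.size(); ++i)  // fallback: manual index
        if (v[i] % 2 == 0) return static_cast<std::ptrdiff_t>(i);
    return -1;
#endif
}

// Hypothetical: sum of the elements at even positions (0, 2, 4, ...).
long sum_at_even_positions(const std::vector<int>& v) {
    long sum = 0;
#if defined(__cpp_lib_ranges_enumerate)
    for (auto [i, x] : v | std::views::enumerate)  // same adaptor in a range-based for
        if (i % 2 == 0) sum += x;
#else
    for (std::size_t i = 0; i < v.size(); ++i)
        if (i % 2 == 0) sum += v[i];
#endif
    return sum;
}
```

Neither function needs an index-specific variant of `find` or of the
loop; the index comes from the adaptor in both cases.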

Why confuse this by adding a second way to enumerate a range that only
works in very specific cases?

Received on 2020-10-02 09:09:14