std-proposals: Re: Finding the index of an element (std::find_first

From: Tony V E <tvaneerd_at_[hidden]>
Date: Sat, 03 Oct 2020 17:08:35 -0400

Disregarding the tone of the reply (apology already accepted, let's move on), water == hydrogen + oxygen is actually a great example.

It has a special name because it is *common*.

Most compounds don't get special names.

The only question here is whether ‎the proposed compound/function is common enough to be worth a special name.

Note that committee time is finite, so the cost is that some other library feature does not get standardized. (and small cognitive cost of one more function in std::)

These are the costs of any library proposal.

Sent from my BlackBerry portable Babbage Device

From: Kosher Yosher via Std-Proposals

Sent: Friday, October 2, 2020 1:41 PM

To: Std-Proposals

Reply To: std-proposals_at_[hidden]

Cc: Kosher Yosher

Subject: Re: [std-proposals] Finding the index of an element (std::find_first_index)

1 function call being more concise than 2 is not an opinion, it's a fact. I honestly can't believe I have to explain this over multiple emails. I hope we can agree that calling water "water" is better than "hydrogen + oxygen". Apply the exact same logic to programming languages please.

And if you insist on an example that is "grounded":

std::list<int> bigDynamicList; /* maybe part of a library you can't modify */

std::vector<int> someLUT; /* happens to be what you are stuck with and you need to do something with it now */

...

auto i = /* insert std_find_if + distance or the proposed template to search in bigDynamicList. */

someLUT[i] = 42;

is not as far fetched as you make it out to be.

Using this indirection based index lookup concept is a very common operation, using an iterator for this purpose as you suggest is not what I would do here in C++, that'd be even uglier: the index semantics are fine and the intent is clear with the template function. If someLUT is not random access like you suggested, the proposed template is still going to be the one doing the minimum amount of work possible. You haven't pointed out a single gotcha yet, because there is none. It does what you would do by hand which is the correct solution for this operation.

If the user needs both iterator and index use the ranged version. It's ok to have both in your toolkit and use the one that's concise for the use-case at hand. Most GetX functions for this type of A/B indirection do not need the iterator, they operate by definition on indices, so muddying up code everywhere with unused iterators / tuple elements does not sound good to me as a "standard" solution. Please don't say that's my opinion again or I will sprinkle `int unusedDummy123;` variables throughout your code. :)

And about the last point of this function causing confusion ... let's keep it real: it's a function that says it returns the index of an element given a predicate, and like Arthur pointed out, fits in with the already existing count_X functions sort of, it's not an alien family of functions.

auto num_items0 = std::count_if(v.begin(), v.end(), [](const auto& e){return e.abc == 0;});

auto index_of_first0 = std::count_until(v.begin(), v.end(), [](const auto& e){return e.abc == 0;});

Looks like a nice buddy sitting next to count_if.

On Fri, 2 Oct 2020 at 15:09, Jason McKesson via Std-Proposals <std-proposals_at_[hidden]> wrote:

On Thu, Oct 1, 2020 at 8:20 PM Gokhan via Std-Proposals
<std-proposals_at_[hidden]> wrote:
>
> But in those situations, the initial range is undoubtedly an array (performance game programming doesn't do linked-lists), so `distance` is 100% OK.
>
>
>
> Yes, but that is not the point. Theoretical examples like that are cherries the standard picks on purpose in an attempt to make the library code as generic as possible without sacrificing performance. The example, regardless of how silly it might be in practice, is valid in that regard, a program can be written to prove it, and hence is a point in favour of the template.

Changes to the standard need to be justified on the basis of making
someone's life as a C++ programmer *materially better*. If all of the
examples of cases where your suggestion are better than the
alternatives are purely theoretical, if the actual real-world cases
where someone needs this functionality can be covered by a trivial
addition to their code, then your idea has very weak motivation.

Thus far, your motivation comes down to two points:

1: Aesthetics. 1 function call is cleaner than 2.

2: Ranges of size beyond that of a difference-type.

#1 is purely an opinion, and #2 is something that happens so rarely in
the real world that it's basically a rounding error.

However, with C++20 canonizing 2's complement signed integers, we do
have the means to solve #2 now. Previously, if a positive difference
from `std::distance` would exceed the different_type, then you get
undefined behavior. We could now say that `std::distance` in these
cases (and all iterator differencing functionality it is based on)
behaves exactly as if you had gotten the difference as a `size_type`,
then cast it to a `difference_type`. This would allow you to recover
values exceeding the signed range by casting it back to `size_type`.
Which C++20 makes valid.

This is more generally useful than a targeted solution specifically
for `std::find` and similar algorithms.

> The same theoretical reasoning applies as an argument against Matthews point about the std::distance range covering the allocation limit in practice. The standard is written against an imaginary machine that *can* allocate the full max(size_t) range.
>
>
>
> std::distance also returns a signed integer type – containers are indexed by size_type (unsigned). Another point in favour of the suggested template.
>
>
>
> But, while all of that holds, it’s besides the more important practical argument that it is simply more concise to have 1 vs 2 function calls. If one of the goals of the standard committee is to convince people to stop writing C in C++, and express their intent clearly, then this is what you have to do: provide the absolute shortest form in the standard that Just Works(TM). The suggested template function is that.

The structure of that argument could be applied to virtually anything.
At some point, you have to look at a thing and ask whether the "1
function call" version is worth the effort of adding to the standard
compared to the "2 function call" version. Building higher level
functionality out of lower level functionality is perfectly normal.
Nobody expects the standard to provide a "do_my_work_for_me" function.

So if you're going this route, then you need to provide a convincing
argument that this code is sufficiently common to be worth adding a
specific function for it. Especially since you're not just talking
about one function but an entire family of them.

Also, I don't understand this: "convince people to stop writing C in
C++". Encouraging people to use indices when they could use iterators
is a big part of the "C in C++" style of coding. So how are you
encouraging people to stop doing that kind of coding when you're
providing tools to help them do that kind of coding?

> The ranges solution Dvir suggests in [P2164]
>
> auto [index, it] = ranges::find_if(range | views::enumerate, pred);
>
> is pretty good too, and I was expecting something like that to come out of the ranges proposals. It’s still a little off from the more concise, proposed version though, especially given that we have all the other count_X and find_X functions doing very similar work. I don’t see how adding this template would make anything worse, once the return type and naming is fixed.

What if the user needs both the iterator and the index? I mean, if
they're going to access the value at that index, you may as well do it
through the iterator and save yourself the explicit offset, right? Is
that not a better "expression of the programmer's intent" than having
to offset the beginning of that range again? And what if the range
isn't random-access; that increment is an O(1) operation?

Basically, we don't need 3 ways to do a find. We need 1 way that has
the flexibility to do what's needed. If the user needs enumeration,
then they can be taught the way to do that: use `views::enumerate`.
That mechanism is *universal*, working on every algorithm and even
range-based `for` equally well and in a consistent way.

Why confuse this by adding a second way to enumerate a range that only
works in very specific cases?
--
Std-Proposals mailing list
Std-Proposals_at_[hidden]
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals

Received on 2020-10-03 16:08:39