Date: Thu, 11 Jan 2024 12:48:53 +0100
Hi,
On 10/01/2024 20:48, Vladimir Grigoriev via Std-Discussion wrote:
> The algorithm std::unique has a side effect that is not always
> desirable. That is it can change the content of a container.
> I named it stable_unique though any more suitable name may be proposed.
I don't think "stable" is a good prefix to use here. Stability refers to
a property of the output of certain algorithms that generate a
permutation of their input (sort, partition, ...), and specifically to
the fact that elements that compare equivalent (as far as the algorithm
is concerned) retain their relative ordering after the permutation --
the ordering as it was in the sequence before it was processed by the
algorithm.
In other words, if you have something like:
> struct Employee { std::string name; Job job; };
>
> Employee employees[] = {
> { "Alice", Job::Engineer },
> { "Bob", Job::Engineer },
> { "Carol", Job::Manager },
> { "Dave", Job::QA },
> { "Eve", Job::Manager },
> };
then a
> stable_partition(employees, [](auto &&x) { return x.job == Job::Manager; });
*must* yield the { Carol, Eve, Alice, Bob, Dave } permutation, while a
mere `partition` may return any permutation of { Carol, Eve } followed
by any permutation of { Alice, Bob, Dave }. The reason why both
alternatives are offered for certain algorithms is because the
non-stable version can be faster, so that's the trade-off.
The concept of stability doesn't really have sense for `unique`, as it's
not a permuting algorithm (just like `remove_if` isn't just a
`partition`): `unique` "eliminates all but the first element from every
consecutive group of equivalent elements"
(https://eel.is/c++draft/alg.unique#3).
I am not sure if "eliminates" is formally defined somewhere, but what it
means concretely is that the tail of elements (going from the returned
iterator to the `last`) consists of elements that have been (possibly)
moved-from. That's a "damaging" operation, as you observe; in general
you can't even _reason_ about their contents (elements are partially
formed). But what this means is that the output is not a permutation,
and therefore there's no concept of stability to apply.
(On the other hand, `unique` today is already ""stable"", in the sense
that the non-eliminated elements are left in the same relative order of
the original sequence; however, I don't seem to find a provision for that.)
Now, for the sake of the argument let's call your algorithm ...
CTFEFECGOEE ("Collect The First Element From Every Consecutive Group Of
Equivalent Elements") -- sorry, I don't know if it's got an established
name in literature. Your output would be a permutation, and as such, you
could have a stable and a non-stable version (assuming a faster
implementation is available if you relinquish stability).
For
> Employee employees[] = {
> { "Alice", Job::Engineer },
> { "Bob", Job::Engineer },
> { "Carol", Job::Manager },
> { "Dave", Job::QA },
> { "Eve", Job::Manager },
> { "Frank", Job::Manager },
> { "Grace", Job::Manager },
> };
then stable_CTFEFECGOEE must yield
{ Alice, Carol, Dave, Eve, Bob, Frank, Grace }
while non-stable CTFEFECGOEE could e.g. yield
{ Alice, Carol, Dave, Eve, Bob, Grace, Frank }
or similar variations. Pay attention to the fact that `unique` does not
require its input to be sorted, and neither should CTFEFECGOEE; you need
to "carefully" define CTFEFECGOEE's behavior if you want CTFEFECGOEE's
output to still be sorted when its input is.
If you're willing to continue with this idea, please survey if this
algorithm has indeed an established name (or names) in classic algorithm
literature / other programming language libraries / ..., and then come
up with a full proposal for it?
My c,
On 10/01/2024 20:48, Vladimir Grigoriev via Std-Discussion wrote:
> The algorithm std::unique has a side effect that is not always
> desirable. That is it can change the content of a container.
> I named it stable_unique though any more suitable name may be proposed.
I don't think "stable" is a good prefix to use here. Stability refers to
a property of the output of certain algorithms that generate a
permutation of their input (sort, partition, ...), and specifically to
the fact that elements that compare equivalent (as far as the algorithm
is concerned) retain their relative ordering after the permutation --
the ordering as it was in the sequence before it was processed by the
algorithm.
In other words, if you have something like:
> struct Employee { std::string name; Job job; };
>
> Employee employees[] = {
> { "Alice", Job::Engineer },
> { "Bob", Job::Engineer },
> { "Carol", Job::Manager },
> { "Dave", Job::QA },
> { "Eve", Job::Manager },
> };
then a
> stable_partition(employees, [](auto &&x) { return x.job == Job::Manager; });
*must* yield the { Carol, Eve, Alice, Bob, Dave } permutation, while a
mere `partition` may return any permutation of { Carol, Eve } followed
by any permutation of { Alice, Bob, Dave }. The reason why both
alternatives are offered for certain algorithms is because the
non-stable version can be faster, so that's the trade-off.
The concept of stability doesn't really have sense for `unique`, as it's
not a permuting algorithm (just like `remove_if` isn't just a
`partition`): `unique` "eliminates all but the first element from every
consecutive group of equivalent elements"
(https://eel.is/c++draft/alg.unique#3).
I am not sure if "eliminates" is formally defined somewhere, but what it
means concretely is that the tail of elements (going from the returned
iterator to the `last`) consists of elements that have been (possibly)
moved-from. That's a "damaging" operation, as you observe; in general
you can't even _reason_ about their contents (elements are partially
formed). But what this means is that the output is not a permutation,
and therefore there's no concept of stability to apply.
(On the other hand, `unique` today is already ""stable"", in the sense
that the non-eliminated elements are left in the same relative order of
the original sequence; however, I don't seem to find a provision for that.)
Now, for the sake of the argument let's call your algorithm ...
CTFEFECGOEE ("Collect The First Element From Every Consecutive Group Of
Equivalent Elements") -- sorry, I don't know if it's got an established
name in literature. Your output would be a permutation, and as such, you
could have a stable and a non-stable version (assuming a faster
implementation is available if you relinquish stability).
For
> Employee employees[] = {
> { "Alice", Job::Engineer },
> { "Bob", Job::Engineer },
> { "Carol", Job::Manager },
> { "Dave", Job::QA },
> { "Eve", Job::Manager },
> { "Frank", Job::Manager },
> { "Grace", Job::Manager },
> };
then stable_CTFEFECGOEE must yield
{ Alice, Carol, Dave, Eve, Bob, Frank, Grace }
while non-stable CTFEFECGOEE could e.g. yield
{ Alice, Carol, Dave, Eve, Bob, Grace, Frank }
or similar variations. Pay attention to the fact that `unique` does not
require its input to be sorted, and neither should CTFEFECGOEE; you need
to "carefully" define CTFEFECGOEE's behavior if you want CTFEFECGOEE's
output to still be sorted when its input is.
If you're willing to continue with this idea, please survey if this
algorithm has indeed an established name (or names) in classic algorithm
literature / other programming language libraries / ..., and then come
up with a full proposal for it?
My c,
-- Giuseppe D'Angelo
Received on 2024-01-11 11:48:57