Date: Tue, 16 Sep 2025 09:26:19 +0200
On Mon, Sep 15, 2025 at 10:10 PM Arthur O'Dwyer via Std-Proposals <
std-proposals_at_[hidden]> wrote:
> On Mon, Sep 15, 2025 at 3:06 PM Peter Neiss <peter.stefan.neiss_at_[hidden]>
> wrote:
>
>> I agree with Arthur that std::set_intersection is equivalent to this
>> specific algorithm implementation:
>> while (first1 != last1 && first2 != last2) {
>> if (*first1 < *first2) {
>> ++first1;
>> } else if (*first2 < *first1) {
>> ++first2;
>> } else {
>> *out++ = *first1; ++first1; ++first2;
>> }
>> }
>> return out;
>>
>> Back in the day a defect report on set algorithm
>> https://cplusplus.github.io/LWG/issue291 was solved it seems to retain
>> this algorithm.
>> It is not easy to analyse this algorithm with weak ordering and
>> arbitrarily overlapping input and possible output ranges. To just
>> standardise the algorithm is a smart move.
>> Add some remarks to when it works. E.g. nonoverlapping ranges and strong
>> ordering. Or First range overlap with output range, etc.
>> I don't feel up to the task of writing a proposal, but I am willing to
>> help and give my opinion.
>>
>
> ...Although, I see that libc++'s `std::set_intersection` is actually
> cleverer than that, for forward iterators specifically. This was done by
> Iúri Chaer in July 2024:
> https://github.com/llvm/llvm-project/pull/75230
> At a glance, I think this clever algorithm still preserves all the
> aliasing behavior of the algorithm above. That is, it doesn't make anything
> undefined that would have been well-defined with the simpler algorithm.
> That is, it doesn't introduce a dependency on any of the preconditions
> Peter's objecting to. But maybe Iúri (cc'ed) can confirm?
>
> It also occurs to me that Nikolas Klauser has been working on a
> compiler-assisted approach to replacing paired `<`s with `<=>` when that is
> equivalent.
>
This is very cool, recently there was question on S0
<https://stackoverflow.com/questions/79760145> why set calls <=> twice on
target element.
>
> This means that for a type like `std::string`, we would do
> the-moral-equivalent-of-memcmp only once per step, rather than twice; which
> is a huge speedup.
> So any specification effort would need to avoid implying that we
> *actually* call `<` a certain number of times, or *actually* call `++` a
> certain number of times. That makes the specification effort a fair bit
> harder, I think.
>
>
>
I still do not follow why would we want to make specification of algorithm
so much harder to read for beginners for some very rare usecase(although to
be fair std::set_intersection is rare anyway so it is not a such rare case
when we focus on std::set_intersection use cases), not to mention that
violation of requirement can no longer be easily checked(i.e. just check
out is not in range 1 or range 2 when all 5 iterators are pointers)...
Why just not make stateful search work fine by specifying exactly number of
comparisons erase_if does to be size of input range of erase_if?
e.g.
https://godbolt.org/z/KcedMvKaj
(code is not generic and uses ==, but I hope idea is clear)
This may have bad performance since it keeps restoring/checking iterator
state, but I did not have time to benchmark.
std-proposals_at_[hidden]> wrote:
> On Mon, Sep 15, 2025 at 3:06 PM Peter Neiss <peter.stefan.neiss_at_[hidden]>
> wrote:
>
>> I agree with Arthur that std::set_intersection is equivalent to this
>> specific algorithm implementation:
>> while (first1 != last1 && first2 != last2) {
>> if (*first1 < *first2) {
>> ++first1;
>> } else if (*first2 < *first1) {
>> ++first2;
>> } else {
>> *out++ = *first1; ++first1; ++first2;
>> }
>> }
>> return out;
>>
>> Back in the day a defect report on set algorithm
>> https://cplusplus.github.io/LWG/issue291 was solved it seems to retain
>> this algorithm.
>> It is not easy to analyse this algorithm with weak ordering and
>> arbitrarily overlapping input and possible output ranges. To just
>> standardise the algorithm is a smart move.
>> Add some remarks to when it works. E.g. nonoverlapping ranges and strong
>> ordering. Or First range overlap with output range, etc.
>> I don't feel up to the task of writing a proposal, but I am willing to
>> help and give my opinion.
>>
>
> ...Although, I see that libc++'s `std::set_intersection` is actually
> cleverer than that, for forward iterators specifically. This was done by
> Iúri Chaer in July 2024:
> https://github.com/llvm/llvm-project/pull/75230
> At a glance, I think this clever algorithm still preserves all the
> aliasing behavior of the algorithm above. That is, it doesn't make anything
> undefined that would have been well-defined with the simpler algorithm.
> That is, it doesn't introduce a dependency on any of the preconditions
> Peter's objecting to. But maybe Iúri (cc'ed) can confirm?
>
> It also occurs to me that Nikolas Klauser has been working on a
> compiler-assisted approach to replacing paired `<`s with `<=>` when that is
> equivalent.
>
This is very cool, recently there was question on S0
<https://stackoverflow.com/questions/79760145> why set calls <=> twice on
target element.
>
> This means that for a type like `std::string`, we would do
> the-moral-equivalent-of-memcmp only once per step, rather than twice; which
> is a huge speedup.
> So any specification effort would need to avoid implying that we
> *actually* call `<` a certain number of times, or *actually* call `++` a
> certain number of times. That makes the specification effort a fair bit
> harder, I think.
>
>
>
I still do not follow why would we want to make specification of algorithm
so much harder to read for beginners for some very rare usecase(although to
be fair std::set_intersection is rare anyway so it is not a such rare case
when we focus on std::set_intersection use cases), not to mention that
violation of requirement can no longer be easily checked(i.e. just check
out is not in range 1 or range 2 when all 5 iterators are pointers)...
Why just not make stateful search work fine by specifying exactly number of
comparisons erase_if does to be size of input range of erase_if?
e.g.
https://godbolt.org/z/KcedMvKaj
(code is not generic and uses ==, but I hope idea is clear)
This may have bad performance since it keeps restoring/checking iterator
state, but I did not have time to benchmark.
Received on 2025-09-16 07:26:35