Date: Mon, 15 Sep 2025 21:06:11 +0200
I agree with Arthur that std::set_intersection is equivalent to this
specific algorithm implementation:
while (first1 != last1 && first2 != last2) {
if (*first1 < *first2) {
++first1;
} else if (*first2 < *first1) {
++first2;
} else {
*out++ = *first1; ++first1; ++first2;
}
}
return out;
Back in the day a defect report on set algorithm
https://cplusplus.github.io/LWG/issue291 was solved it seems to retain this
algorithm.
It is not easy to analyse this algorithm with weak ordering and arbitrarily
overlapping input and possible output ranges. To just standardise the
algorithm is a smart move.
Add some remarks to when it works. E.g. nonoverlapping ranges and strong
ordering. Or First range overlap with output range, etc.
I don't feel up to the task of writing a proposal, but I am willing to help
and give my opinion.
Am Mo., 15. Sept. 2025 um 00:26 Uhr schrieb Arthur O'Dwyer <
arthur.j.odwyer_at_[hidden]>:
> On Sun, Sep 14, 2025 at 1:38 PM Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
> wrote:
>
>> On Sun, Sep 14, 2025 at 1:08 PM Peter Neiss via Std-Proposals <
>> std-proposals_at_[hidden]> wrote:
>>
>>> Ok. I have not thought about overlapping input ranges. Classical
>>> aliasing problem. Modifying the first range at a later position, through
>>> the second maybe makes the range unsorted. But when I try to construct an
>>> example, the sorting seems to make this impossible and it still works out
>>> correctly.
>>> I am not familiar with weak comparators and when they are allowed in the
>>> stl. Is sorting by a weak comparator enough for a range to be sorted ?
>>> Please give an example that breaks with overlapping input ranges. If
>>> this is a problem inplace_set_intersection is a good idea, otherwise i
>>> would prefer changing the precondition on std::set_intersection.
>>>
>> [...]
>>
>>> If the input ranges overlap, the output can generally only coincide with
>>>> the first range (the source for the copy). Otherwise you can get wrong
>>>> results with a weak comparator.
>>>
>>>
>> I don't think this is true. The output range could be the same as the
>> first input range, or it could be the same as the second input range, or it
>> could be unrelated, or it could start before either input range.
>> The output range obviously can't point into the *middle* of either input
>> range, but nobody was claiming that.
>>
>> I wonder if the term "overlap with" is ever actually defined in the spec,
>> or if there's an LWG issue about that. I think as the iterator type becomes
>> more complicated than just a simple pointer, and the element types become
>> more resource-managey and/or view-ish, "overlap" gets harder to define.
>> What we mean, of course(tm), is that while the algorithm is running the
>> values of all elements in the two input ranges must be stable and not
>> change for any reason. And how we want to loosen that restriction is to say
>> something like, the values of all *not-yet-read* elements in the two
>> input ranges must be stable and not change (along with some explanation of
>> when *does* the algorithm read each element).
>>
>
> ...Oh, I see — the problem is actually pre-existing, and happens when (A)
> the two *input* ranges overlap* each other* and (B) the type's "copying
> from input to output" operation is destructive/modifying.
> So we have something like this:
>
> // https://godbolt.org/z/TTn3f5E8P
>
> struct S {
> explicit S() = default;
> explicit S(int i) : p_(std::make_unique<int>(i)) {}
> friend bool operator<(const S& a, const S& b) {
> assert(a.p_ != nullptr);
> assert(b.p_ != nullptr); // this assertion will fail
> return *a.p_ < *b.p_;
> }
> std::unique_ptr<int> p_;
> };
>
> #define M(it) std::make_move_iterator(it)
>
> int main() {
> fprintf(stderr, "First input range starts in the middle of second:");
> {
> S a[4] = { S(2), S(2), S(4), S(5) };
> S out[4];
> auto dlast = std::set_intersection(M(a+1), M(a+4), M(a), M(a+3), out); //
> boom goes the dynamite
> assert(dlast == out + 2); // we never get here
> assert(*out[0].p_ == 2);
> assert(*out[1].p_ == 4);
> }
> }
>
> Here `out` does not "overlap" either the first input range or the second
> input range; but because the two input ranges overlap *each other*, and
> we're using a move_iterator, bad things happen.
>
> It might be very appreciated if someone were able to bring a paper
> codifying, in proper standardese, that set_intersection should do precisely
> what Vendors All Understand set_intersection actually, physically, does.
> Such a paper would incidentally both implement your feature request (that
> set_intersection should be specified to work when Vendors All Understand
> that it physically will) and also fix this defect (that set_intersection
> should be permitted to fail when Vendors All Understand it cannot possibly
> work, physically). However, I can't think of a sensible way to word that
> paper except to write something like
>
> Effects: Equivalent to:
> while (first1 != last1 && first2 != last2) {
> if (*first1 < *first2) {
> ++first1;
> } else if (*first2 < *first1) {
> ++first2;
> } else {
> *out++ = *first1; ++first1; ++first2;
> }
> }
> return out;
>
> We've had good success with that kind of specification in places like
> [specialized.algorithms]
> <https://eel.is/c++draft/uninitialized.construct.default>. I imagine such
> a paper wouldn't *sail* through, but it might work.
>
> my $.02,
> –Arthur
>
>>
specific algorithm implementation:
while (first1 != last1 && first2 != last2) {
if (*first1 < *first2) {
++first1;
} else if (*first2 < *first1) {
++first2;
} else {
*out++ = *first1; ++first1; ++first2;
}
}
return out;
Back in the day a defect report on set algorithm
https://cplusplus.github.io/LWG/issue291 was solved it seems to retain this
algorithm.
It is not easy to analyse this algorithm with weak ordering and arbitrarily
overlapping input and possible output ranges. To just standardise the
algorithm is a smart move.
Add some remarks to when it works. E.g. nonoverlapping ranges and strong
ordering. Or First range overlap with output range, etc.
I don't feel up to the task of writing a proposal, but I am willing to help
and give my opinion.
Am Mo., 15. Sept. 2025 um 00:26 Uhr schrieb Arthur O'Dwyer <
arthur.j.odwyer_at_[hidden]>:
> On Sun, Sep 14, 2025 at 1:38 PM Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
> wrote:
>
>> On Sun, Sep 14, 2025 at 1:08 PM Peter Neiss via Std-Proposals <
>> std-proposals_at_[hidden]> wrote:
>>
>>> Ok. I have not thought about overlapping input ranges. Classical
>>> aliasing problem. Modifying the first range at a later position, through
>>> the second maybe makes the range unsorted. But when I try to construct an
>>> example, the sorting seems to make this impossible and it still works out
>>> correctly.
>>> I am not familiar with weak comparators and when they are allowed in the
>>> stl. Is sorting by a weak comparator enough for a range to be sorted ?
>>> Please give an example that breaks with overlapping input ranges. If
>>> this is a problem inplace_set_intersection is a good idea, otherwise i
>>> would prefer changing the precondition on std::set_intersection.
>>>
>> [...]
>>
>>> If the input ranges overlap, the output can generally only coincide with
>>>> the first range (the source for the copy). Otherwise you can get wrong
>>>> results with a weak comparator.
>>>
>>>
>> I don't think this is true. The output range could be the same as the
>> first input range, or it could be the same as the second input range, or it
>> could be unrelated, or it could start before either input range.
>> The output range obviously can't point into the *middle* of either input
>> range, but nobody was claiming that.
>>
>> I wonder if the term "overlap with" is ever actually defined in the spec,
>> or if there's an LWG issue about that. I think as the iterator type becomes
>> more complicated than just a simple pointer, and the element types become
>> more resource-managey and/or view-ish, "overlap" gets harder to define.
>> What we mean, of course(tm), is that while the algorithm is running the
>> values of all elements in the two input ranges must be stable and not
>> change for any reason. And how we want to loosen that restriction is to say
>> something like, the values of all *not-yet-read* elements in the two
>> input ranges must be stable and not change (along with some explanation of
>> when *does* the algorithm read each element).
>>
>
> ...Oh, I see — the problem is actually pre-existing, and happens when (A)
> the two *input* ranges overlap* each other* and (B) the type's "copying
> from input to output" operation is destructive/modifying.
> So we have something like this:
>
> // https://godbolt.org/z/TTn3f5E8P
>
> struct S {
> explicit S() = default;
> explicit S(int i) : p_(std::make_unique<int>(i)) {}
> friend bool operator<(const S& a, const S& b) {
> assert(a.p_ != nullptr);
> assert(b.p_ != nullptr); // this assertion will fail
> return *a.p_ < *b.p_;
> }
> std::unique_ptr<int> p_;
> };
>
> #define M(it) std::make_move_iterator(it)
>
> int main() {
> fprintf(stderr, "First input range starts in the middle of second:");
> {
> S a[4] = { S(2), S(2), S(4), S(5) };
> S out[4];
> auto dlast = std::set_intersection(M(a+1), M(a+4), M(a), M(a+3), out); //
> boom goes the dynamite
> assert(dlast == out + 2); // we never get here
> assert(*out[0].p_ == 2);
> assert(*out[1].p_ == 4);
> }
> }
>
> Here `out` does not "overlap" either the first input range or the second
> input range; but because the two input ranges overlap *each other*, and
> we're using a move_iterator, bad things happen.
>
> It might be very appreciated if someone were able to bring a paper
> codifying, in proper standardese, that set_intersection should do precisely
> what Vendors All Understand set_intersection actually, physically, does.
> Such a paper would incidentally both implement your feature request (that
> set_intersection should be specified to work when Vendors All Understand
> that it physically will) and also fix this defect (that set_intersection
> should be permitted to fail when Vendors All Understand it cannot possibly
> work, physically). However, I can't think of a sensible way to word that
> paper except to write something like
>
> Effects: Equivalent to:
> while (first1 != last1 && first2 != last2) {
> if (*first1 < *first2) {
> ++first1;
> } else if (*first2 < *first1) {
> ++first2;
> } else {
> *out++ = *first1; ++first1; ++first2;
> }
> }
> return out;
>
> We've had good success with that kind of specification in places like
> [specialized.algorithms]
> <https://eel.is/c++draft/uninitialized.construct.default>. I imagine such
> a paper wouldn't *sail* through, but it might work.
>
> my $.02,
> –Arthur
>
>>
Received on 2025-09-15 19:06:25