C++ Logo

std-proposals

Advanced search

Re: [std-proposals] Relaxing precondition on std::set_intersection algorithm

From: Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
Date: Sun, 14 Sep 2025 18:26:29 -0400
On Sun, Sep 14, 2025 at 1:38 PM Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
wrote:

> On Sun, Sep 14, 2025 at 1:08 PM Peter Neiss via Std-Proposals <
> std-proposals_at_[hidden]> wrote:
>
>> Ok. I have not thought about overlapping input ranges. Classical aliasing
>> problem. Modifying the first range at a later position, through the second
>> maybe makes the range unsorted. But when I try to construct an example, the
>> sorting seems to make this impossible and it still works out correctly.
>> I am not familiar with weak comparators and when they are allowed in the
>> stl. Is sorting by a weak comparator enough for a range to be sorted ?
>> Please give an example that breaks with overlapping input ranges. If this
>> is a problem inplace_set_intersection is a good idea, otherwise i would
>> prefer changing the precondition on std::set_intersection.
>>
> [...]
>
>> If the input ranges overlap, the output can generally only coincide with
>>> the first range (the source for the copy). Otherwise you can get wrong
>>> results with a weak comparator.
>>
>>
> I don't think this is true. The output range could be the same as the
> first input range, or it could be the same as the second input range, or it
> could be unrelated, or it could start before either input range.
> The output range obviously can't point into the *middle* of either input
> range, but nobody was claiming that.
>
> I wonder if the term "overlap with" is ever actually defined in the spec,
> or if there's an LWG issue about that. I think as the iterator type becomes
> more complicated than just a simple pointer, and the element types become
> more resource-managey and/or view-ish, "overlap" gets harder to define.
> What we mean, of course(tm), is that while the algorithm is running the
> values of all elements in the two input ranges must be stable and not
> change for any reason. And how we want to loosen that restriction is to say
> something like, the values of all *not-yet-read* elements in the two
> input ranges must be stable and not change (along with some explanation of
> when *does* the algorithm read each element).
>

...Oh, I see — the problem is actually pre-existing, and happens when (A)
the two *input* ranges overlap* each other* and (B) the type's "copying
from input to output" operation is destructive/modifying.
So we have something like this:

// https://godbolt.org/z/TTn3f5E8P

struct S {
explicit S() = default;
explicit S(int i) : p_(std::make_unique<int>(i)) {}
friend bool operator<(const S& a, const S& b) {
assert(a.p_ != nullptr);
assert(b.p_ != nullptr); // this assertion will fail
return *a.p_ < *b.p_;
}
std::unique_ptr<int> p_;
};

#define M(it) std::make_move_iterator(it)

int main() {
fprintf(stderr, "First input range starts in the middle of second:");
{
S a[4] = { S(2), S(2), S(4), S(5) };
S out[4];
auto dlast = std::set_intersection(M(a+1), M(a+4), M(a), M(a+3), out); //
boom goes the dynamite
assert(dlast == out + 2); // we never get here
assert(*out[0].p_ == 2);
assert(*out[1].p_ == 4);
}
}

Here `out` does not "overlap" either the first input range or the second
input range; but because the two input ranges overlap *each other*, and
we're using a move_iterator, bad things happen.

It might be very appreciated if someone were able to bring a paper
codifying, in proper standardese, that set_intersection should do precisely
what Vendors All Understand set_intersection actually, physically, does.
Such a paper would incidentally both implement your feature request (that
set_intersection should be specified to work when Vendors All Understand
that it physically will) and also fix this defect (that set_intersection
should be permitted to fail when Vendors All Understand it cannot possibly
work, physically). However, I can't think of a sensible way to word that
paper except to write something like

    Effects: Equivalent to:
        while (first1 != last1 && first2 != last2) {
            if (*first1 < *first2) {
                ++first1;
            } else if (*first2 < *first1) {
                ++first2;
            } else {
                *out++ = *first1; ++first1; ++first2;
            }
        }
        return out;

We've had good success with that kind of specification in places like
[specialized.algorithms]
<https://eel.is/c++draft/uninitialized.construct.default>. I imagine such a
paper wouldn't *sail* through, but it might work.

my $.02,
–Arthur

>

Received on 2025-09-14 22:26:48