Date: Wed, 17 Sep 2025 16:55:21 +0100
Peter Neiss wrote:
> As I understand the problem of std::set_intersection better now, i do not
> believe anymore that it is a good idea to relax the precondition to make
> inplace usage possible.
>
> Instead I would like to propose and get feedback for following solution:
>
> auto remove_missing = [](auto first1, auto last1, auto first2, auto last2,
> auto comp)
> {
> return std::remove_if(
> first1, last1,[&](auto const& elem)
> { return not std::binary_search(first2, last2, elem, comp); }
> );
> };
>
> I suggest the algorithm above with the name remove_missing. It is quite
> similar to std::set_intersection(_copy).
> The implementation above has complexity O(N*log(M)). N = size of
> first1..last1 and M = size of first2..last2
No, we don't want that. It has terrible complexity. Why do you propose it?
> A better implementation is
> auto remove_missing = [](auto first1, auto last1, auto first2, auto last2,
> auto comp)
> {
> auto out = first1;
> while (first1 != last1 && first2 != last2)
> {
> if (comp(*first1, *first2))
> ++first1;
> else
> {
> if (!comp(*first2, *first1))
> *out++ = *first1++; // *first1 and *first2 are equivalent.
> else // THE else IS NEW
> ++first2;
> }
> }
> return out;
> };
>
> which has Complexity O(N+M).
That's better, but it's still far from ideal if you have random-access
iterators.
Let me offer the following:
template <std::random_access_iterator I1,
std::random_access_iterator I2,
typename O>
O set_intersection(I1 first_1, I1 last_1, I2 first_2, I2 last_2, O out)
{
auto size_1 = last_1 - first_1;
auto size_2 = last_2 - first_2;
if (size_1 == 0 || size_2 == 0) return out;
// if (*(last_1-1) < *first_2 || *(last_2-1) < *first_1) return out;
auto root_1 = first_1 + size_1 / 2;
auto root_2 = first_2 + size_2 / 2;
if (*root_1 < *root_2) {
out = set_intersection(first_1, last_1, first_2, root_2, out);
out = set_intersection(root_1 + 1, last_1, root_2, last_2, out);
} else if (*root_1 == *root_2) {
out = set_intersection(first_1, root_1, first_2, root_2, out);
*(out++) = *root_1;
out = set_intersection(root_1 + 1, last_1, root_2 + 1, last_2, out);
} else {
out = set_intersection(first_1, root_1, first_2, last_2, out);
out = set_intersection(root_1, last_1, root_2 + 1, last_2, out);
}
return out;
}
This has worst-case linear complexity, e.g. for interleaved inputs,
but typically closer to output-size complexity.
This does work in-place, if you pass out = first_1. I'd add a wrapper
called inplace_set_intersection to do that.
I believe it *doesn't* work if there are duplicate input elements - maybe
that's an easy fix, I haven't thought about it.
For the in-place case, it could std::move to *(out++) but you don't
want to
do that when it's not in-place.
You can add a test near the start that improves the complexity in the
case where there is no overlap; it's commented-out above. It doesn't
improve the worst-case complexity.
This was last discussed here in January 2024, when I proposed that
this could be a member function (or friend) of std::set:
https://lists.isocpp.org/std-proposals/2024/01/8902.php
Regards, Phil.
> As I understand the problem of std::set_intersection better now, i do not
> believe anymore that it is a good idea to relax the precondition to make
> inplace usage possible.
>
> Instead I would like to propose and get feedback for following solution:
>
> auto remove_missing = [](auto first1, auto last1, auto first2, auto last2,
> auto comp)
> {
> return std::remove_if(
> first1, last1,[&](auto const& elem)
> { return not std::binary_search(first2, last2, elem, comp); }
> );
> };
>
> I suggest the algorithm above with the name remove_missing. It is quite
> similar to std::set_intersection(_copy).
> The implementation above has complexity O(N*log(M)). N = size of
> first1..last1 and M = size of first2..last2
No, we don't want that. It has terrible complexity. Why do you propose it?
> A better implementation is
> auto remove_missing = [](auto first1, auto last1, auto first2, auto last2,
> auto comp)
> {
> auto out = first1;
> while (first1 != last1 && first2 != last2)
> {
> if (comp(*first1, *first2))
> ++first1;
> else
> {
> if (!comp(*first2, *first1))
> *out++ = *first1++; // *first1 and *first2 are equivalent.
> else // THE else IS NEW
> ++first2;
> }
> }
> return out;
> };
>
> which has Complexity O(N+M).
That's better, but it's still far from ideal if you have random-access
iterators.
Let me offer the following:
template <std::random_access_iterator I1,
std::random_access_iterator I2,
typename O>
O set_intersection(I1 first_1, I1 last_1, I2 first_2, I2 last_2, O out)
{
auto size_1 = last_1 - first_1;
auto size_2 = last_2 - first_2;
if (size_1 == 0 || size_2 == 0) return out;
// if (*(last_1-1) < *first_2 || *(last_2-1) < *first_1) return out;
auto root_1 = first_1 + size_1 / 2;
auto root_2 = first_2 + size_2 / 2;
if (*root_1 < *root_2) {
out = set_intersection(first_1, last_1, first_2, root_2, out);
out = set_intersection(root_1 + 1, last_1, root_2, last_2, out);
} else if (*root_1 == *root_2) {
out = set_intersection(first_1, root_1, first_2, root_2, out);
*(out++) = *root_1;
out = set_intersection(root_1 + 1, last_1, root_2 + 1, last_2, out);
} else {
out = set_intersection(first_1, root_1, first_2, last_2, out);
out = set_intersection(root_1, last_1, root_2 + 1, last_2, out);
}
return out;
}
This has worst-case linear complexity, e.g. for interleaved inputs,
but typically closer to output-size complexity.
This does work in-place, if you pass out = first_1. I'd add a wrapper
called inplace_set_intersection to do that.
I believe it *doesn't* work if there are duplicate input elements - maybe
that's an easy fix, I haven't thought about it.
For the in-place case, it could std::move to *(out++) but you don't
want to
do that when it's not in-place.
You can add a test near the start that improves the complexity in the
case where there is no overlap; it's commented-out above. It doesn't
improve the worst-case complexity.
This was last discussed here in January 2024, when I proposed that
this could be a member function (or friend) of std::set:
https://lists.isocpp.org/std-proposals/2024/01/8902.php
Regards, Phil.
Received on 2025-09-17 15:55:25