C++ Logo

std-proposals

Advanced search

Re: [std-proposals] Relaxing precondition on std::set_intersection algorithm

From: Phil Endecott <std_proposals_list_at_[hidden]>
Date: Wed, 17 Sep 2025 16:55:21 +0100
Peter Neiss wrote:
> As I understand the problem of std::set_intersection better now, i do not
> believe anymore that it is a good idea to relax the precondition to make
> inplace usage possible.
>
> Instead I would like to propose and get feedback for following solution:
>
> auto remove_missing = [](auto first1, auto last1, auto first2, auto last2,
> auto comp)
> {
> return std::remove_if(
> first1, last1,[&](auto const& elem)
> { return not std::binary_search(first2, last2, elem, comp); }
> );
> };
>
> I suggest the algorithm above with the name remove_missing. It is quite
> similar to std::set_intersection(_copy).
> The implementation above has complexity O(N*log(M)). N = size of
> first1..last1 and M = size of first2..last2

No, we don't want that. It has terrible complexity. Why do you propose it?


> A better implementation is
> auto remove_missing = [](auto first1, auto last1, auto first2, auto last2,
> auto comp)
> {
> auto out = first1;
> while (first1 != last1 && first2 != last2)
> {
> if (comp(*first1, *first2))
> ++first1;
> else
> {
> if (!comp(*first2, *first1))
> *out++ = *first1++; // *first1 and *first2 are equivalent.
> else // THE else IS NEW
> ++first2;
> }
> }
> return out;
> };
>
> which has Complexity O(N+M).

That's better, but it's still far from ideal if you have random-access
iterators.

Let me offer the following:

template <std::random_access_iterator I1,
           std::random_access_iterator I2,
           typename O>
O set_intersection(I1 first_1, I1 last_1, I2 first_2, I2 last_2, O out)
{
   auto size_1 = last_1 - first_1;
   auto size_2 = last_2 - first_2;

   if (size_1 == 0 || size_2 == 0) return out;

   // if (*(last_1-1) < *first_2 || *(last_2-1) < *first_1) return out;

   auto root_1 = first_1 + size_1 / 2;
   auto root_2 = first_2 + size_2 / 2;

   if (*root_1 < *root_2) {
     out = set_intersection(first_1, last_1, first_2, root_2, out);
     out = set_intersection(root_1 + 1, last_1, root_2, last_2, out);

   } else if (*root_1 == *root_2) {
     out = set_intersection(first_1, root_1, first_2, root_2, out);
     *(out++) = *root_1;
     out = set_intersection(root_1 + 1, last_1, root_2 + 1, last_2, out);

   } else {
     out = set_intersection(first_1, root_1, first_2, last_2, out);
     out = set_intersection(root_1, last_1, root_2 + 1, last_2, out);
   }

   return out;
}


This has worst-case linear complexity, e.g. for interleaved inputs,
but typically closer to output-size complexity.

This does work in-place, if you pass out = first_1. I'd add a wrapper
called inplace_set_intersection to do that.

I believe it *doesn't* work if there are duplicate input elements - maybe
that's an easy fix, I haven't thought about it.

For the in-place case, it could std::move to *(out++) but you don't
want to
do that when it's not in-place.

You can add a test near the start that improves the complexity in the
case where there is no overlap; it's commented-out above. It doesn't
improve the worst-case complexity.

This was last discussed here in January 2024, when I proposed that
this could be a member function (or friend) of std::set:

https://lists.isocpp.org/std-proposals/2024/01/8902.php


Regards, Phil.

Received on 2025-09-17 15:55:25