On Sat, Nov 12, 2022 at 7:01 AM Lénárd Szolnoki wrote:
On Fri, 11 Nov 2022 18:51:03 -0500 Jason McKesson wrote:
> On Fri, Nov 11, 2022 at 5:12 PM Lénárd Szolnoki wrote:
> >
> > I suppose this applies to unsized ranges. But range does not have
> > to be random access to have std::ranges::size, which is O(1) and
> > meant to be cheap.
> >
> > For sized ranges it absolutely makes sense to do the preallocation,
> > and it doesn't require traversing the range twice.

+1.
 
> > I would say that for unsized ranges it makes more sense to not do
> > this, and do the usual multiple reallocations. 
>
> I disagree. For most ranges that don't have a size, traversal is
> inexpensive next to the cost of memory allocations (and the requisite
> copying/moving of data). Even complex views that need to invoke some
> functor (like `filter_view` and the like) will generally be cheaper
> than doing potentially dozens of memory allocations. Indeed, it's
> pretty rare to find a forward range for whom iteration is so slow that
> multiple memory allocation is preferred.

Strong disagree — notice that in order to do "dozens of memory allocations," you'd need to be allocating 2^dozens elements — 2^20 is already a million elements. (Admittedly on MSVC they use a different growth factor: 1.5^20 is only like 3000.) 
Also, Ranges iterators can be arbitrarily slow to traverse: all you have to do is stick a `views::transform` or `views::filter` in there, and boom, it's super slow.
So I think Lénárd's call is absolutely right: `vector` should certainly trust a `sized_range` to report its size accurately, but if it's not a sized range, `vector` shouldn't do any pre-traversal. Pre-traversal can even be dangerous, for an input range (such as an `istream_iterator`); and I wouldn't trust it to be non-dangerous in practice even for a range that self-reported as `forward_range`.

C++20 provides a completely standard way for a range to self-report its size: it's `sized_range` and `std::ranges::size`. We don't need `vector` second-guessing that approach.

(However, as seen below, it seems that all major library vendors disagree with my take: they will use `distance` and preallocate.)
 
I'm quite surprised on this benchmark's results, but here you go:
https://quick-bench.com/q/usHB_TCGMgB8b7lnjettooYGVnc

(Executive summary: InputIterator is faster than ForwardIterator, on GCC 12.2 + libstdc++ at -O2.)

I'm surprised by that benchmark too!
On Clang + libc++, I get much more sensible numbers, with Forward better than Input at all optimization levels greater than -O0:

$ clang++ x.cpp -std=c++20 -I... -L... -lbenchmark -lbenchmark_main -g

$ ./a.out

[...]

------------------------------------------------------------

Benchmark                  Time             CPU   Iterations

------------------------------------------------------------

VectorFromForward     113493 ns       113403 ns         6319

VectorFromInput       110535 ns       110468 ns         6243

$ clang++ x.cpp -std=c++20 -I... -L... -lbenchmark -lbenchmark_main -g -O1

$ ./a.out

[...]

------------------------------------------------------------

Benchmark                  Time             CPU   Iterations

------------------------------------------------------------

VectorFromForward       1656 ns         1655 ns       396987

VectorFromInput         3045 ns         3043 ns       226321


Even knocking your `arr_size` number all the way up to 1'024'000, I still get these numbers at -O2:

------------------------------------------------------------

Benchmark                  Time             CPU   Iterations

------------------------------------------------------------

VectorFromForward    1363242 ns      1361088 ns          510

VectorFromInput      2429556 ns      2427262 ns          298


As far as I can tell, libc++ does do the optimization we're discussing! It sees that the iterator is a Cpp17ForwardIterator and uses `std::distance` (not `std::ranges::distance`, because libc++ hasn't yet implemented C++20 `vector`) to retrieve the number of elements and pre-allocate that much. So the only difference here seems to be that libc++ and/or Clang can do `std::distance` of these filter_view::iterators much faster than GCC+libstdc++, for some reason.
I have no idea why that should be, and would be interested to learn, if anyone digs into it.

–Arthur