ISOCPP std-proposals List: Re: [std-proposals] New vector range constructor to improve performance with forward ranges

From: Alexander Bessonov <alexbav_at_[hidden]>
Date: Sat, 12 Nov 2022 01:17:39 +0300

Yes, but constructor may check for sized_range using the corresponding concept.

Sincerely,
Alexander Bessonov
HHD Software Ltd.

From: Lénárd Szolnoki via Std-Proposals
Sent: Saturday, November 12, 2022 1:12 AM
To: std-proposals_at_[hidden]
Cc: Lénárd Szolnoki
Subject: Re: [std-proposals] New vector range constructor to improve performance with forward ranges

Hi,

I suppose this applies to unsized ranges. But range does not have to be random access to have std::ranges::size, which is O(1) and meant to be cheap.

For sized ranges it absolutely makes sense to do the preallocation, and it doesn't require traversing the range twice.

I would say that for unsized ranges it makes more sense to not do this, and do the usual multiple reallocations.

The user of the interface could have control of the behavior by adapting an unsized range to be sized. A `std::ranges::sized_view` that called `std::ranges::distance` on construction comes to mind.

Cheers,
Lénárd

On 11 November 2022 20:45:25 GMT, Alexander Bessonov via Std-Proposals <std-proposals_at_[hidden]> wrote:
  In C++23, vector got a range constructor which can be used among with ranges::to to construct a vector from a range.

  If the source range is a random-access range, vector constructor can pre-allocate enough storage before traversing a range. If the source range is an input range, constructor is left with making emplace_back.

  However, things get interesting for forward range. For example, in current implementation of MSVC STL, they treat forward ranges as random-access ranges in the constructor (the same is true for their vector::vector(It begin, It end) constructor). I didn’t check, but other STL libraries probably do the same, or they could go with input range way.

  There are two variants here: we can either call ranges::distance, pre-allocate and then assign elements (as MSVC STL currently does), effectively traversing range twice or we can process the forward range as an input range.

  It is impossible for the constructor to tell if one variant or another would be better, so the choice of double-traversal looks good in general case. However, there might be a lot of cases when range traversal is very expensive and only caller has this knowledge. Even worse, performance gains or loses may vary on a case-by-case basis.

  I have two suggestions and I believe both of them are worth adding to the library, with the first one called to solve this particular problem, while the second one potentially having wider application:

  1. Propose a new vector range constructor:

  template<typename Range>
  vector::vector(from_range_t, Range &&range, size_t estimated_number_elements);

  This constructor calls reserve(estimated_number_elements) and then traverses the range only once, calling emplace_back. This would potentially trade more allocated memory for speed.

  ranges::to is already capable of passing additional arguments to constructors:

  auto d = source_range | views::filter(/* expensive filter here */) | ranges::to<std::vector>(estimated_number_of_elements);

  2. Add new view adaptor views::as_input (similar to C++23 views::as_rvalue and as_const) that “converts” the range to the input range.

  Sincerely,
  Alexander Bessonov
  HHD Software Ltd.

--------------------------------------------------------------------------------

-- 
Std-Proposals mailing list
Std-Proposals_at_[hidden]
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals

Received on 2022-11-11 22:17:55