Thank you, joined.

On Mon, May 3, 2021 at 6:59 PM Michael Wong <michael@codeplay.com> wrote:

Hi Oleksandr, thank you for the implementation feedback. You can also join the SG19 discussion directly as it is open to all:

You can join SG19 here:

https://lists.isocpp.org/mailman/listinfo.cgi/sg19

The next call will be on May 13. Once you join the mailing list, you will receive the invite and we should have a chance to discuss it directly.

Cheers
-- 
Mit freundlichen Grüßen/ With best regards/ 祝一切顺利 / Yoroshiku onegai-shimasu / Cordialement / Gracias / Большое Спасибо
Distinguished Engineer
Industry Expert
Director, VP, ISOCPP.org
ISO C++ Directions Group Chair
Chair of Khronos SYCL C++ Heterogeneous Programming Language
Chair of WG21 C++ SG19: Machine Learning
Chair of WG21 C++ SG14: Low-latency/Games Dev/Financial/Trading/Embedded/Simulation
Editor SG5 Transactional Memory TS
Editor SG1 Concurrency TS
Chair of Standards Council Canada SC22 Programming Languages
Chair of Standards Council Canada TC22/SC32 Electrical and electronic components (SOTIF)

Chair of UL4600 Object Tracking

http://wongmichael.com/about

Michael Wong

Codeplay Software Ltd

Level C Argyle House, 3 Lady Lawson Street,

Edinburgh, EH3 9DR

Tel: 0131 466 0503

Fax: 0131 557 6600

Website: http://www.codeplay.com

Twitter: https://twitter.com/codeplaysoft

This email and any attachments may contain confidential and /or privileged information and is for use by the addressee only. If you are not the intended recipient, please notify Codeplay Software Ltd immediately and delete the message from your computer. You may not copy or forward it, or use or disclose its contents to any other person. Any views or other information in this message which do not relate to our business are not authorized by Codeplay software Ltd, nor does this message form part of any contract unless so stated. As internet communications are capable of data corruption Codeplay Software Ltd does not accept any responsibility for any changes made to this message after it was sent. Please note that Codeplay Software Ltd does not accept any liability or responsibility for viruses and it is your responsibility to scan any attachments. Company registered in England and Wales, number: 04567874 Registered office: 81 Linkfield Street, Redhill RH1 6BY

From: dosselmr@cs.uregina.ca
Sent: Monday, May 3, 2021 11:45 AM
To: Oleksandr Koval
Cc: chiu@cs.toronto.edu; Larry.Lewis@sas.com; Jens.Maurer@gmx.net; eniebler@fb.com; phil.ratzloff@sas.com; vreverdy@illinois.edu; Michael Wong; sg19@lists.isocpp.org
Subject: Re: P1708R4 free functions feedback

Morning.
Thanks for your very detailed feedback. I will have a better look at
your comments when I have an available moment.
Thanks,
Richard
Quoting Oleksandr Koval <oleksandr.koval.dev@gmail.com>:

> Hi, I needed similar stats functions in my project so I tried to implement
> proposed free functions from P1708R4. I faced several
> problems/typos/questions, maybe my experience can be useful for the future
> development of this proposal.
> General things
>
>    1.
>
>    Since C++20 constraint for ExecutionPolicy algorithms has been updated to
>     std::is_execution_policy_v<std::remove_cvref_t<ExecutionPolicy>> instead
>    of std::is_execution_policy_v<std::decay_t<ExecutionPolicy>>.
>    2.
>
>    Shouldn't stats_error have const char*/const
> std::string&/string_view constructor
>    to report what?
>    3.
>
>    I believe that various(if any) throwing checks are redundant. Range size
>    checks are cheap only for random access iterators. No existing algorithm
>    throws when range(s) has bad size. The common case here is when values and
>    weights ranges have different sizes, I think we can just leave it UB.
>    Another one is when range has 0 or < N elements. In case of mean() we
>    can just divide 0/0 and get inf for empty range. No exceptions are
>    needed. Range value checks are expensive, since we're dealing
> with floating
>    point types there's no problem with meaningless operations like
> square root
>    of negative value.
>    4.
>
>    Use std::conditional_t in function signatures instead of
>    conditional<>::type.
>
>
> Iterator kind requirements
>
>    1.
>
>    All existing algorithms with ExecutionPolicy support require forward
>    iterator, not input iterator.
>    2.
>
>    Review iterator requirements. For some functions it's very hard if
>    possible at all to support input iterator. For example, unweighted
> sorted_quantile(r,
>    q, proj, interpolation). All it needs is the size of the range to select
>    proper element(s). For input iterator it needs to iterate over all
>    elements just to determine range size, but it's not possible to select
>    proper element(s) afterward because iterators are not valid anymore. The
>    only option I can see here is to copy the whole range into vector, such
>    an algorithm has O(n) space complexity.
>
> Mean
>
>    1. I think it would be nice to provide some overflow guarantees to
>    encourage implementations to avoid it. For example, use the recurrence
>    relation formula for mean(): mean(n) = mean(n-1) + (data[n] -
>    mean(n-1))/(n+1). Maybe it's better to provide specific formulas for
>    each function instead of general ones in the Part 3 of this proposal.
>    Another option is to specify under what conditions overflow is allowed.
>
> Quantile
>
>    1.
>
>    Make interpolation_t a scoped enum and remove trailing _t from
>    enumerator names.
>    2.
>
>    Typo, it still returns tuple instead of Result although interpolation is
>    used.
>    3.
>
>    Probably a typo, since interpolation is used, constraint for elements
>    should be updated. Here's the fixed version of (1):
>
> template<typename R, typename P = std::identity, typename Q,
>     // result_t is just an alias for `conditional_t<...>`
>     typename Result = result_t<R, P>>// `stats_frange` is like
> `stats_range` but requires forward_iteratorrequires stats_frange<R, P>
> && std::is_floating_point_v<Q>constexpr auto sorted_quantile(R&& r, Q
> q, P proj = {},
>     interpolation_t interpolation = interpolation_t::linear) -> Result;
>
>
>    1.
>
>    Consider returning iterators instead of potential interpolation of pure
>    values. Interpolation with existing modes(higher, lower, etc.) can be
>    provided as a separate function. For example, std::max_element() is also
>    supposed to be a numeric function, but it returns an iterator.
> One approach
>    for this is to return std::pair<std::pair<range_iter, q_value>,
>    std::pair<range_iter, q_value>> for unweighted versions and
> std::pair<std::pair<range_iter,
>    weight_sum>, std::pair<range_iter, weight_sum>> for weighted ones. p.first
>    == end means *no answer*, p.first == p.second means *single*
> answer, p.first
>    != p.second means *interpolation* is needed.
>    2.
>
>    Related to issue 6. How to write ExecutionPolicy version for an
>    unweighted sorted quantile/median algorithm? All it does is finds range
>    size and then reads one or two values, there are no parallel versions for
>     std::ranges::distance() and std::ranges::advance().
>    3.
>
>    Maybe it's obvious but specify that weighted quantiles can be >= 1.
>    Because in section 3.2 there is 0 <= q <= 1 requirement.
>    4.
>
>    Clarify unsorted_ versions. One option to implement them is in terms of
>    nth_element() which mutates the original sequence. Another option is to
>    use additional vector of indices and sort them, but it requires O(n)
>    space.
>    5.
>
>    What's the purpose of typename T = double in unweighted versions? It can
>    be just typename T.
>    6.
>
>    Why is there a version sorted_quantile(r, q, interpolation) without
>    projection? I think that wherever we have any range, we should have
>    projection for it as well, and versions without projections should be
>    completely removed. The same for sorted_quantiles() et al., there should
>    be projections for values, quantiles and weights ranges.
>    7.
>
>    How about unsorted_quantiles()? When the number of quantiles is small,
>    it can be more efficient than sorting and using sorted_ versions.
>    8.
>
>    Clarify weighted sorted_ algorithms. In particular, the case when
>    weights are zeros. For example, weighted median for weights
> {1,0,1,0} should
>    be interpolation of 1st and 3rd elements. Clarify what should be done in
>    the case when weight of the first element is already more than target
>    quantile. I suppose that in such a case there's nothing to interpolate and
>    the first element should be returned. Think about the complexity of the
>    unsorted version of this approach. It might require up to N-1 additional
>    calls to nth_element() in case when a lot of elements have zero weight.
>
> I haven't implemented the following functions, hence, not so much I can say
> for them, only minor/interface issues.
> Mode
>
>    1.
>
>    Returning through the output iterator seems extremely inconvenient. What
>    if the client needs only one mode or at most 3 modes? Maybe it will be
>    better to provide 3 kinds of functions: return through output iterator,
>    return by value, return through output range. In the last case, function
>    can put more modes while range is not full.
>    2.
>
>    Why does it require equality_comparable<iter_value<R>> if we need to
>    compare projected<R,P>? Seems that the idea was to store iter_value in
>    the hash map but it can be very inefficient. Why does it require
> std::output_iterator<O,
>    std::iter_value_t<R>? Why does it output non-projected values? I think
>    that it should either operate/output on projected values or on iterators
>    (not pointed values!) to non-projected values. The latter will require at
>    least forward iterator. The same applies to weighted version, why require
>     std::equality_comparable<std::iter_value_t<Weights>>?
>    3.
>
>    Why does (2, 4) require exactly std::unordered_map. Is it possible to
>    allow a custom hash map and specify what operations it needs to support?
>
> Skewness
>
>    1. Make data_t a scoped enumeration. Choose a better name for it, remove
>    trailing _t from enumerator names.
>
> Kurtosis
>
>    1. Make excess param a scoped enumeration instead of bool.
>
> Variance
>
>    1. Is var() really better than variance()?
>
>
> --
> Regards,
> Oleksandr Koval.
>

-----------------------------------------------------
This e-mail was sent via the Secure Web Server at the
University of Regina, Department of Computer Science
          https://www.cs.uregina.ca/WebMail/

Regards,

Oleksandr Koval.