
[SG19] Comments on Math proposal for Machine Learning / Simple Statistical Functions

From: Benjamin Poulain <bpoulain_at_[hidden]>
Date: Fri, 13 Sep 2019 18:01:39 -0700

I am unable to make the calls, but I have a couple of comments on the draft "Simple Statistical Functions" (https://docs.google.com/document/d/1VAgcyvL1riMdGz7tQIT9eTtSSfV3CoCEMWKk8GvVuFY/edit#heading=h.9ogkehmdmtel).

1. Would it be useful to explicitly make the order of operations unspecified?
    For example, the result of mean() on floating-point numbers depends on the order in which the numbers are added. Ideally, implementations should be allowed to take advantage of parallelism (e.g. SIMD), which would affect the result.
2. The definition of Standard Deviation prescribes a specific formula for its computation.
    I believe it would be better if that formula were not normative. An implementation may want to use the E[x^2] - E[x]^2 variant instead of E[(x - E[x])^2].
3. In the definition of “Median”, I don’t think the text is clear about what is returned if the range has an odd number of elements. I suppose it would be std::pair(middle, middle).
4. The current definition of “mode” takes a sorted range.
    I believe it would be useful to also have a version that takes an arbitrary range of hashable values. The alternative would be to use std::sort() followed by std::mode(), which would be worse: the sort alone costs O(n log n), while counting with a hash table takes expected O(n) time.
5. Standard Deviation & Variance: The definition does not specify what happens if first == last (i.e. an empty range).
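To make point 1 concrete, here is a small sketch that computes the mean of the same `float` values twice: once with a strict left-to-right sum, and once with two interleaved partial sums, the way a two-lane SIMD loop might accumulate them. The helper names `mean_sequential` and `mean_lanes` are hypothetical and not from the draft.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: mean with a strict left-to-right sum,
// the same order a sequential std::accumulate uses.
// Precondition: v is non-empty.
float mean_sequential(const std::vector<float>& v) {
    assert(!v.empty());
    float sum = 0.0f;
    for (float x : v) sum += x;
    return sum / static_cast<float>(v.size());
}

// Hypothetical helper: mean with `lanes` interleaved partial sums that are
// combined at the end, mimicking a vectorized reduction.
// Preconditions: v is non-empty and lanes > 0.
float mean_lanes(const std::vector<float>& v, std::size_t lanes) {
    assert(!v.empty() && lanes > 0);
    std::vector<float> partial(lanes, 0.0f);
    for (std::size_t i = 0; i < v.size(); ++i) partial[i % lanes] += v[i];
    float sum = 0.0f;
    for (float p : partial) sum += p;
    return sum / static_cast<float>(v.size());
}
```

With IEEE-754 single precision and no reassociating optimizations such as `-ffast-math`, the input `{1e8f, 1.0f, -1e8f, 1.0f}` gives 0.25 sequentially (the first `1.0f` is absorbed, because adjacent `float` values near 1e8 are 8 apart) but 0.5 with two lanes (both `1.0f` values land in the same lane and survive). Neither evaluation breaks IEEE rules; they simply round differently, so pinning down one order would rule out the vectorized one.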
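Regarding point 2, the two variance formulas are algebraically equal but can round very differently, which is why it matters whether the wording prescribes one of them. A sketch of both as population variance over `double`; the names `variance_two_pass` and `variance_one_pass` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: two-pass form E[(x - E[x])^2].
// Precondition: v is non-empty.
double variance_two_pass(const std::vector<double>& v) {
    assert(!v.empty());
    const double n = static_cast<double>(v.size());
    double sum = 0.0;
    for (double x : v) sum += x;
    const double mean = sum / n;
    double sq_dev = 0.0;
    for (double x : v) sq_dev += (x - mean) * (x - mean);
    return sq_dev / n;
}

// Hypothetical helper: one-pass form E[x^2] - E[x]^2.
// Needs a single traversal, but is prone to cancellation when the mean
// is large relative to the spread of the data.
// Precondition: v is non-empty.
double variance_one_pass(const std::vector<double>& v) {
    assert(!v.empty());
    const double n = static_cast<double>(v.size());
    double sum = 0.0;
    double sum_sq = 0.0;
    for (double x : v) {
        sum += x;
        sum_sq += x * x;
    }
    const double mean = sum / n;
    return sum_sq / n - mean * mean;
}
```

For `{1, 2, 3, 4}` both return exactly 1.25. For `{1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16}` the two-pass version returns exactly 22.5, while the one-pass version cannot: the squares are near 1e18, where adjacent doubles are at least 128 apart, so the final subtraction can only produce an integer.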
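Point 3 could be pinned down with a sketch like the following, which assumes the pair-returning interface and returns `std::pair(middle, middle)` for an odd count. The name `median_pair` and the by-value `std::vector` parameter are illustrative, not the draft's signature. It relies on `std::nth_element`, so it runs in linear time on average and does not require a sorted input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns {lower middle, upper middle}.
// For an odd count both members are the single middle element.
// Takes the vector by value because nth_element reorders it.
// Precondition: v is non-empty.
template <class T>
std::pair<T, T> median_pair(std::vector<T> v) {
    assert(!v.empty());
    const std::size_t n = v.size();
    const auto upper_it = v.begin() + static_cast<std::ptrdiff_t>(n / 2);
    // Put the element of sorted index n/2 at upper_it;
    // every element before it compares <= *upper_it.
    std::nth_element(v.begin(), upper_it, v.end());
    const T upper = *upper_it;
    if (n % 2 == 1) return {upper, upper};
    // Even count: the lower middle is the largest element before upper_it.
    const T lower = *std::max_element(v.begin(), upper_it);
    return {lower, upper};
}
```

For `{5, 1, 3}` this yields `(3, 3)`, and for `{4, 1, 3, 2}` it yields `(2, 3)`.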
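For point 4, a version over an unsorted range of hashable values can count occurrences in a hash table in expected linear time, with no sort. The sketch below uses `std::unordered_map`; the name `mode_hashed` and its tie-breaking rule (the value that first reaches the highest count wins) are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical helper: most frequent value of an unsorted range.
// Requires T to be hashable (std::hash<T>) and equality-comparable.
// Ties: returns the value whose count first reached the maximum.
// Precondition: v is non-empty.
template <class T>
T mode_hashed(const std::vector<T>& v) {
    assert(!v.empty());
    std::unordered_map<T, std::size_t> counts;
    T best = v.front();
    std::size_t best_count = 0;
    for (const T& x : v) {
        const std::size_t c = ++counts[x];  // operator[] starts a new key at 0
        if (c > best_count) {
            best_count = c;
            best = x;
        }
    }
    return best;
}
```

For `{2, 1, 1, 2}` both values occur twice, and this sketch returns `1` because its count reached 2 first; a real proposal would need to specify tie behavior explicitly.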
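On point 5, one possible resolution, alongside alternatives such as a stated precondition or a NaN result, is to make the empty case visible in the return type. The sketch below uses the hypothetical name `checked_variance` and a `std::optional` result. It returns `std::nullopt` for an empty range instead of evaluating `0.0 / 0`, which yields NaN under IEEE-754.

```cpp
#include <optional>
#include <vector>

// Hypothetical helper: population variance, or std::nullopt when first == last
// (here: the vector is empty). One illustrative way to specify that case.
std::optional<double> checked_variance(const std::vector<double>& v) {
    if (v.empty()) return std::nullopt;
    const double n = static_cast<double>(v.size());
    double sum = 0.0;
    for (double x : v) sum += x;
    const double mean = sum / n;
    double sq_dev = 0.0;
    for (double x : v) sq_dev += (x - mean) * (x - mean);
    return sq_dev / n;
}
```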

I hope this helps.


Received on 2019-09-13 20:05:16