## SG19

**Subject:** Comments on Math proposal for Machine Learning / Simple Statistical Functions

**From:** Benjamin Poulain (*bpoulain_at_[hidden]*)

**Date:** 2019-09-13 20:01:39

**Next message:** Michael Wong: "Sg19 monthly Telecon today" **Previous message:** Michael Wong: "Re: Sept 12 SG19 Zoom call"

Hi,

I am unable to make the calls but I have a couple of comments on the draft "Simple Statistical Functions" (https://docs.google.com/document/d/1VAgcyvL1riMdGz7tQIT9eTtSSfV3CoCEMWKk8GvVuFY/edit#heading=h.9ogkehmdmtel).

1. Would it be useful to explicitly make the order of operations undefined?

For example, the result of mean() on floating-point numbers depends on the order in which the numbers are added. Ideally, implementations should be allowed to take advantage of parallelism (e.g. SIMD), which changes the summation order and can therefore change the result.
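To illustrate the point, here is a minimal sketch assuming IEEE 754 double arithmetic without fast-math style optimizations. The helper name `mean_left_to_right` is hypothetical and not taken from the draft. It sums strictly left to right, so feeding it the same values in a different order stands in for what a reordering (SIMD or parallel) implementation might do. The standard library already has a precedent for this latitude: std::reduce may regroup and reorder its operands, whereas std::accumulate may not.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper (not from the draft): mean computed by strict
// left-to-right summation. The caller must pass a non-empty vector.
double mean_left_to_right(const std::vector<double>& values) {
    // std::accumulate is specified to add the elements in order.
    const double sum = std::accumulate(values.begin(), values.end(), 0.0);
    return sum / static_cast<double>(values.size());
}
```

With the input {1e17, 1.0, -1e17} the 1.0 is absorbed by the large term and the mean comes out as 0, while the reordered input {1e17, -1e17, 1.0} gives 1/3.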

2. The definition of Standard Deviation has a specific formula for its computation.

I believe it would be good if that were not normative. An implementation may want to use the E[x^2] - E[x]^2 variant instead of the E[(x - E[x])^2] one.
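The two variants can be sketched as follows. This assumes the population form (dividing by n), since I am not restating the draft's exact definition here, and the function names are hypothetical. The formulas are algebraically equal, but they round differently in floating point, so a normative formula would effectively pin down one particular rounding behaviour.

```cpp
#include <vector>

// Hypothetical sketch: two-pass population variance, E[(x - E[x])^2].
// The caller must pass a non-empty vector.
double variance_two_pass(const std::vector<double>& xs) {
    const double n = static_cast<double>(xs.size());
    double sum = 0.0;
    for (double x : xs) sum += x;
    const double mean = sum / n;

    double sum_sq_dev = 0.0;
    for (double x : xs) {
        const double d = x - mean;
        sum_sq_dev += d * d;
    }
    return sum_sq_dev / n;
}

// Hypothetical sketch: one-pass population variance, E[x^2] - E[x]^2.
// Needs only a single traversal, but subtracts two potentially large,
// nearly equal quantities at the end.
double variance_one_pass(const std::vector<double>& xs) {
    const double n = static_cast<double>(xs.size());
    double sum = 0.0;
    double sum_sq = 0.0;
    for (double x : xs) {
        sum += x;
        sum_sq += x * x;
    }
    const double mean = sum / n;
    return sum_sq / n - mean * mean;
}
```

For small values such as {1, 2, 3, 4} both return 1.25. When the values carry a large common offset (for example around 1e9), the one-pass form can lose most of its significant digits to cancellation, while the two-pass form does not.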

3. In the definition of “Median”, I don’t think the text is clear about what is returned if the range has an odd number of elements. I suppose it would be std::pair(middle, middle).
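A minimal sketch of the behaviour I am guessing at, assuming a sorted, non-empty input. The name `median_pair` and the use of `std::vector<int>` are hypothetical and only for illustration; the draft's actual interface may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch (not the draft's interface): returns the two middle
// elements of a sorted, non-empty vector. For an odd number of elements
// both indices coincide, so the result is (middle, middle).
std::pair<int, int> median_pair(const std::vector<int>& sorted) {
    assert(!sorted.empty());
    const std::size_t n = sorted.size();
    return {sorted[(n - 1) / 2], sorted[n / 2]};
}
```

For {1, 2, 3} this yields (2, 2), and for {1, 2, 3, 4} it yields (2, 3).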

4. The current definition of “mode” takes a sorted range.

I believe it would be useful to also have a version that takes an arbitrary range of hashable values. The alternative would be to call std::sort() followed by std::mode(), which would be worse.
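As a rough sketch of what such an overload could look like, assuming the value type works with std::hash and operator==. The name `mode_unsorted`, the std::optional return type, and the tie-breaking rule are my own choices for illustration, not part of the draft.

```cpp
#include <cstddef>
#include <iterator>
#include <optional>
#include <unordered_map>

// Hypothetical sketch: mode of an arbitrary (unsorted) range of hashable
// values. Returns std::nullopt for an empty range. On ties, the value that
// first reaches the highest count wins.
template <typename InputIt>
auto mode_unsorted(InputIt first, InputIt last)
    -> std::optional<typename std::iterator_traits<InputIt>::value_type> {
    using T = typename std::iterator_traits<InputIt>::value_type;

    std::unordered_map<T, std::size_t> counts;
    std::optional<T> best;
    std::size_t best_count = 0;

    for (; first != last; ++first) {
        const std::size_t c = ++counts[*first];  // count this occurrence
        if (c > best_count) {
            best_count = c;
            best = *first;
        }
    }
    return best;
}
```

This runs in expected linear time with extra memory proportional to the number of distinct values, compared with O(n log n) for sorting first, and it leaves the input untouched.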

5. Standard Deviation & Variance: The definition does not specify what happens if (first == last).
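To make the question concrete: a straightforward implementation divides by the element count, which is zero for an empty range, so with IEEE 754 doubles the result would be 0/0, i.e. NaN. One possible answer, sketched here with the hypothetical name `variance_checked` and assuming the population form, is to report the empty case explicitly. Making it a precondition or specifying NaN would be other options.

```cpp
#include <iterator>
#include <optional>

// Hypothetical sketch: population variance that reports an empty range
// as std::nullopt instead of dividing by zero.
template <typename ForwardIt>
std::optional<double> variance_checked(ForwardIt first, ForwardIt last) {
    if (first == last) return std::nullopt;  // the unspecified case

    const double n = static_cast<double>(std::distance(first, last));
    double sum = 0.0;
    for (ForwardIt it = first; it != last; ++it) {
        sum += static_cast<double>(*it);
    }
    const double mean = sum / n;

    double sum_sq_dev = 0.0;
    for (ForwardIt it = first; it != last; ++it) {
        const double d = static_cast<double>(*it) - mean;
        sum_sq_dev += d * d;
    }
    return sum_sq_dev / n;
}
```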

I hope this helps.

Benjamin


SG19 list run by herb.sutter at gmail.com