Hi all, SG19 Machine Learning meeting will focus on stats.

Michael Wong is inviting you to a scheduled Zoom meeting.
Topic: SG19 monthly
Time: 02:00 PM Eastern Time (US and Canada), every month on the second Thursday
Join from PC, Mac, Linux, iOS or Android: https://iso.zoom.us/j/93084591725?pwd=K3QxZjJlcnljaE13ZWU5cTlLNkx0Zz09
Password: 035530
Or iPhone one-tap: US: +13017158592,,93084591725# or +13126266799,,93084591725#
Or Telephone:
Dial (for higher quality, dial a number based on your current location):
US: +1 301 715 8592 or +1 312 626 6799 or +1 346 248 7799 or +1 408 638 0968 or +1 646 876 9923 or +1 669 900 6833 or +1 253 215 8782 or 877 853 5247 (Toll Free)
Meeting ID: 930 8459 1725
Password: 035530
International numbers available: https://iso.zoom.us/u/agewu4X97
Or Skype for Business (Lync): https://iso.zoom.us/skype/93084591725

Agenda:
1. Opening and introductions
The ISO Code of Conduct: https://www.iso.org/files/live/sites/isoorg/files/store/en/PUB100397.pdf
IEC Code of Conduct: https://www.iec.ch/basecamp/iec-code-conduct-technical-work
ISO patent policy: https://isotc.iso.org/livelink/livelink/fetch/2000/2122/3770791/Common_Policy.htm?nodeid=6344764&vernum=-2
The WG21 Practices and Procedures and Code of Conduct: https://isocpp.org/std/standing-documents/sd-4-wg21-practices-and-procedures
1.1 Roll call of participants
Michael Wong, Richard Dosselmann, Ozan Irsoy, Phil Ratzloff, Luke D'Alessandro, Kevin Deweese, Chris Ryan, Andrew Lumsdaine, Rene Rivera, Ka Ming Chan, Jens Maurer
1.2 Adopt agenda
1.3 Approve minutes from previous meeting, and approve publishing
previously approved minutes to ISOCPP.org
1.4 Action items from previous meetings
2. Main issues (125 min)
2.1 General logistics
Meeting plan: focus on one paper per meeting, but this does not preclude other paper updates:
May 12, 2022 02:00 PM ET: Stats
June 9, 2022 02:00 PM ET: Graph
Jul 14, 2022 02:00 PM ET: Matrix, RL and DC
Aug 11, 2022 02:00 PM ET: Stats
Sep 13, 2022 02:00 PM ET: Graph
Oct 12, 2022 02:00 PM ET: Matrix, RL and DC
ISO meeting status
future C++ Std meetings
Stats SG6:
met with them a month ago
can we fuse the mean_accumulators into one? construct with no parameter and pass a parameter to it, so the weighted and unweighted versions merge together
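A minimal sketch of what a fused mean accumulator could look like, assuming an unweighted sample is simply treated as a sample with weight 1. The class shape and member names here are hypothetical illustrations, not the API in the stats paper.

```cpp
#include <cassert>

// Hypothetical sketch: one mean accumulator covering both cases.
// Default-constructed with no parameters; the weight is passed per sample.
template <class T = double>
class mean_accumulator {
public:
    // Unweighted update: same as a sample with weight 1.
    void operator()(T x) { (*this)(x, T{1}); }

    // Weighted update: keep running sum(w * x) and sum(w).
    void operator()(T x, T w) {
        weighted_sum_ += w * x;
        weight_total_ += w;
    }

    // Mean of everything seen so far; needs a positive total weight.
    T value() const {
        assert(weight_total_ > T{0});
        return weighted_sum_ / weight_total_;
    }

private:
    T weighted_sum_{};
    T weight_total_{};
};
```

For the mean this merge is cheap because the unweighted formula is exactly the weighted one with all weights equal to 1.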
but the weighted version of variance_accumulator does not have the 1/(n-1) degrees-of-freedom correction, so it would need a different constructor; decided with SG6 that it is better to keep separate weighted and unweighted variance_accumulators
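To illustrate why the two variance cases diverge, here is a sketch using two well-known online algorithms: Welford's method for the unweighted case (which naturally offers the 1/(n-1) sample variance) and West's weighted incremental method, normalized by the total weight, for the weighted case. The class and member names are hypothetical, and the choice of algorithms is an assumption, not what the paper specifies.

```cpp
#include <cassert>
#include <cstddef>

// Unweighted: Welford's online algorithm.
class variance_accumulator {
public:
    void operator()(double x) {
        ++n_;
        const double delta = x - mean_;
        mean_ += delta / static_cast<double>(n_);
        m2_ += delta * (x - mean_);  // uses the updated mean
    }
    // Sample variance with n - 1 degrees of freedom; needs n >= 2.
    double sample_variance() const {
        assert(n_ >= 2);
        return m2_ / static_cast<double>(n_ - 1);
    }
    // Population variance, normalized by n; needs n >= 1.
    double population_variance() const {
        assert(n_ >= 1);
        return m2_ / static_cast<double>(n_);
    }
private:
    std::size_t n_ = 0;
    double mean_ = 0.0;
    double m2_ = 0.0;
};

// Weighted: West's incremental algorithm, normalized by the total weight.
// There is no single "n - 1" to subtract here, hence a separate type.
class weighted_variance_accumulator {
public:
    void operator()(double x, double w) {
        w_total_ += w;
        const double delta = x - mean_;
        mean_ += (w / w_total_) * delta;
        m2_ += w * delta * (x - mean_);  // uses the updated mean
    }
    double variance() const {
        assert(w_total_ > 0.0);
        return m2_ / w_total_;
    }
private:
    double w_total_ = 0.0;
    double mean_ = 0.0;
    double m2_ = 0.0;
};
```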
merge the weighted and unweighted kurtosis together? this seems more involved.
clear to move on to LEWG
median, quantile and mode are different; let's have a look
median and mode could be multiple values
so need a sorted range
good feeling that the median and quantile fns will work; SG6 concerned about mode:
don't know how many modes there will be
mode can be messy to deal with
users want to bin data into small groups
looking at the Boost.Histogram library
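As a rough illustration of "binning data into small groups", here is a hand-rolled equal-width histogram. This is a hypothetical sketch only, not Boost.Histogram's interface; dropping out-of-range values is an arbitrary assumption that a real design would have to revisit (under/overflow bins).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: count values into `bins` equal-width bins over [lo, hi).
std::vector<std::size_t> bin_counts(const std::vector<double>& data,
                                    double lo, double hi, std::size_t bins) {
    assert(bins > 0 && hi > lo);
    std::vector<std::size_t> counts(bins, 0);
    const double width = (hi - lo) / static_cast<double>(bins);
    for (double x : data) {
        if (x < lo || x >= hi) continue;  // outside the range: dropped here
        auto i = static_cast<std::size_t>((x - lo) / width);
        if (i >= bins) i = bins - 1;      // guard against rounding near hi
        ++counts[i];
    }
    return counts;
}
```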
quantile in sorted order: want 0.25 (the element 25% of the way through the data). with an even number of elements there would be 2 in the middle; average them, or return both?
user provides a sorted range and says which quantile they want; must also give us the size
median would be the quantile at 0.5
convenience function to get multiple quantiles: pass a range of them and do one linear scan of the data
does the range already know how many elements it has? there is a concept, sized_range, meaning it has a customization point for size (std::ranges::size)
what is Q? e.g. the 60% quantile; it is unconstrained, so float or double?
why a pair in the return value? why optional? if the quantile falls between 2 elements, return both, not just one
std::pair should be the smallest struct
look at the ranges algorithms
if you don't support ranges that don't meet that concept, then passing N is appropriate; requiring sized_range could exclude some data types, since you might not want to do a scan to get the size, so maybe indicate that through naming
you always have to test the optional; would just returning 2 values be simpler? is that confusing, e.g. if you get 5 twice?
I can't always ignore that optional
can you return a range instead of a pair?
disagree with an alternative fn overload
1 accumulator that brings back 4 results; this raises memory allocation concerns, but it is an output iterator
or 4 separate accumulators that each bring back 1
use case: compute several things over the same range, and scan a large range only once
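A sketch of the "4 separate accumulators, each bringing back 1" option that still scans the range only once: independent accumulator objects are fed from a single loop via a fold expression. All names here are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>

// Four independent single-result accumulators.
struct count_acc { std::size_t n = 0; void operator()(double) { ++n; } };
struct sum_acc   { double s = 0.0;    void operator()(double x) { s += x; } };
struct min_acc {
    double m = std::numeric_limits<double>::infinity();
    void operator()(double x) { m = std::min(m, x); }
};
struct max_acc {
    double m = -std::numeric_limits<double>::infinity();
    void operator()(double x) { m = std::max(m, x); }
};

// One pass over the range: each element goes to every accumulator.
template <class Range, class... Accs>
void accumulate_once(const Range& r, Accs&... accs) {
    for (const auto& x : r) (accs(x), ...);
}
```

Each accumulator stays simple and allocation-free; the single-pass property comes from the driver loop rather than from bundling results into one object.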
if it is a random access range, just jump to where the data is
want an optimization for the random access case
unfriendly interface; do we expect a lot of data that can be read only once?
a selection algorithm on unsorted data is O(N) instead of O(N log N)
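The selection point can be illustrated with the standard `std::nth_element`, which is linear on average and places the k-th smallest element in its sorted position without sorting the rest, compared with O(N log N) for a full sort. The helper name `kth_smallest` is hypothetical; the data is taken by value because `nth_element` reorders it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

template <class T>
T kth_smallest(std::vector<T> data, std::size_t k) {
    assert(k < data.size());
    auto nth = data.begin() + static_cast<std::ptrdiff_t>(k);
    std::nth_element(data.begin(), nth, data.end());  // partial reorder only
    return *nth;
}
```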
don't remove accumulators yet, but the sweet spot may be small
put "sorted" in the function name, not on the return type; try various naming schemes, e.g. quantiles_of_sorted
how about a tag? tags are usually only used for constructors
quantile fn needs a constraint: the result type must be convertible from the value type of the range
template parameters have R sometimes first and sometimes last; can we be consistent? the authors will go back and fix this