Meeting notes.

Hi all, I am sorry, my power died. Phil, I heard at the end that you might still have a few things to do on the paper, but it seems largely done.
I would like to see if we can get that done for the next call in July, and possibly vote on it in SG19 to pass it to LEWG.

I would like to correct that: the next call in July will be on the Stats paper review. The Aug call will be on RL and AD.
Assuming that is possible, I have adjusted the schedule, but that too can be changed.

Thanks all.


On Wed, Jun 10, 2020 at 12:50 PM Michael Wong <fraggamuffin@gmail.com> wrote:

SG19 Machine Learning 2 hours
Hi,

Michael Wong is inviting you to a scheduled Zoom meeting.

Topic: SG19 monthly Apr 2020-Oct 2020
Time:  02:00 PM Eastern Time (US and Canada) 18:00 UTC
    Every month on the Second Thu, until Oct 8, 2020, 7 occurrence(s)
    Apr 9, 2020 02:00 PM 18:00 UTC
    May 14, 2020 02:00 PM 18:00 UTC
    Jun 11, 2020 02:00 PM 18:00 UTC
    Jul 9, 2020 02:00 PM 18:00 UTC
    Aug 13, 2020 02:00 PM 18:00 UTC
    Sep 10, 2020 02:00 PM 18:00 UTC
    Oct 8, 2020 02:00 PM 18:00 UTC
    Please download and import the following iCalendar (.ics) files to your
calendar system.
    Monthly:
https://iso.zoom.us/meeting/v50sceqopj4pyLdu5Mx1orYgnZZUj0RNqw/ics?icsToken=98tyKuuhrz0pGtyQs1-CArUqE53ibvG1kmhirrYIsQe0DDJqZQ3MDNdIYoBRAc-B

Join from PC, Mac, Linux, iOS or Android:
https://iso.zoom.us/j/291630853?pwd=WUlKbS9SNFNRa0QyWXRWenlGSDhaQT09
    Password: 339768

Or iPhone one-tap :
    US: +14086380968,,291630853# or +16468769923,,291630853#
Or Telephone:
    Dial(for higher quality, dial a number based on your current location):
        US: +1 408 638 0968 or +1 646 876 9923 or +1 669 900 6833 or +1
253 215 8782 or +1 301 715 8592 or +1 312 626 6799 or +1 346 248 7799
 or 877 853 5247 (Toll Free)
    Meeting ID: 291 630 853
    Password: 339768
    International numbers available: https://iso.zoom.us/u/abhaIjFKLZ

Or Skype for Business (Lync):
    https://iso.zoom.us/skype/291630853

Agenda:

1. Opening and introductions

1.1 Roll call of participants

Michael Wong, Richard Dosselman, Phil Ratzloff, Jorge Silva, Larry Lewis, Kevin Deweese, Scott McMillan, Andrew Lumsdaine, Jesun Firoz, Marco Foco

1.2 Adopt agenda

Yes

1.3 Approve minutes from previous meeting, and approve publishing
 previously approved minutes to ISOCPP.org

1.4 Action items from previous meetings

2. Main issues (125 min)

2.1 General logistics

Meeting plan, focus on one paper per meeting but does not preclude other
paper updates:

    Apr 9, 2020 02:00 PM: stats paper- DONE
    May 14, 2020 02:00 PM: Stats paper replaces Differential calculus  DONE
    Jun 11, 2020 02:00 PM: Graph paper-
    Jul 9, 2020 02:00 PM: Stats paper + Graph paper vote
    Aug 13, 2020 02:00 PM: Differential calculus  + Reinforcement Learning
    Sep 10, 2020 02:00 PM: Graph paper +stats paper
    Oct 8, 2020 02:00 PM: Differential calculus  + Reinforcement Learning

ISO meeting status

No meeting until end of year; this year we do a deep dive on each ML topic
Papers can still move through the online meetings for EWG and LEWG, though no decisions are made online, just tentative decisions

CPPCON status

Will happen in hybrid form
Phil Submitted proposal for Cppcon

2.2 Paper reviews

2.2.1: ML topics

Larry Lewis Jorge Silva

Reinforcement Learning proposal:

RL within SAS
provide guidance and an API for RL, optimizers
build on top of generic machine learning
independent of vendors, but follow pytorch, tensorflow
dependent on the underlying tensor
core RL algorithms; a ton of research, it is newer than deep learning
some algos will be distributed; get feedback from the group doing tensors and LA
will you need some of the underlying facilities like supervised and unsupervised ML? Yes
will you base this on certain libraries? we are most familiar with pytorch, tensorflow; when it becomes more functional, we will pick that up
will you use GPUs? yes, pytorch is transparent in that respect
for deep learning, we don't use pytorch, just for RL; I don't want to be dependent on just pytorch
the hard part can be the design phase for ISO C++, which can take a lot more time than you think
as C++ moves, we have to adjust the design to match the new style (C++20, 23); we started with data structures, and switched to functions for stats
GPU is where performance comes from; yes, in the future we will talk about parallel ranges and SYCL

what about automatic differentiation? for a tensor we do need both GPU and AD; this helps with back propagation
for AD: do you build a network of optimizers? yes, plan to reuse work from other teams; have implemented NN too many times
trying hard to make AD that works for everybody: library or language?
SG7 reflection had a lot of polarization on this topic: one side says it should be language, another side says it should be library; but we need code introspection, and to generate from the code
we don't want to standardize the entire AST of C++

pytorch differentiates with AD; forward differentiation meant you could not single step forward in the code, but pytorch can do that
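
As a rough illustration of the forward-mode differentiation mentioned above, here is a minimal sketch using dual numbers. It is not taken from pytorch or from any SG19 paper; the Dual type and the helper names variable, constant and f are hypothetical and exist only for this example.

```cpp
#include <cassert>

// Hypothetical dual number: a value together with its derivative.
struct Dual {
    double val;  // f(x)
    double der;  // f'(x)
};

inline Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.der + b.der}; }
inline Dual operator-(Dual a, Dual b) { return {a.val - b.val, a.der - b.der}; }

// Product rule.
inline Dual operator*(Dual a, Dual b) {
    return {a.val * b.val, a.der * b.val + a.val * b.der};
}

// Quotient rule; the divisor value must be nonzero.
inline Dual operator/(Dual a, Dual b) {
    assert(b.val != 0.0);
    return {a.val / b.val, (a.der * b.val - a.val * b.der) / (b.val * b.val)};
}

// The input is seeded with derivative 1, constants with derivative 0.
inline Dual variable(double x) { return {x, 1.0}; }
inline Dual constant(double c) { return {c, 0.0}; }

// Example function (an assumption for this sketch): f(x) = x * x + 3 * x,
// whose derivative is 2 * x + 3.
inline Dual f(Dual x) { return x * x + constant(3.0) * x; }
```

Calling f(variable(2.0)) carries the derivative along with the value, so no separate backward pass is needed for a single input.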



Phil Ratzloff et al

P1709R1: Graph Proposal for Machine Learning

P1709R3:
https://docs.google.com/document/d/1kLHhbSTX7j0tPeTYECQFSNx3R35Mu3xO5_dyYdRy4dM/edit?usp=sharing

https://docs.google.com/document/d/1QkfDzGyfNQKs86y053M0YHOLP6frzhTJqzg1Ug_vkkE/edit?usp=sharing

I’ve been working on the prototype implementation to get it building in both Windows & Linux, using CMake & the Conan package manager:

  1. All unit tests complete successfully for both MSVC & gcc10
  2. All bgl17 code has been removed from the repository. It uses a cloned bgl17 directory (ENABLE_BGL17 cmake option).
  3. Catch2 is now being used instead of Google Test for unit testing
  4. A simple unit test demonstrates the use of the library’s dfs_vertex_range iteration using bgl17’s vov graph. This can be seen in test/test_vov_adaptor.cpp.
    1. There were a few changes needed in bgl17 to accommodate this (I haven’t pushed these changes)
      i. I added an inner_container type definition to vov
      ii. There were 3 places where I added #ifdef _MSC_VER to disable linux-specific code, far fewer than before.
    2. Adapting vov requires the following
      i. An adaptor graph class to map the vov types to expected types
      ii. Function overloads that use the adaptor graph class as a template argument
  5. Added graph API functions to avoid name ambiguity with begin(g) & end(g) for vertices in the dfs & bfs range iterators.
    1. vertex_begin(g), vertex_end(g)
    2. edge_begin(g,u), edge_end(g,u)
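
To make the naming idea above concrete, here is a minimal sketch of vertex_begin/vertex_end and edge_begin/edge_end as free functions over a plain vector-of-vectors graph. The vov_graph alias, the signatures (taking a vertex key rather than a vertex reference) and count_edges are assumptions for illustration, not the prototype's or bgl17's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vector-of-vectors graph: g[u] holds the target keys of u's out-edges.
using vov_graph = std::vector<std::vector<std::size_t>>;

// Vertex range accessors, named so they cannot be confused with begin(g)/end(g).
inline auto vertex_begin(const vov_graph& g) { return g.begin(); }
inline auto vertex_end(const vov_graph& g) { return g.end(); }

// Out-edge range accessors for the vertex with key u.
inline auto edge_begin(const vov_graph& g, std::size_t u) {
    assert(u < g.size());
    return g[u].begin();
}
inline auto edge_end(const vov_graph& g, std::size_t u) {
    assert(u < g.size());
    return g[u].end();
}

// Count all edges by walking every vertex's out-edge range.
inline std::size_t count_edges(const vov_graph& g) {
    std::size_t n = 0;
    for (std::size_t u = 0; u < g.size(); ++u)
        for (auto it = edge_begin(g, u); it != edge_end(g, u); ++it) ++n;
    return n;
}
```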

 

I haven’t written the code to support value(uv) function to get edge properties for vov yet.

These changes should bring the library much closer to a repeatable cross-platform build and you’re welcome to try it.

I’ve pushed the code to the master branch at https://github.com/pratzl/graph

 

The next SG19 meeting is 6/11/20 (12d from now) and I have some things in mind to work on. I’ve been focused on the prototype to make it more accessible for all the authors and I need to switch back to the paper and give it more attention.

  1. Paper
    1. Complete algorithm descriptions & examples:
      i. Connected Components
      ii. Strongly Connected components
      iii. Bi-connected Components
      iv. Articulation Points
    2. Data structures
      i. Add section on graph adaptors
  2. algorithm implementations
    1. connected & strongly connected components unit tests
    2. [bi-connected components]
    3. [articulation points]
  3. bgl17 adaptors
    1. vov adaptor: implement value(edge), add dfs_edge_range tests
    2. implement a compressed adaptor
  4. other prototype features
    1. Support Clang10 using the range-v3 concepts macros
  5. Documentation
    1. Add explicit description of how to install and use the library


90-95% done, major sections there, need examples
prototype email library works on linux and windows
using cmake, conan, unit test framework

all algos have iterators and can also take a range
output_iterator concept added, which requires an output iterator; Richard might want to use that

added vertex begin and end  to allow me to want the graph based one

can iterate through graph, have begin and end, with starting vertex,  can also construct one with a range
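
A minimal sketch of the two ideas above, traversal from a starting vertex and results delivered through an output iterator. It is written against a plain vector-of-vectors graph; vov_graph and dfs_preorder are hypothetical names, not the paper's dfs_vertex_range interface.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical graph type: g[u] holds the target keys of u's out-edges.
using vov_graph = std::vector<std::vector<std::size_t>>;

// Depth-first preorder from `start`, writing each visited vertex key to `out`.
template <class OutputIt>
OutputIt dfs_preorder(const vov_graph& g, std::size_t start, OutputIt out) {
    assert(start < g.size());
    std::vector<bool> visited(g.size(), false);
    std::vector<std::size_t> stack{start};
    while (!stack.empty()) {
        const std::size_t u = stack.back();
        stack.pop_back();
        if (visited[u]) continue;
        visited[u] = true;
        *out++ = u;
        // Push neighbors in reverse so the first-listed neighbor is visited first.
        for (auto it = g[u].rbegin(); it != g[u].rend(); ++it)
            if (!visited[*it]) stack.push_back(*it);
    }
    return out;
}
```

Any output iterator works as the sink, for example std::back_inserter on a vector.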

a section on graph data structure has been rewritten; it was carried from the beginning and now reflects what I have in my prototype
have classes with common template types, 3 types for user values
there is an index type which is either a 32 or 16 bit value
default of 32 bit covers most cases
there is the allocator
a section on what kind of user-defined type to use, for example for weights in an adjacency list

compressed has been changed to direct_adjacency_array to complement undirected

can define properties for a graph and edge/vertex
only user-defined properties can be changed after it's been constructed
other constraint is that source edges have to be ordered by vertex key
which is the DAA graph, and is a template alias with various defaults
have various classes to implement this
access is defined by public functions
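
The ordering constraint above is what makes a compressed layout possible. A minimal sketch of that layout, assuming a 32-bit index type; csr_graph, make_csr and out_degree are hypothetical names and not the prototype's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical compressed directed graph with a 32-bit index type.
struct csr_graph {
    std::vector<std::uint32_t> row_start;  // out-edges of u are [row_start[u], row_start[u + 1])
    std::vector<std::uint32_t> target;     // target vertex key of each edge
};

// Build from (source, target) pairs that are already ordered by source key.
inline csr_graph make_csr(
        std::uint32_t vertex_count,
        const std::vector<std::pair<std::uint32_t, std::uint32_t>>& edges) {
    csr_graph g;
    g.row_start.assign(vertex_count + 1, 0);
    g.target.reserve(edges.size());
    std::uint32_t prev = 0;
    for (const auto& e : edges) {
        assert(e.first >= prev && e.first < vertex_count);  // ordering constraint
        assert(e.second < vertex_count);
        prev = e.first;
        ++g.row_start[e.first + 1];  // count the out-degree of the source
        g.target.push_back(e.second);
    }
    // A prefix sum turns per-vertex counts into start offsets.
    for (std::uint32_t u = 0; u < vertex_count; ++u)
        g.row_start[u + 1] += g.row_start[u];
    return g;
}

inline std::uint32_t out_degree(const csr_graph& g, std::uint32_t u) {
    assert(static_cast<std::size_t>(u) + 1 < g.row_start.size());
    return g.row_start[u + 1] - g.row_start[u];
}
```

Because the edges arrive sorted by source, they can be appended in one pass and never need to move afterwards.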

possible to customize by overriding

Do I need all the constant types?
Think yes, both const and non-const
also think so, we did that with BGL17 as well, else it's annoying
I need to prove it for myself, though I also think so
a lot of boilerplate stuff to make all this work; yes, can shortcut with enable_if as well

one interesting constructor takes a range of edges and vertexes and extracts the key from each edge in the edge range, just a pair of vertex keys; another one extracts fn property and fn property
now I see I also need a way to specify the graph

need to revisit to see if I need to reimplement this for the test

then there is undirected adjacency list

assert there is one object per edge and it is part of 2 linked lists
edges are in a doubly linked list, stored in a vector; after construction can't add vertexes or edges
inedges are ordered by vertex key
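
A minimal sketch of the "one object per edge, part of 2 linked lists" idea. It is simplified to singly linked lists and index links, whereas the notes describe doubly linked lists; ual_edge, ual_graph and no_edge are hypothetical names, and self-loops are excluded to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

constexpr std::size_t no_edge = std::numeric_limits<std::size_t>::max();

// One object per undirected edge, threaded onto the lists of both endpoints.
struct ual_edge {
    std::size_t u, v;    // endpoint vertex keys
    std::size_t next_u;  // next edge in u's list, or no_edge
    std::size_t next_v;  // next edge in v's list, or no_edge
};

struct ual_graph {
    std::vector<std::size_t> head;  // head[x]: first edge in x's list, or no_edge
    std::vector<ual_edge> edges;    // edge objects stored contiguously

    explicit ual_graph(std::size_t vertex_count) : head(vertex_count, no_edge) {}

    void add_edge(std::size_t u, std::size_t v) {
        assert(u < head.size() && v < head.size() && u != v);
        const std::size_t id = edges.size();
        edges.push_back({u, v, head[u], head[v]});
        head[u] = id;  // the same edge object becomes the front of both lists
        head[v] = id;
    }

    // Collect the neighbors of x by following x's list through the shared edges.
    std::vector<std::size_t> neighbors(std::size_t x) const {
        assert(x < head.size());
        std::vector<std::size_t> result;
        for (std::size_t e = head[x]; e != no_edge;) {
            const ual_edge& edge = edges[e];
            const bool at_u = (edge.u == x);
            result.push_back(at_u ? edge.v : edge.u);
            e = at_u ? edge.next_u : edge.next_v;
        }
        return result;
    }
};
```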

everything else is similar


finally, a section on adapting to an external graph: adapt the algos here to their data structures
we can define our own graph type to override the graph type to do the right thing, but can also do that with types as well
I tested my own BFS algo with BGL17 data structure; yes, have small things in there that allow that to happen starting with a graph, so I think this is the right approach

Jesun asked: Is it a hard requirement that the vertices will be in a vector and edges in a linked list for undirected graph (and probably for directed graph type)? Any implication on iterating over them in terms of performance as well as mutability of the graph?
Good question, this is something I would like to explore; have a key that I want to access in a different way to enable conditional algo, where you don't have that requirement
I would like to relax that area more, so I have not changed any concepts from what we had before
concept used reflects that algorithm; should still have something less restrictive; yes, probably right, should be able to do iteration on neighbors, randomly access a container of forward iterable containers


Walking through code TDD style
as I am doing development, it is outputting results
at top of file I set the Test option using the German routes


I like what you did with BGL17 testing  ... yes it was with tuples of ranges

don't have topological sort done yet

interface is stable
strongly connected components not tested

what about initializers? I have a few classes that will generate a graph for me, but that can be improved as it is not repeatable; yes, we have file i/o for matrix; graph =(){} is a convenient thing for testing
could I do it with what I have now, or do I need a constructor? will look at what BGL17 has too
should be doable, but have to go back to see where we put that in

things to do:
1. fns I have not implemented
2. BGL17 compressed graph
3. range support sentinels
4. reverse filter for a graph?
creating a NN with weights is a common thing and eliminates an edge; yes that kind of filter is useful for NN
comparison with other libraries  - put out a separate paper
can we store a graph in some constexpr arrays
marshalling,
relax constraints on algos to make them more flexible
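
For the filter item in the list above, a minimal sketch of what filtering edges by weight could look like on a plain edge list. weighted_edge and prune_small_weights are hypothetical names; a real filter would more likely be a lazy view over the graph than a copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical edge record with a user-defined weight.
struct weighted_edge {
    std::size_t source;
    std::size_t target;
    double weight;
};

// Keep only the edges whose weight magnitude exceeds `threshold`.
inline std::vector<weighted_edge> prune_small_weights(
        const std::vector<weighted_edge>& edges, double threshold) {
    assert(threshold >= 0.0);
    std::vector<weighted_edge> kept;
    std::copy_if(edges.begin(), edges.end(), std::back_inserter(kept),
                 [threshold](const weighted_edge& e) {
                     return std::abs(e.weight) > threshold;
                 });
    return kept;
}
```

With a threshold of 0.0 this drops exactly the zero-weight edges, which matches the NN case where a zero weight effectively removes a connection.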

< my power died at this point>


aiming for guidance on moving this paper forward.
















Richard Dosselman et al

P1708R1: Math proposal for Machine Learning
https://docs.google.com/document/d/1VAgcyvL1riMdGz7tQIT9eTtSSfV3CoCEMWKk8GvVuFY/edit

> std.org/jtc1/sc22/wg21/docs/papers/2020/p1708r2
> above is the stats paper that was reviewed in Prague
> http://wiki.edg.com/bin/view/Wg21prague/P1708R2SG19
>
> Review Jolanta Polish feedback.
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2119r0.html



Richard presents
large number of revisions; now replaced iterator pairs with ranges
free-standing functions each make a linear pass over the data
for large data sets, we have accumulator objects to make one combined pass and compute final stats at the end
alternate predicate, if you want to retrieve one value out of an array of structures

for each mean, have overloads
MC suggested we have a weighted mean

Added execution policy for parallelization
gives you 4 variations of each of the means

we also have geometric and harmonic means with 4 flavors each
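
For reference, a minimal sketch of the means discussed above, written over std::vector<double> rather than ranges and without the execution-policy overloads. The function names and signatures are assumptions for illustration and are not the interface proposed in P1708.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

inline double arithmetic_mean(const std::vector<double>& x) {
    assert(!x.empty());
    double sum = 0.0;
    for (double v : x) sum += v;
    return sum / static_cast<double>(x.size());
}

// Weighted mean: sum(w[i] * x[i]) / sum(w[i]).
inline double weighted_mean(const std::vector<double>& x,
                            const std::vector<double>& w) {
    assert(!x.empty() && x.size() == w.size());
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        num += w[i] * x[i];
        den += w[i];
    }
    assert(den != 0.0);
    return num / den;
}

// Geometric mean of positive values, computed through logarithms.
inline double geometric_mean(const std::vector<double>& x) {
    assert(!x.empty());
    double log_sum = 0.0;
    for (double v : x) {
        assert(v > 0.0);
        log_sum += std::log(v);
    }
    return std::exp(log_sum / static_cast<double>(x.size()));
}

// Harmonic mean: n / sum(1 / x[i]), for nonzero values.
inline double harmonic_mean(const std::vector<double>& x) {
    assert(!x.empty());
    double inv_sum = 0.0;
    for (double v : x) {
        assert(v != 0.0);
        inv_sum += 1.0 / v;
    }
    return static_cast<double>(x.size()) / inv_sum;
}
```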

variance also follows but makes clear working with population vs sample
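
The population versus sample distinction comes down to the divisor: n for a population, n - 1 for a sample. A minimal sketch follows; variance_kind and variance are hypothetical names, and this two-pass form is only one of several ways to compute it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class variance_kind { population, sample };

// Two-pass variance: first the mean, then the sum of squared deviations.
inline double variance(const std::vector<double>& x, variance_kind kind) {
    const std::size_t n = x.size();
    assert(kind == variance_kind::sample ? n > 1 : n > 0);
    double sum = 0.0;
    for (double v : x) sum += v;
    const double mean = sum / static_cast<double>(n);
    double ss = 0.0;
    for (double v : x) ss += (v - mean) * (v - mean);
    // Population divides by n, sample divides by n - 1.
    const std::size_t divisor = (kind == variance_kind::sample) ? n - 1 : n;
    return ss / static_cast<double>(divisor);
}
```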

passing 2 ranges - 1 from value and one from range
can you pass just one (zipping the 2 ranges together and extracting the projection) if the 2 coincide with each other?
OK, I might move in that direction, will think about it

replace median with general quantile
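
A minimal sketch of a general quantile, with the median as the q = 0.5 case. Quantile definitions differ; this one assumes linear interpolation between order statistics, and the name and signature are hypothetical rather than the paper's.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Quantile for q in [0, 1]; the vector is taken by value so it can be sorted.
inline double quantile(std::vector<double> x, double q) {
    assert(!x.empty());
    assert(q >= 0.0 && q <= 1.0);
    std::sort(x.begin(), x.end());
    // Fractional position in the sorted data.
    const double pos = q * static_cast<double>(x.size() - 1);
    const std::size_t lo = static_cast<std::size_t>(std::floor(pos));
    const std::size_t hi = std::min(lo + 1, x.size() - 1);
    const double frac = pos - static_cast<double>(lo);
    // Interpolate linearly between the two neighboring order statistics.
    return x[lo] + frac * (x[hi] - x[lo]);
}
```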

mode has a comparator for equality

python only returns 1st mode, but I will return all the modes
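
A minimal sketch of returning every mode rather than only the first. It assumes the value type is ordered (it counts with std::map) instead of using the equality comparator mentioned above, and the name modes is hypothetical. It also works for non-numerical data such as std::string.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Return all values that occur with the highest frequency, in ascending order.
template <class T>
std::vector<T> modes(const std::vector<T>& x) {
    std::map<T, std::size_t> counts;
    std::size_t best = 0;
    for (const T& v : x) best = std::max(best, ++counts[v]);
    std::vector<T> result;
    for (const auto& kv : counts)
        if (kv.second == best) result.push_back(kv.first);
    assert(x.empty() || !result.empty());  // non-empty input has at least one mode
    return result;
}
```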

makes one linear pass through each data structure,
but can also allow a single pass to compute it all
using accumulated weights, which also have weighted and unweighted versions of each of the mean, median and mode
this allows one single linear pass for all these data structures
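
To illustrate the single-pass idea, a minimal sketch of an unweighted accumulator that updates mean and variance one value at a time using Welford's update. The class name running_stats and its members are hypothetical and are not the accumulator objects in the paper; the weighted variants are not shown.

```cpp
#include <cassert>
#include <cstddef>

// Accumulates count, mean and sum of squared deviations in one pass.
class running_stats {
public:
    void push(double x) {
        ++count_;
        const double delta = x - mean_;
        mean_ += delta / static_cast<double>(count_);
        m2_ += delta * (x - mean_);  // uses the updated mean
    }

    std::size_t count() const { return count_; }

    double mean() const {
        assert(count_ > 0);
        return mean_;
    }

    double population_variance() const {
        assert(count_ > 0);
        return m2_ / static_cast<double>(count_);
    }

    double sample_variance() const {
        assert(count_ > 1);
        return m2_ / static_cast<double>(count_ - 1);
    }

private:
    std::size_t count_ = 0;
    double mean_ = 0.0;
    double m2_ = 0.0;  // running sum of squared deviations from the mean
};
```

The final statistics are read off at the end, so the data never has to be stored or traversed twice.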

mode can return a series of values and can handle non-numerical data

moving to documentation now

for normal distribution, can you have a parameter that defaults to normal? for a statistician, Poisson distribution, arrival time,
whether a mean is a good moment to calculate, sample mean are good estimators,
continue this on reflector











Differentiable Programming by Marco Foco

P1416R1: SG19 - Linear Algebra for Data Science and Machine Learning
https://docs.google.com/document/d/1IKUNiUhBgRURW-UkspK7fAAyIhfXuMxjk7xKikK4Yp8/edit#heading=h.tj9hitg7dbtr

P1415: Machine Learning Layered list
https://docs.google.com/document/d/1elNFdIXWoetbxjO1OKol_Wj8fyi4Z4hogfj5tLVSj64/edit#heading=h.tj9hitg7dbtr

2.2.2 SG14 Linear Algebra progress:
Different layers of proposal
https://docs.google.com/document/d/1poXfr7mUPovJC9ZQ5SDVM_1Nb6oYAXlK_d0ljdUAtSQ/edit

2.2.3 any other proposal for reviews?

2.3 Other Papers and proposals

2.5 Future F2F meetings:

2.6 future C++ Standard meetings:
https://isocpp.org/std/meetings-and-participation/upcoming-meetings

- 2020-02-10 to 15: Prague, Czech Republic

- 2020-06-01 to 06: Bulgaria
- 2020-11: (New York, tentative)
- 2021-02-22 to 27: Kona, HI, USA

3. Any other business

New reflector

http://lists.isocpp.org/mailman/listinfo.cgi/sg19

Old Reflector
https://groups.google.com/a/isocpp.org/forum/#!newtopic/sg19
<https://groups.google.com/a/isocpp.org/forum/?fromgroups=#!forum/sg14>

Code and proposal Staging area

4. Review

4.1 Review and approve resolutions and issues [e.g., changes to SG's
working draft]

4.2 Review action items (5 min)

5. Closing process

5.1 Establish next agenda

TBD

5.2 Future meeting

    Jul 9, 2020 02:00 PM
    Aug 13, 2020 02:00 PM
    Sep 10, 2020 02:00 PM
    Oct 8, 2020 02:00 PM