sg19: Re: [SG19] SG19 June 11 monthly call

From: Michael Wong <fraggamuffin_at_[hidden]>
Date: Tue, 16 Jun 2020 00:14:14 -0400

Meeting notes.

Hi all, I am sorry my power died. Phil, I heard at the end that you might
still have a few things to do on the paper, but it seems largely done.
I'd like to see if we can get that done for the next call in July, and
possibly vote on it in SG19 to pass it to LEWG.

I'd like to correct that: the next call in July will be a review of the
Stats paper, and the Aug call will be on RL and AD.
Assuming that is possible, I have adjusted the schedule, but that too can
be changed.

Thanks all.

On Wed, Jun 10, 2020 at 12:50 PM Michael Wong <fraggamuffin_at_[hidden]>
wrote:

> SG19 Machine Learning 2 hours
> Hi,
>
> Michael Wong is inviting you to a scheduled Zoom meeting.
>
> Topic: SG19 monthly Apr 2020-Oct 2020
> Time: 02:00 PM Eastern Time (US and Canada) 18:00 UTC
> Every month on the Second Thu, until Oct 8, 2020, 7 occurrence(s)
> Apr 9, 2020 02:00 PM 18:00 UTC
> May 14, 2020 02:00 PM 18:00 UTC
> Jun 11, 2020 02:00 PM 18:00 UTC
> Jul 9, 2020 02:00 PM 18:00 UTC
> Aug 13, 2020 02:00 PM 18:00 UTC
> Sep 10, 2020 02:00 PM 18:00 UTC
> Oct 8, 2020 02:00 PM 18:00 UTC
> Please download and import the following iCalendar (.ics) files to your
> calendar system.
> Monthly:
>
> https://iso.zoom.us/meeting/v50sceqopj4pyLdu5Mx1orYgnZZUj0RNqw/ics?icsToken=98tyKuuhrz0pGtyQs1-CArUqE53ibvG1kmhirrYIsQe0DDJqZQ3MDNdIYoBRAc-B
>
> Join from PC, Mac, Linux, iOS or Android:
> https://iso.zoom.us/j/291630853?pwd=WUlKbS9SNFNRa0QyWXRWenlGSDhaQT09
> Password: 339768
>
> Or iPhone one-tap:
> US: +14086380968,,291630853# or +16468769923,,291630853#
> Or Telephone:
> Dial (for higher quality, dial a number based on your current location):
> US: +1 408 638 0968 or +1 646 876 9923 or +1 669 900 6833 or +1
> 253 215 8782 or +1 301 715 8592 or +1 312 626 6799 or +1 346 248 7799
> or 877 853 5247 (Toll Free)
> Meeting ID: 291 630 853
> Password: 339768
> International numbers available: https://iso.zoom.us/u/abhaIjFKLZ
>
> Or Skype for Business (Lync):
> https://iso.zoom.us/skype/291630853
>
> Agenda:
>
> 1. Opening and introductions
>
> 1.1 Roll call of participants
>
Michael Wong, Richard Dosselmann, Phil Ratzloff, Jorge Silva, Larry Lewis,
Kevin Deweese, Scott McMillan, Andrew Lumsdaine, Jesun Firoz, Marco Foco

> 1.2 Adopt agenda
>
Yes

> 1.3 Approve minutes from previous meeting, and approve publishing
> previously approved minutes to ISOCPP.org
>
> 1.4 Action items from previous meetings
>
> 2. Main issues (125 min)
>
> 2.1 General logistics
>
> Meeting plan, focus on one paper per meeting but does not preclude other
> paper updates:
>
> Apr 9, 2020 02:00 PM: Stats paper - DONE
> May 14, 2020 02:00 PM: Stats paper (replaces Differential calculus) - DONE
> Jun 11, 2020 02:00 PM: Graph paper
> Jul 9, 2020 02:00 PM: Stats paper + Graph paper vote
> Aug 13, 2020 02:00 PM: Differential calculus + Reinforcement Learning
> Sep 10, 2020 02:00 PM: Graph paper + Stats paper
> Oct 8, 2020 02:00 PM: Differential calculus + Reinforcement Learning
>
> ISO meeting status
>
No ISO meeting until the end of the year; this year we will do a deep dive on
each ML topic. Papers can still move through the EWG and LEWG online
meetings, though no decisions are made online, only tentative ones.

> CPPCON status
>
CppCon will happen in hybrid form.
Phil submitted a proposal for CppCon.

> 2.2 Paper reviews
>
> 2.2.1: ML topics
>
> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2119r0.html>
> Larry Lewis, Jorge Silva
>
> Reinforcement Learning proposal:
>
RL work within SAS.
Provide guidance and an API for RL and optimizers.
Build on top of generic machine learning.
Independent of vendors, but following PyTorch and TensorFlow.
Dependent on the underlying tensor.
Core RL algorithms: there is a ton of research, and RL is newer than deep
learning.
Some algorithms will be distributed; we will get feedback from the groups
doing tensors and linear algebra.
Will you need some of the underlying facilities, like supervised and
unsupervised ML? Yes.
Will you base this on certain libraries? We are most familiar with PyTorch;
when TensorFlow becomes more functional, we will pick that up.
Will you use GPUs? Yes, PyTorch is transparent in that respect.
For deep learning we don't use PyTorch, only for RL; I don't want to be
dependent on PyTorch alone.
The hard part can be the design phase for ISO C++, which can take a lot more
time than you think.
As C++ moves, we have to adjust the design to match the new style of C++20
and C++23; for stats we started with data structures and switched to
functions.
The GPU is where the performance comes from; yes, in the future we will talk
about parallel ranges and SYCL.

What about automatic differentiation? For a tensor we need both GPU and AD;
AD helps with backpropagation.
For AD, do you build a network of optimizers? Yes, we plan to reuse work from
other teams; we have implemented NNs too many times.
Are we trying hard to make an AD that works for everybody? Library or
language?
SG7 (reflection) had a lot of polarization on this topic: one side says it
should be a language feature, the other says it should be a library; but we
need code introspection, and to generate code from the code.
We don't want to standardize the entire AST of C++.

PyTorch differentiates with AD; forward differentiation meant you could not
single-step forward in the code, but PyTorch can do that.
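For reference, forward-mode differentiation can be sketched with dual numbers: every value carries its derivative along with it, so the derivative is available at each step of evaluation. This is a minimal illustration only; `dual`, `variable`, `constant` and the example function `f` are hypothetical names, not taken from PyTorch or from any SG19 proposal.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical dual number x + x' * eps with eps * eps == 0:
// val carries the value, der carries the derivative.
struct dual {
    double val;
    double der;
};

constexpr dual operator+(dual a, dual b) { return {a.val + b.val, a.der + b.der}; }

constexpr dual operator*(dual a, dual b) {
    // Product rule: (uv)' = u'v + uv'
    return {a.val * b.val, a.der * b.val + a.val * b.der};
}

// Chain rule: (sin u)' = cos(u) * u'
inline dual sin(dual a) { return {std::sin(a.val), std::cos(a.val) * a.der}; }

// Seed the input with derivative 1; constants have derivative 0.
constexpr dual variable(double x) { return {x, 1.0}; }
constexpr dual constant(double c) { return {c, 0.0}; }

// Example: f(x) = x*x + 3x, so f'(x) = 2x + 3.
constexpr dual f(dual x) { return x * x + constant(3.0) * x; }
```

Evaluating `f(variable(2.0))` yields the value 10 and the derivative 7 in the same pass.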

> Phil Ratzloff et al
>
> P1709R1: Graph Proposal for Machine Learning
>
> P1709R3:
>
> https://docs.google.com/document/d/1kLHhbSTX7j0tPeTYECQFSNx3R35Mu3xO5_dyYdRy4dM/edit?usp=sharing
>
>
> https://docs.google.com/document/d/1QkfDzGyfNQKs86y053M0YHOLP6frzhTJqzg1Ug_vkkE/edit?usp=sharing
>
> I’ve been working on the prototype implementation to get it building in
> both Windows & Linux, using CMake & the Conan package manager:
>
> 1. All unit tests complete successfully for both MSVC & gcc10
> 2. All bgl17 code has been removed from the repository. It uses a
> cloned bgl17 directory (ENABLE_BGL17 cmake option).
> 3. Catch2 is now being used instead of Google Test for unit testing
> 4. A simple unit test demonstrates the use of the library’s
> *dfs_vertex_range* iteration *using bgl17’s vov graph*. This can be
> seen in test/test_vov_adaptor.cpp.
>    1. There were a few changes needed in bgl17 to accommodate this (I
>       haven’t pushed these changes):
>       i.  I added an inner_container type definition to vov.
>       ii. There were 3 places where I added #ifdef _MSC_VER to disable
>           Linux-specific code, far fewer than before.
> 5. Adapting vov requires the following:
>    i.  An adaptor graph class to map the vov types to the expected types
>    ii. Function overloads that use the adaptor graph class as a template
>        argument
> 6. Added graph API functions to avoid name ambiguity with begin(g) &
>    end(g) for vertices in the dfs & bfs range iterators:
>    1. vertex_begin(g), vertex_end(g)
>    2. edge_begin(g,u), edge_end(g,u)
>
>
>
> I haven’t written the code to support the value(uv) function to get edge
> properties for vov yet.
>
> These changes should bring the library much closer to a repeatable
> cross-platform build and you’re welcome to try it.
>
> I’ve pushed the code to the master branch at
> https://github.com/pratzl/graph
>
>
>
> The next SG19 meeting is 6/11/20 (12 days from now), and I have some things in
> mind to work on. I’ve been focused on the prototype to make it more
> accessible for all the authors and I need to switch back to the paper and
> give it more attention.
>
> 1. Paper
>    1. Complete algorithm descriptions & examples:
>       i.   Connected Components
>       ii.  Strongly Connected Components
>       iii. Bi-connected Components
>       iv.  Articulation Points
>    2. Data structures
>       i. Add section on graph adaptors
> 2. Algorithm implementations
>    1. Connected & strongly connected components unit tests
>    2. [bi-connected components]
>    3. [articulation points]
> 3. bgl17 adaptors
>    1. vov adaptor: implement value(edge), add dfs_edge_range tests
>    2. Implement a compressed adaptor
> 4. Other prototype features
>    1. Support Clang10 using the range-v3 concepts macros
> 5. Documentation
>    1. Add explicit description of how to install and use the library
>
>
Paper is 90-95% done; the major sections are there, but it needs examples.
The prototype library from the email works on Linux and Windows, using
CMake, Conan, and a unit test framework.

All algorithms have iterator versions and can also take a range.
An output_iterator concept was added for algorithms that require an output
iterator; Richard might want to use that too.

Added vertex_begin and vertex_end to let me get the graph-based versions.
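A minimal sketch of the separately named free functions, assuming a plain vector-of-vectors graph. `vov_graph` and `edge_count` are hypothetical, not bgl17's actual vov type or the prototype's API, and only non-const overloads are shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vector-of-vectors graph: g[u] lists the target keys of u's out-edges.
using vov_graph = std::vector<std::vector<std::size_t>>;

// Distinct names avoid the ambiguity of overloading begin(g)/end(g)
// for both the vertex range and the per-vertex edge ranges.
inline auto vertex_begin(vov_graph& g) { return g.begin(); }
inline auto vertex_end(vov_graph& g) { return g.end(); }

inline auto edge_begin(vov_graph& g, std::size_t u) {
    assert(u < g.size());
    return g[u].begin();
}
inline auto edge_end(vov_graph& g, std::size_t u) {
    assert(u < g.size());
    return g[u].end();
}

// Counts all edges by walking each vertex's out-edge range.
inline std::size_t edge_count(vov_graph& g) {
    std::size_t n = 0;
    for (std::size_t u = 0; u < g.size(); ++u)
        for (auto it = edge_begin(g, u); it != edge_end(g, u); ++it) ++n;
    return n;
}
```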

Ranges can iterate through the graph: they have begin and end, start from a
given vertex, and can also be constructed with a range.

The section on graph data structures has been rewritten, carried over from
the beginning, to reflect what I have in my prototype.
The classes have common template types, including 3 types for user values.
There is an index type, which is either a 32-bit or a 16-bit value; the
32-bit default covers most cases.
There is the allocator.
There is a section on what kinds of user-defined types can be used, for
example for weights in an adjacency list.

compressed has been renamed to directed_adjacency_array to complement the
undirected adjacency list.

Properties can be defined for the graph and for edges/vertices.
Only user-defined properties can be changed after the graph has been
constructed.
The other constraint is that source edges have to be ordered by vertex key.
This is the DAA graph, which is a template alias with various defaults;
various classes implement it.
Access is defined by public functions.

It is possible to customize by overriding.

Do I need all the const types?
I think so, both const and non-const.
I also think so; we did that with BGL17 as well, otherwise it's annoying.
I need to prove it for myself, though I also think so.
There is a lot of boilerplate to make all this work; yes, you can shortcut
it with enable_if as well.

One interesting constructor takes a range of edges and vertices and extracts
the key from each edge in the edge range (just a pair of vertex keys);
another one also extracts properties using property functions.
Now I see I also need a way to specify the graph.
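To illustrate the constraint that source edges be ordered by vertex key, and a constructor that extracts keys from an edge range, here is a hypothetical compressed (CSR-style) layout. `csr_graph`, its constructor and the 32-bit default key are assumptions modeled on these notes, not the paper's DAA API; key extraction goes through std::invoke, so pointers to data members work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical compressed adjacency array (CSR layout). Key defaults to a
// 32-bit index, mirroring the default discussed above.
template <class Key = std::uint32_t>
class csr_graph {
public:
    // Builds from a range of edges already ordered by source key.
    // src_fn / tgt_fn extract the source and target keys from each edge.
    template <class EdgeRange, class SrcFn, class TgtFn>
    csr_graph(const EdgeRange& edges, std::size_t n_vertices,
              SrcFn src_fn, TgtFn tgt_fn)
        : offsets_(n_vertices + 1, 0) {
        Key prev = 0;
        for (const auto& e : edges) {
            const Key u = static_cast<Key>(std::invoke(src_fn, e));
            const Key v = static_cast<Key>(std::invoke(tgt_fn, e));
            assert(u < n_vertices && v < n_vertices);
            assert(u >= prev && "edges must be ordered by source key");
            prev = u;
            ++offsets_[u + 1];      // count the out-degree of u
            targets_.push_back(v);  // sorted input keeps each row contiguous
        }
        // Prefix sums turn out-degrees into row start offsets.
        for (std::size_t i = 1; i < offsets_.size(); ++i)
            offsets_[i] += offsets_[i - 1];
    }

    std::size_t vertex_count() const { return offsets_.size() - 1; }

    // Out-neighbors of u occupy targets_[offsets_[u]] up to targets_[offsets_[u + 1]].
    const Key* edge_begin(Key u) const { return targets_.data() + offsets_[u]; }
    const Key* edge_end(Key u) const { return targets_.data() + offsets_[u + 1]; }

private:
    std::vector<std::size_t> offsets_;
    std::vector<Key> targets_;
};
```

The sorted-input requirement is what makes a single pass plus a prefix sum sufficient; unsorted edges would need a counting-sort step first.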

I need to revisit this to see whether I need to reimplement it for the test.

Then there is the undirected adjacency list.

There is one object per edge, and it is part of 2 linked lists.
Edges are in doubly linked lists, stored in a vector; after construction,
you can't add vertices or edges.
In-edges are ordered by vertex key.
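One way to store each undirected edge once while linking it into two doubly linked lists is to keep the edges in a vector and use indices as links, one next/prev pair per endpoint. This is a hypothetical sketch: it pushes each edge onto the front of its endpoints' lists, so it does not keep edges ordered by vertex key as described above, and it rejects self-loops.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Each edge is stored once and linked into the incident-edge list of both
// endpoints. The graph is fixed at construction.
class undirected_adjacency_list {
public:
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    undirected_adjacency_list(
        std::size_t vertex_count,
        const std::vector<std::pair<std::size_t, std::size_t>>& edge_list)
        : head_(vertex_count, npos) {
        edges_.reserve(edge_list.size());
        for (const auto& [u, v] : edge_list) {
            assert(u < vertex_count && v < vertex_count);
            assert(u != v && "self-loops are not handled in this sketch");
            const std::size_t id = edges_.size();
            edges_.push_back(edge{{u, v}, {npos, npos}, {npos, npos}});
            link(id, 0);  // into u's list
            link(id, 1);  // into v's list
        }
    }

    std::size_t edge_count() const { return edges_.size(); }

    // Calls f(v) for each neighbor v of u by walking u's incident-edge list.
    template <class F>
    void for_each_neighbor(std::size_t u, F f) const {
        assert(u < head_.size());
        for (std::size_t e = head_[u]; e != npos;) {
            const edge& ed = edges_[e];
            const int side = (ed.ends[0] == u) ? 0 : 1;
            f(ed.ends[1 - side]);
            e = ed.next[side];
        }
    }

private:
    struct edge {
        std::size_t ends[2];  // endpoint keys
        std::size_t next[2];  // next edge in the list of ends[i]
        std::size_t prev[2];  // previous edge in the list of ends[i] (for O(1) unlinking)
    };

    // Pushes edge id onto the front of the list owned by ends[side].
    void link(std::size_t id, int side) {
        const std::size_t u = edges_[id].ends[side];
        const std::size_t old = head_[u];
        edges_[id].next[side] = old;
        if (old != npos) {
            edge& o = edges_[old];
            o.prev[o.ends[0] == u ? 0 : 1] = id;
        }
        head_[u] = id;
    }

    std::vector<std::size_t> head_;  // first incident edge of each vertex
    std::vector<edge> edges_;
};
```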

Everything else is similar.

Finally, there is a section on adapting to external graphs, i.e. adapting the
algorithms here to their data structures.
We can define our own graph type that overrides the graph type to do the
right thing, and do that with the other types as well.
I tested my own BFS algorithm with a BGL17 data structure; yes, there are
small things in there that allow that to happen, starting with a graph, so I
think this is the right approach.

Jesun asked: is it a hard requirement that the vertices be in a vector and
the edges in a linked list for the undirected graph (and probably for the
directed graph type)? Are there any implications for iteration performance,
or for the mutability of the graph?
Good question; this is something I'd like to explore. I have a key that I
want to access in a different way, to enable conditional algorithms where
you don't have that requirement.
I'd like to relax that area more, so I have not changed any concepts from
what we had before.
The concepts used reflect that algorithm; we should still have something
less restrictive. Yes, that's probably right: you should be able to iterate
over neighbors using a random-access container of forward-iterable
containers.

Walking through the code, TDD style:
as I am doing development, it is outputting results.
At the top of the file I set the test option using the German routes.

I like what you did with the BGL17 testing. Yes, it was done with tuples of
ranges.

Topological sort isn't done yet.

The interface is stable.
Strongly connected components are not tested yet.
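As a reference point for the connected components tests, a plain BFS labeling over a symmetric vector-of-vectors graph could look like this; the function name and graph alias are hypothetical, and this is not the prototype's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

using vov_graph = std::vector<std::vector<std::size_t>>;

// Labels each vertex with a component id using breadth-first search.
// Assumes g is symmetric: every edge u->v also appears as v->u.
inline std::vector<std::size_t> connected_components(const vov_graph& g) {
    const std::size_t unset = static_cast<std::size_t>(-1);
    std::vector<std::size_t> comp(g.size(), unset);
    std::size_t next_id = 0;
    for (std::size_t s = 0; s < g.size(); ++s) {
        if (comp[s] != unset) continue;  // already labeled
        std::queue<std::size_t> q;
        comp[s] = next_id;
        q.push(s);
        while (!q.empty()) {
            const std::size_t u = q.front();
            q.pop();
            for (std::size_t v : g[u]) {
                assert(v < g.size());
                if (comp[v] == unset) {
                    comp[v] = next_id;
                    q.push(v);
                }
            }
        }
        ++next_id;  // next unlabeled vertex starts a new component
    }
    return comp;
}
```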

What about initializers? I have a few classes that will generate a graph for
me, but they can be improved, as the result is not repeatable; yes, we have
file I/O for matrices, and graph =(){} would be a convenient thing for
testing.
Could I do it with what I have now, or do I need a constructor? I will look
at what BGL17 has too.
It should be doable, but I have to go back to see where we put that. Things
to do:
1. Functions I have not implemented
2. BGL17 compressed graph
3. Range support for sentinels
4. Reverse filter for a graph?
Creating an NN with weights, which eliminates an edge, is a common thing;
yes, that kind of filter is useful for NNs.
Comparison with other libraries: put out a separate paper.
Can we store a graph in some constexpr arrays?
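Storing a small, fixed graph in constexpr arrays is possible with a CSR layout in std::array, which also lets graph properties be checked with static_assert. A hypothetical sketch; `static_graph` is not from the paper.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Fixed graph kept in constexpr arrays using a CSR layout: the out-edges of u
// are targets[offsets[u]] up to (not including) targets[offsets[u + 1]].
template <std::size_t V, std::size_t E>
struct static_graph {
    std::array<std::size_t, V + 1> offsets;
    std::array<std::size_t, E> targets;

    constexpr std::size_t out_degree(std::size_t u) const {
        assert(u < V);
        return offsets[u + 1] - offsets[u];
    }

    constexpr bool has_edge(std::size_t u, std::size_t v) const {
        assert(u < V);
        for (std::size_t i = offsets[u]; i < offsets[u + 1]; ++i)
            if (targets[i] == v) return true;
        return false;
    }
};

// Edges 0->1, 0->2, 1->2; the checks below are evaluated at compile time.
constexpr static_graph<3, 3> tiny{{0, 2, 3, 3}, {1, 2, 2}};
static_assert(tiny.out_degree(0) == 2);
static_assert(tiny.has_edge(1, 2) && !tiny.has_edge(2, 0));
```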
Marshalling.
Relax constraints on algorithms to make them more flexible.

<my power died at this point>

Aiming for guidance on moving this paper forward.

> Richard Dosselmann et al
>
> P1708R1: Math proposal for Machine Learning
>
> https://docs.google.com/document/d/1VAgcyvL1riMdGz7tQIT9eTtSSfV3CoCEMWKk8GvVuFY/edit
>
> > std.org/jtc1/sc22/wg21/docs/papers/2020/p1708r2
> > above is the stats paper that was reviewed in Prague
> > http://wiki.edg.com/bin/view/Wg21prague/P1708R2SG19
> >
> > Review Jolanta's Polish feedback.
> > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2119r0.html
>
>
>
> Richard presents
A large number of revisions; iterator pairs are now replaced with ranges.
Each free-standing function performs a linear pass over the data.
For large data sets, we then have accumulator objects to make one combined
pass and compute the final stats at the end.
An alternate predicate (projection) is available if you want to retrieve one
value out of an array of structures.

For each mean, there are overloads.
MC suggested we have a weighted mean.

Added an execution policy for parallelization; this gives you 4 variations
of each of the means.

We also have geometric and harmonic means, with 4 flavors each.

Variance follows the same pattern, but makes clear whether you are working
with a population or a sample.

Passing 2 ranges: one for the values and one for the weights.
Can you pass just one (zipping the 2 ranges together and extracting with a
projection) if the 2 coincide with each other?
OK, I might move in that direction; I will think about it.

Replace median with a general quantile.
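A general quantile can be sketched with linear interpolation between order statistics at position h = (n - 1) * p, so p = 0.5 gives the usual median. The interpolation rule (NumPy's default "linear" method) is an assumption; the paper may specify a different definition. The data is taken by value because the function sorts a copy; std::nth_element would avoid the full sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// p-quantile for 0 <= p <= 1: interpolate between x[floor(h)] and the next
// order statistic, where h = (n - 1) * p on the sorted data.
inline double quantile(std::vector<double> xs, double p) {
    assert(!xs.empty() && p >= 0.0 && p <= 1.0);
    std::sort(xs.begin(), xs.end());
    const double h = static_cast<double>(xs.size() - 1) * p;
    const std::size_t lo = static_cast<std::size_t>(std::floor(h));
    const std::size_t hi = std::min(lo + 1, xs.size() - 1);
    return xs[lo] + (h - static_cast<double>(lo)) * (xs[hi] - xs[lo]);
}
```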

Mode has a comparator for equality.

Python only returns the first mode, but I will return all the modes.
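Returning all the modes while relying only on an equality comparator, which also covers non-numerical data, can be sketched as follows. The signature is hypothetical; this version is O(n * k) for k distinct values and returns the modes in order of first appearance. (For comparison, Python's statistics.mode returns a single mode, while statistics.multimode returns all of them.)

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// All most-frequent values of xs, compared with eq. Empty input yields no modes.
template <class T, class Eq = std::equal_to<>>
std::vector<T> modes(const std::vector<T>& xs, Eq eq = {}) {
    std::vector<T> distinct;
    std::vector<std::size_t> counts;
    for (const T& x : xs) {
        std::size_t i = 0;
        while (i < distinct.size() && !eq(distinct[i], x)) ++i;
        if (i == distinct.size()) {  // first occurrence of this value
            distinct.push_back(x);
            counts.push_back(0);
        }
        ++counts[i];
    }
    std::size_t best = 0;
    for (std::size_t c : counts) best = std::max(best, c);
    std::vector<T> result;
    for (std::size_t i = 0; i < distinct.size(); ++i)
        if (counts[i] == best) result.push_back(distinct[i]);
    assert(xs.empty() || !result.empty());
    return result;
}
```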

Each function makes one linear pass through the data, but a single pass can
also compute everything, using accumulators, which also have weighted and
unweighted versions of each of the mean, median, and mode.
This allows one single linear pass for all of these.
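A single-pass accumulator can be sketched with Welford's online algorithm: values are fed one at a time, and the count, mean and both variances are read at the end, so several statistics come out of one pass. The class name is hypothetical and this is not the paper's accumulator interface.

```cpp
#include <cassert>
#include <cstddef>

// Welford's online algorithm: m2_ tracks the running sum of squared
// deviations from the current mean.
class mean_variance_accumulator {
public:
    void operator()(double x) {
        ++n_;
        const double delta = x - mean_;
        mean_ += delta / static_cast<double>(n_);
        m2_ += delta * (x - mean_);  // uses the updated mean
    }

    std::size_t count() const { return n_; }

    double mean() const {
        assert(n_ > 0);
        return mean_;
    }
    double population_variance() const {
        assert(n_ > 0);
        return m2_ / static_cast<double>(n_);
    }
    double sample_variance() const {
        assert(n_ > 1);
        return m2_ / static_cast<double>(n_ - 1);
    }

private:
    std::size_t n_ = 0;
    double mean_ = 0.0;
    double m2_ = 0.0;
};
```

Unlike a naive running sum of squares, this update avoids subtracting two large, nearly equal quantities at the end.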

Mode can return a series of values and can handle non-numerical data.

Moving to documentation now.

For the normal distribution, can you have a parameter that defaults to
normal?
For a statistician: Poisson distribution, arrival times, and whether the mean
is a good moment to calculate; sample means are good estimators.
Continue this on the reflector.

> Differentiable Programming by Marco Foco
>
> P1416R1: SG19 - Linear Algebra for Data Science and Machine Learning
>
> https://docs.google.com/document/d/1IKUNiUhBgRURW-UkspK7fAAyIhfXuMxjk7xKikK4Yp8/edit#heading=h.tj9hitg7dbtr
>
> P1415: Machine Learning Layered list
>
> https://docs.google.com/document/d/1elNFdIXWoetbxjO1OKol_Wj8fyi4Z4hogfj5tLVSj64/edit#heading=h.tj9hitg7dbtr
>
> 2.2.2 SG14 Linear Algebra progress:
> Different layers of proposal
>
> https://docs.google.com/document/d/1poXfr7mUPovJC9ZQ5SDVM_1Nb6oYAXlK_d0ljdUAtSQ/edit
>
> 2.2.3 any other proposal for reviews?
>
> 2.3 Other Papers and proposals
>
> 2.5 Future F2F meetings:
>
> 2.6 future C++ Standard meetings:
> https://isocpp.org/std/meetings-and-participation/upcoming-meetings
>
> - 2020-02-10 to 15: Prague, Czech Republic
>
> - 2020-06-01 to 06: Bulgaria
> - 2020-11: (New York, tentative)
> - 2021-02-22 to 27: Kona, HI, USA
>
> 3. Any other business
>
> New reflector
>
> http://lists.isocpp.org/mailman/listinfo.cgi/sg19
>
> Old Reflector
> https://groups.google.com/a/isocpp.org/forum/#!newtopic/sg19
> <https://groups.google.com/a/isocpp.org/forum/?fromgroups=#!forum/sg14>
>
> Code and proposal Staging area
>
> 4. Review
>
> 4.1 Review and approve resolutions and issues [e.g., changes to SG's
> working draft]
>
> 4.2 Review action items (5 min)
>
> 5. Closing process
>
> 5.1 Establish next agenda
>
> TBD
>
> 5.2 Future meeting
>
> Jul 9, 2020 02:00 PM
> Aug 13, 2020 02:00 PM
> Sep 10, 2020 02:00 PM
> Oct 8, 2020 02:00 PM
>

Received on 2020-06-15 23:17:41