On Tue, 9 Jun 2020 at 16:48, Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:

This is your friendly reminder that an SG16 telecon will be held tomorrow, Wednesday June 10th, at 19:30 UTC (timezone conversion).  To attend, visit https://bluejeans.com/140274541 at the start of the meeting.

The agenda for the meeting is:

Anticipated decisions to be made at this meeting include:

  • Prioritization of terminology updates to pursue.

Prior to tomorrow's meeting, please:

  • review P1859R0, particularly the proposed terminology.
  • think of other terminology changes to be considered.
  • think of how we can divide up the work for making terminology updates. 
Hey!
Some feedback on P1859 after a first attempt at rewording the standard.

I will start by saying that it seems entirely reasonable and useful to rewrite [lex] in terms of this new
terminology, and I think that trying to split that work would end up being counterproductive (however, the library wording, which has its own definitions, could be reworded
independently). It is not that much work and I'm willing to do it.

I found that I needed to use the following terms as defined by the Unicode Standard

* abstract character
* character set
* character encoding 
* code units, codepoint

(we can bikeshed codepoint vs scalar values in the grammar as UCNs are technically scalar values) 
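
To make the codepoint vs scalar value distinction concrete, here is a minimal sketch. The helper names are hypothetical and appear neither in P1859 nor in the standard; the ranges are those defined by the Unicode Standard.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only.

// A Unicode code point is any integer in the range U+0000..U+10FFFF.
constexpr bool is_code_point(std::uint32_t v) { return v <= 0x10FFFF; }

// Surrogate code points (U+D800..U+DFFF) are reserved for UTF-16.
constexpr bool is_surrogate(std::uint32_t v) { return v >= 0xD800 && v <= 0xDFFF; }

// A scalar value is a code point that is not a surrogate.
constexpr bool is_scalar_value(std::uint32_t v) {
    return is_code_point(v) && !is_surrogate(v);
}

static_assert(is_scalar_value(0x00E9), "U+00E9 is a scalar value");
static_assert(is_code_point(0xD800) && !is_scalar_value(0xD800),
              "U+D800 is a code point but not a scalar value");
static_assert(!is_code_point(0x110000), "0x110000 is outside the codespace");
```

Since a UCN cannot name a surrogate, the set of values a UCN can designate corresponds to `is_scalar_value` rather than `is_code_point`.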

The notion of character repertoire was not useful; that of character set is sufficient.

The notion of basic source character set could be removed, instead describing lexing after phase 1 entirely in terms of Unicode. A couple of library functions would have to be reworded, as well as a note in the description of user-defined literals, as they use "basic source character set" as a proxy to describe something else.

In particular, it is useful to separate entirely the notions of source encoding (which only exists in phase 1), internal representation, and literal encodings. These are 3 distinct and unrelated categories of character sets and encodings, which should have no relation to each other beyond the existence of a unidirectional mapping from source to internal and from internal to literal, so I think it would be valuable not to describe them in terms of each other.

It is useful to be able to talk about the Unicode character set rather than "the character set described in ISO/IEC 10646".
The U+xxxx notation (plus Unicode character names) is also useful to describe specific codepoints in the grammar.

Similarly, the basic execution character set is not a very useful notion, as it is only used as a mechanism to describe which
characters are in the execution and execution wide character sets.
While I didn't try to do it, I think it makes sense to rename execution character set to something like narrow/wide literal character sets, in the vein of what P1859 proposes.

It is useful to be able to talk about both literal encoding and literal character sets for each type of literal (a given encoding implicitly represents a character set).
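
As a small illustration of why the encoding matters per literal kind, the sketch below counts the code units that the same codepoint occupies in UTF-8, UTF-16 and UTF-32 string literals. The helper `code_units` is hypothetical. Narrow and wide literals are deliberately left out, since their encodings are implementation-defined.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the terminating null code unit.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// U+00E9 LATIN SMALL LETTER E WITH ACUTE
static_assert(code_units(u8"\u00E9") == 2, "two UTF-8 code units");
static_assert(code_units(u"\u00E9") == 1, "one UTF-16 code unit");
static_assert(code_units(U"\u00E9") == 1, "one UTF-32 code unit");

// U+1F600 GRINNING FACE, outside the Basic Multilingual Plane
static_assert(code_units(u8"\U0001F600") == 4, "four UTF-8 code units");
static_assert(code_units(u"\U0001F600") == 2, "a UTF-16 surrogate pair");
static_assert(code_units(U"\U0001F600") == 1, "one UTF-32 code unit");
```

In each case the literal denotes the same codepoint; only the literal encoding, and therefore the code unit sequence, differs.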

The notion of dynamic encoding proposed by P1859 and its relation to the literal encoding are not needed in lex and might be better described in the library wording, although a note in lex might not hurt.

While I have not done that work yet, it seems useful to describe in the grammar, in terms of Unicode codepoints, what constitutes whitespace as well as a new line.
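
A rough sketch of what such a codepoint-based description could look like, written as predicates. The function names are hypothetical, and the set of codepoints is an assumption based on the whitespace characters [lex] currently names (space, horizontal tab, vertical tab, form feed, new-line), not a proposal.

```cpp
// Hypothetical predicates over Unicode codepoints; illustrative only.

// U+000A LINE FEED
constexpr bool is_new_line(char32_t c) { return c == 0x000A; }

// Whitespace other than new-line:
// U+0009 CHARACTER TABULATION, U+000B LINE TABULATION,
// U+000C FORM FEED, U+0020 SPACE
constexpr bool is_non_new_line_whitespace(char32_t c) {
    return c == 0x0009 || c == 0x000B || c == 0x000C || c == 0x0020;
}

constexpr bool is_whitespace(char32_t c) {
    return is_new_line(c) || is_non_new_line_whitespace(c);
}

static_assert(is_whitespace(U' '), "U+0020 SPACE is whitespace");
static_assert(is_new_line(U'\n'), "U+000A is a new line");
static_assert(!is_whitespace(U'a'), "U+0061 is not whitespace");
```

Whether other Unicode whitespace or line-terminator codepoints should be included is a design question that this sketch does not try to answer.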

With the exception of "character literal" (and "abstract character"), it seems valuable to systematically replace the use of the vacuous term "character" in the core wording.
That might be slightly more involved in the library wording, as "character" is used all over the place, usually to mean "code unit".

The attached PDF is meant to be illustrative of the scope of changes in the core wording, and also contains a number of design changes that are
mostly out of scope of the terminology discussion (it is also full of bugs). These design changes will appear in a paper in more detail soon™.

It notably incorporates changes from P2029 which go a long way in improving the way character literals are described. 
  
Hope that helps, 
Corentin
 
Tom.
--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16