On Thu, Feb 25, 2021 at 9:27 AM Jens Maurer via SG16 <sg16@lists.isocpp.org> wrote:


In response to yesterday's discussion:

 - header-names can now contain any character; they are mapped to files
in an implementation-defined manner, so it's up to the implementation what
it does with a character sequence that looks like a UCN.


 - The definition of execution (wide) character set was moved to [character.seq],
including defining what "locale-specific" means.

 - Dropped the (recently added) requirement about encoding consistency
between the literal encoding and the execution (runtime) encoding
(reflects existing practice).

New text:

"The execution character set and the execution wide-character set are supersets
of the basic literal character set (5.3 [lex.charset]). The encodings of the
execution character sets and the sets of additional elements (if any) are
locale-specific. [ Note: The encoding of the execution character sets can be
unrelated to any literal encoding. -- end note ]"

Would you consider removing the note until we get time to see if that is exactly the case?
I think I may have found a way to be a bit more exact than the note (but I need time to think about it, and I don't think we need to resolve it for this paper). Everything else is fine by me!

 - Hubert's observation that code unit semantics was changed has been fixed;
the text now reads

"A literal encoding encodes each element of the basic literal character
set as a single code unit with non-negative value, distinct from the
code unit for any other such element. [ Note: A character not in the
basic literal character set can be encoded with more than one code unit;
the value of such a code unit can be the same as that of a code unit
for an element of the basic literal character set. -- end note ]."
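
To make the quoted wording concrete, here is a rough C++17 sketch (mine, not text from either paper) that checks both properties at compile time for the ordinary literal encoding. The names basic_chars, all_non_negative and all_distinct are made up for illustration, and basic_chars lists only the graphic characters plus space, not the control characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: graphic members of the basic literal character set
// (plus space). Control characters are left out of this sketch.
inline constexpr std::string_view basic_chars =
    " abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";

// Each element must be a single code unit with non-negative value.
constexpr bool all_non_negative(std::string_view s) {
    for (char c : s) {
        if (c < 0) return false;
    }
    return true;
}

// The code unit for each element must differ from every other element's.
constexpr bool all_distinct(std::string_view s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        for (std::size_t j = i + 1; j < s.size(); ++j) {
            if (s[i] == s[j]) return false;
        }
    }
    return true;
}

static_assert(all_non_negative(basic_chars));
static_assert(all_distinct(basic_chars));

// The note: a character outside the basic literal character set can take
// more than one code unit. In UTF-8, U+00E9 takes two (plus the terminator).
static_assert(sizeof(u8"\u00E9") == 3);
```

The u8 literal is used for the last check because its encoding is fixed as UTF-8; the number of code units for the same character in an ordinary literal depends on the implementation's ordinary literal encoding.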

A literal encoding encodes each character as a distinct [Note: or state-shifted] sequence
of code units. Elements of the basic (literal) character set are encoded as a single code unit with non-negative value.

The remaining differences with Corentin's P2297R0 are

 - basic literal character set

There are now four uses of the term in my paper, so it seems to be a useful
descriptive tool.  (Suggestions to unify "basic character set" and
"basic literal character set" would imply semantic changes to the status
quo or would use more words, I believe.)

 - translation character set

We agree there is no difference on the (intended) semantics either way;
I believe this is simply a question of presentation in the standard.
My definition (aligned with ISO 10646 terminology) currently reads:

The translation character set consists of the following elements:

 - each character named by ISO/IEC 10646, as identified by its unique UCS scalar value, and
 - a distinct character for each UCS scalar value where no named character is assigned.
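
As a small illustration of what this definition covers (my sketch, not wording from the paper): a UCS scalar value is any code point in the range 0 to 0x10FFFF except the surrogates 0xD800 to 0xDFFF, so the translation character set has one element per such value. The names is_ucs_scalar_value and scalar_value_count are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if v is a UCS scalar value, i.e. a code point
// in [0, 0x10FFFF] that is not a surrogate (0xD800..0xDFFF).
constexpr bool is_ucs_scalar_value(std::uint32_t v) {
    return v <= 0x10FFFF && !(v >= 0xD800 && v <= 0xDFFF);
}

// One element of the translation character set per scalar value, whether
// or not ISO/IEC 10646 assigns a named character to it:
// 0x110000 code points minus 0x800 surrogates.
inline constexpr std::uint32_t scalar_value_count = 0x110000 - 0x800;

static_assert(is_ucs_scalar_value(0x0041));    // LATIN CAPITAL LETTER A
static_assert(is_ucs_scalar_value(0x10FFFF));  // last code point
static_assert(!is_ucs_scalar_value(0xD800));   // surrogate, not a scalar value
static_assert(!is_ucs_scalar_value(0x110000)); // beyond the UCS code space
static_assert(scalar_value_count == 1112064);
```

Whether a given scalar value has a named character assigned changes between versions of ISO/IEC 10646, which is why the second bullet is needed to keep the set itself stable.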
