In response to yesterday's discussion:
- header-names can now contain any character; they are mapped to files
in an implementation-defined manner, so it's up to the implementation what
it does with a character sequence that looks like a UCN.
- The definition of the execution (wide) character set was moved to [character.seq],
along with a definition of what "locale-specific" means.
- The (recently added) requirement of encoding consistency between the literal
encoding and the execution (runtime) encoding was dropped; this reflects
existing practice.
"The execution character set and the execution wide-character set are supersets
of the basic literal character set (5.3 [lex.charset]). The encodings of the
execution character sets and the sets of additional elements (if any) are
locale-specific. [ Note: The encoding of the execution character sets can be
unrelated to any literal encoding. -- end note ]"
Would you consider removing the note until we get time to see if that is exactly the case?
I think I may have found a way to be a bit more exact than the note (but I need time to think about it, and I don't think we need to resolve it for this paper). Everything else is fine by me!
- The change in code unit semantics that Hubert pointed out has been fixed;
the text now reads:
"A literal encoding encodes each element of the basic literal character
set as a single code unit with non-negative value, distinct from the
code unit for any other such element. [ Note: A character not in the
basic literal character set can be encoded with more than one code unit;
the value of such a code unit can be the same as that of a code unit
for an element of the basic literal character set. -- end note ]."
A literal encoding encodes each character as a distinct [Note: or state-shifted] sequence
of code units. Elements of the basic (literal) character set are each encoded as a single code unit with a non-negative value.
The remaining differences from Corentin's P2297R0 are:
- basic literal character set
There are now four uses of the term in my paper, so it seems to be a useful
descriptive tool. (I believe that unifying "basic character set" and
"basic literal character set" would either change the semantics of the
status quo or require more words.)
- translation character set
We agree there is no difference in the (intended) semantics either way;
I believe this is simply a question of presentation in the standard.
My definition (aligned with ISO 10646 terminology) currently reads:
The translation character set consists of the following elements:
- each character named by ISO/IEC 10646, as identified by its unique UCS scalar value, and
- a distinct character for each UCS scalar value where no named character is assigned.
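Under this definition every UCS scalar value, named or not, identifies exactly one element of the translation character set. The UCS scalar values are the code points U+0000 through U+10FFFF excluding the surrogate range U+D800 through U+DFFF, which gives 0x110000 - 0x800 = 1,112,064 elements. A minimal C++17 sketch; `is_ucs_scalar_value` and `kTranslationCharsetSize` are hypothetical names. The sketch deliberately does not distinguish named from unassigned characters, because the definition makes both kinds members.

```cpp
#include <cstdint>

// True if cp is a UCS scalar value: at most U+10FFFF and not a surrogate.
constexpr bool is_ucs_scalar_value(char32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Element count under the quoted definition: all code points minus surrogates.
constexpr std::uint32_t kTranslationCharsetSize = 0x110000 - 0x800;

static_assert(kTranslationCharsetSize == 1112064);
static_assert(is_ucs_scalar_value(U'A'));
static_assert(!is_ucs_scalar_value(0xD800));  // surrogates are excluded
```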
SG16 mailing list