Date: Thu, 25 Feb 2021 09:27:26 +0100
https://wiki.edg.com/pub/Wg21virtual2021-02/SG16/d2314r1.html
In response to yesterday's discussion:
- header-names can now contain any character; they are mapped to files
in an implementation-defined manner, so it's up to the implementation what
it does with a character sequence that looks like a UCN.
- The definition of execution (wide) character set was moved to [character.seq],
including defining what "locale-specific" means.
- Dropped the (recently added) requirement about encoding consistency
between the literal encoding and the execution (runtime) encoding
(reflects existing practice).
New text:
"The execution character set and the execution wide-character set are supersets
of the basic literal character set (5.3 [lex.charset]). The encodings of the
execution character sets and the sets of additional elements (if any) are
locale-specific. [ Note: The encoding of the execution character sets can be
unrelated to any literal encoding. -- end note ]"
- Hubert's observation that the code unit semantics had changed has been addressed;
the text now reads
"A literal encoding encodes each element of the basic literal character
set as a single code unit with non-negative value, distinct from the
code unit for any other such element. [ Note: A character not in the
basic literal character set can be encoded with more than one code unit;
the value of such a code unit can be the same as that of a code unit
for an element of the basic literal character set. -- end note ]."
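As an illustration of the quoted wording (not part of the paper), here is a minimal C++ sketch. The helper contains_code_unit and the array name are hypothetical, and the example assumes that Shift-JIS encodes U+8868 as the two code units 0x95 0x5C, with 0x5C being the code unit for the backslash in ASCII-based encodings. The first two checks hold on typical implementations; the paper's wording would make them requirements.

```cpp
#include <cstddef>

// Elements of the basic literal character set: a single code unit each,
// with a non-negative value, distinct from every other such element.
static_assert('a' >= 0 && '\\' >= 0, "non-negative code unit values");
static_assert('a' != 'b', "distinct code units for distinct elements");

// Hypothetical helper: does the code unit sequence s[0..n) contain `unit`?
constexpr bool contains_code_unit(const unsigned char* s, std::size_t n,
                                  unsigned char unit) {
    for (std::size_t i = 0; i < n; ++i) {
        if (s[i] == unit) return true;
    }
    return false;
}

// Assumption: the Shift-JIS encoding of U+8868, written out by hand so the
// example does not depend on the compiler's literal encoding.
constexpr unsigned char sjis_u8868[] = {0x95, 0x5C};

// The trailing code unit of this non-basic character has the same value as
// the code unit for '\' in an ASCII-based encoding, as the note permits.
static_assert(contains_code_unit(sjis_u8868, 2, 0x5C),
              "trailing code unit coincides with the backslash code unit");
```

The point of the second part is that a code-unit-wise search for a basic character can match inside a multi-code-unit character, which the note explicitly allows.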
The remaining differences from Corentin's P2297R0 are
- basic literal character set
There are now four uses of the term in my paper, so it seems to be a useful
descriptive tool. (Suggestions to unify "basic character set" and
"basic literal character set" would imply semantic changes to the status
quo or would use more words, I believe.)
- translation character set
We agree there is no difference on the (intended) semantics either way;
I believe this is simply a question of presentation in the standard.
My definition (aligned with ISO 10646 terminology) currently reads:
The translation character set consists of the following elements:
- each character named by ISO/IEC 10646, as identified by its unique UCS scalar value, and
- a distinct character for each UCS scalar value where no named character is assigned.
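To make the size of that set concrete, a small C++ sketch follows. It is only an illustration under the assumption that "UCS scalar value" means any code point in the range 0 to 0x10FFFF other than the surrogate code points 0xD800 to 0xDFFF; the names is_ucs_scalar_value and ucs_scalar_value_count are hypothetical and do not appear in the paper.

```cpp
// Hypothetical helper: true if `v` is a UCS scalar value, i.e. a code point
// in [0, 0x10FFFF] that is not a surrogate code point [0xD800, 0xDFFF].
constexpr bool is_ucs_scalar_value(char32_t v) {
    return v <= 0x10FFFF && !(v >= 0xD800 && v <= 0xDFFF);
}

// Under the definition above, the translation character set has exactly one
// element per UCS scalar value, whether or not a named character is assigned.
constexpr unsigned long ucs_scalar_value_count =
    (0x10FFFFul + 1ul) - (0xDFFFul - 0xD800ul + 1ul);

static_assert(ucs_scalar_value_count == 1112064ul,
              "all code points minus the 2048 surrogate code points");
```

Both bullets of the definition draw from the same pool of scalar values, so the two presentations under discussion describe a set of the same size.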
Jens
Received on 2021-02-25 02:27:31