Hi Steve,
Thank you for these updates.
I had been hoping that something would be added to the paper regarding consequences for
h-char-sequence and q-char-sequence in #include directives.
Best wishes,
Peter
From: SG16 <sg16-bounces@lists.isocpp.org>
On Behalf Of Steve Downey via SG16
Sent: 27 April 2022 05:39
To: SG16 <sg16@lists.isocpp.org>
Cc: Steve Downey <sdowney@gmail.com>
Subject: [SG16] Updated D2558 : "Add @, $, and ` to the basic character set"
Uploaded : https://isocpp.org/files/papers/D2558R1.html
New section with implications and consequences,
Please ignore the {add} green below, I've given up fighting between markdown, html, the paper system and gmail for the evening.
Because this proposal is not making these characters available for syntactic purposes, the changes are limited to how these
characters are encoded today, or are represented in source.
Adding these characters to the basic character set means each of them will have to be encoded in a single byte, with a positive
value when stored in a char. This is already true for all POSIX encoded character sets, as @, $, and ` are part of the
portable character set. This also implies they are available in all POSIX locales, and in particular the “POSIX” locale, which is equivalent to the “C” locale. See [POSIX], section 6,
Character Set.
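As a rough illustration of the single-byte, positive-value requirement, the following C++ sketch checks it at compile time. The names `proposed_chars` and `proposed_chars_are_positive` are made up for this example and do not come from the paper; the check assumes a C++14 or later compiler.

```cpp
// Hypothetical compile-time check, not part of D2558.
// The three characters the paper proposes adding to the basic character set.
constexpr char proposed_chars[] = {'@', '$', '`'};

// Returns true when every proposed character has a positive value as a char.
constexpr bool proposed_chars_are_positive() {
    for (char c : proposed_chars) {
        if (static_cast<int>(c) <= 0) {
            return false;
        }
    }
    return true;
}

// Fails to compile on an implementation whose ordinary literal encoding
// gives any of these characters a zero or negative char value.
static_assert(proposed_chars_are_positive(),
              "@, $, and ` must have positive char values");
```

On ASCII-based and common EBCDIC literal encodings this is expected to compile cleanly, which is the sense in which the requirement only documents existing practice.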
A locale that does not provide for these characters would be non-conforming. Interpreting the literal encoding in any encoded
character set, including the “C” LC_CTYPE character set if it does not match the literal encoding, is already at best unspecified. Substitution ciphers are apparently conforming, although misleading. There is a long history of interpreting the Yen sign, ¥,
as a path separator on Windows exactly because of these encoding aliasing issues.
There is a rule that characters in the basic character set may not be expressed as UCNs, unless inside a character or string
literal. For C there are issues for characters in comments. This is not the case for C++. In non-comment contexts, these characters are currently not allowed in portable source, so the spelling of the character is irrelevant.
For extensions that allow, for example, $ in identifiers, no one outside of compiler test suites is using a UCN to spell
that.
This should break no C++ source.
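To make the UCN point concrete, here is a small sketch showing that spelling these characters as UCNs inside string literals, which stays permitted, yields the same code units as spelling them directly. The array and function names are invented for this example, and it assumes the literal encoding can represent all three characters.

```cpp
// Hypothetical illustration, not taken from the paper.
// Each character spelled directly in a string literal...
constexpr char plain_spelling[] = "@$`";
// ...and the same characters spelled as universal-character-names.
constexpr char ucn_spelling[] = "\u0040\u0024\u0060";

// Compares the two arrays element by element, including the terminator.
constexpr bool spellings_match() {
    if (sizeof(plain_spelling) != sizeof(ucn_spelling)) {
        return false;
    }
    for (unsigned i = 0; i < sizeof(plain_spelling); ++i) {
        if (plain_spelling[i] != ucn_spelling[i]) {
            return false;
        }
    }
    return true;
}

static_assert(spellings_match(),
              "UCN and direct spellings should denote the same characters");
```

Outside of literals, a UCN such as \u0024 in an identifier would become ill-formed under the rule above once $ joins the basic character set, which is the only spelling the change could affect.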
C++ places no constraints on source encoding. The closest we have is the in-flight requirement that implementations that
accept files be required to accept UTF-8, and UTF-8 encodes these characters.