Re: Updated D2558 : "Add @, $, and ` to the basic character set"

From: Steve Downey <sdowney_at_[hidden]>
Date: Wed, 27 Apr 2022 15:23:23 -0400
 Header names <https://isocpp.org/files/papers/D2558R1.html#header-names>

The grammar productions for header names use the translation character
set. The appearance of \ in a header name is conditionally supported with
implementation-defined semantics, from which we can infer that universal
character names are conditionally supported there. If anyone was using UCNs
to represent the new characters in a header name, implementations could
continue to interpret them, despite the rule that UCNs are not a valid
representation of characters in the basic character set.

Footnote 14 from [lex.header]

Thus, a sequence of characters that resembles an escape sequence can result
in an error, be interpreted as the character corresponding to the escape
sequence, or have a completely different meaning, depending on the
implementation.
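
For contrast with header names, the UCN spelling stays well defined inside ordinary string and character literals. A minimal sketch, assuming a C++11 compiler; the helper name ucn_spelling_matches is made up for illustration and is not from the paper:

```cpp
#include <cstring>

// Inside a string literal a UCN may name any character, so the UCN
// spelling and the plain spelling denote the same character and are
// encoded the same way in the literal encoding.
static_assert("\u0024"[0] == '$', "UCN and plain spelling of $ agree");
static_assert("\u0040"[0] == '@', "UCN and plain spelling of @ agree");
static_assert("\u0060"[0] == '`', "UCN and plain spelling of ` agree");

// Hypothetical helper, for illustration only: compares the two spellings
// of all three characters at run time.
inline bool ucn_spelling_matches() {
    return std::strcmp("\u0024\u0040\u0060", "$@`") == 0;
}
```

None of this says anything about how an implementation treats the same spelling in an #include directive; that stays implementation-defined as described above.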

On Wed, Apr 27, 2022 at 3:55 AM Peter Brett <pbrett_at_[hidden]> wrote:

> Hi Steve,
> Thank you for these updates.
> I had been hoping that something would be added to the paper regarding
> consequences for *h-char-sequence* and *q-char-sequence* in #include
> directives.
> Best wishes,
> Peter
> *From:* SG16 <sg16-bounces_at_[hidden]> *On Behalf Of *Steve Downey
> via SG16
> *Sent:* 27 April 2022 05:39
> *To:* SG16 <sg16_at_[hidden]>
> *Cc:* Steve Downey <sdowney_at_[hidden]>
> *Subject:* [SG16] Updated D2558 : "Add @, $, and ` to the basic character
> set"
> Uploaded : https://isocpp.org/files/papers/D2558R1.html
> New section with implications and consequences.
> Please ignore the {add} green below, I've given up fighting between
> markdown, html, the paper system and gmail for the evening.
> 3 Implications and Consequences
> Because this proposal is not making these characters available for
> syntactic purposes, the changes are limited to how these characters are
> encoded today, or are represented in source.
> 3.1 Literal Encoding
> Adding these characters to the basic character set means these will have
> to be encoded in a single byte, with positive value when used as a char.
> This is true for all POSIX encoded character sets, as @, $, and ` are part
> of the portable character set. This also implies they are available in all
> POSIX locales, and in particular the “POSIX” locale, which is equivalent to
> the “C” locale. [POSIX
> <https://isocpp.org/files/papers/D2558R1.html#ref-POSIX>
> ] See 6. Character Set
> <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html>
> 3.2 Runtime Encoding
> A locale that does not provide for these characters would be
> non-conforming. Interpreting the literal encoding in any encoded character
> set, including the “C” LC_CTYPE character set if it does not match the
> literal encoding, is already at best unspecified. Substitution ciphers are
> apparently conforming, although misleading. There is a long history of
> interpreting the Yen sign, ¥, as a path separator on Windows exactly
> because of these encoding aliasing issues.
> 3.3 Source Encoding and Representation
> There is a rule that characters in the basic character set may not be
> expressed as UCNs, unless inside a character or string literal. For C there
> are issues for characters in comments. This is not the case for C++. In
> non-comment contexts, these characters are currently not allowed in
> portable source, so the spelling of the character is irrelevant.
> For extensions that allow, for example, $ in identifiers, no one outside
> of compiler test suites is using a UCN to spell that.
> This should break no C++ source.
> C++ places no constraints on source encoding. The closest we have is the
> in-flight requirement that implementations that accept files be required to
> accept UTF-8, and UTF-8 encodes these characters.
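
The literal-encoding requirement in 3.1 (single byte, positive value as a char) can be checked at compile time. A minimal sketch, assuming a C++11 compiler; the helper name is_positive_char is hypothetical and does not come from the paper:

```cpp
// Hypothetical helper, for illustration only: true when c, stored in a
// plain char, has a positive value.
constexpr bool is_positive_char(char c) { return c > 0; }

// An ordinary character literal has type char, so this holds by
// definition; it is here only to make the single-byte claim visible.
static_assert(sizeof('@') == 1, "ordinary character literal is one byte");

// These fail to compile on an implementation whose literal encoding gives
// any of the three characters a zero or negative value as a char.
static_assert(is_positive_char('@'), "@ must be positive as a char");
static_assert(is_positive_char('$'), "$ must be positive as a char");
static_assert(is_positive_char('`'), "` must be positive as a char");
```

On an ASCII-compatible literal encoding the three values are 0x40, 0x24, and 0x60, so all of the assertions pass.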

Received on 2022-04-27 19:23:36