Updated D2558 : "Add @, $, and ` to the basic character set"

From: Steve Downey <sdowney_at_[hidden]>
Date: Wed, 27 Apr 2022 00:38:48 -0400
Uploaded : https://isocpp.org/files/papers/D2558R1.html

New section with implications and consequences.
Please ignore the {add} green below; I've given up fighting between
markdown, html, the paper system and gmail for the evening.

3 Implications and Consequences

Because this proposal is not making these characters available for
syntactic purposes, the changes are limited to how these characters are
encoded today, or are represented in source.

3.1 Literal Encoding

Adding these characters to the basic character set means these will have to
be encoded in a single byte, with a positive value when used as a char. This
is true for all POSIX encoded character sets, as @, $, and ` are part of
the portable character set. This also implies they are available in all
POSIX locales, and in particular the “POSIX” locale, which is equivalent to
the “C” locale. [POSIX
<https://isocpp.org/files/papers/D2558R1.html#ref-POSIX>] See 6. Character
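
The single-byte, positive-value requirement can be checked at compile time.
The following is only a sketch, assuming an ASCII-compatible ordinary literal
encoding (as on common GCC, Clang, and MSVC targets); the helper name
is_positive_single_char is made up for illustration and is not from the paper.

```cpp
#include <cassert>

// Hypothetical helper: true when the character has a positive value as a char.
constexpr bool is_positive_single_char(char c) {
    return c > 0;
}

// Each string literal holds exactly one code unit plus the terminating null,
// so the character is encoded in a single byte in the literal encoding.
static_assert(sizeof("@") == 2, "@ should take one byte");
static_assert(sizeof("$") == 2, "$ should take one byte");
static_assert(sizeof("`") == 2, "` should take one byte");

// Positive value when used as a char, regardless of whether char is signed.
static_assert(is_positive_single_char('@'), "@ should be positive");
static_assert(is_positive_single_char('$'), "$ should be positive");
static_assert(is_positive_single_char('`'), "` should be positive");
```

On an implementation whose literal encoding did not meet the requirement,
one of these assertions would fail to compile.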

3.2 Runtime Encoding

A locale that does not provide for these characters would be
non-conforming. Interpreting the literal encoding in any encoded character
set, including the “C” LC_CTYPE character set if it does not match the
literal encoding, is already at best unspecified. Substitution ciphers are
apparently conforming, although misleading. There is a long history of
interpreting the Yen sign, ¥, as a path separator on Windows exactly
because of these encoding aliasing issues.
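
To illustrate the aliasing, here is a rough sketch of how the same byte reads
differently under two 7-bit character sets. The Charset enum and display_as
function are hypothetical names for this example only; the mapping shown is
the well-known difference between ASCII and the JIS X 0201 Roman set, where
0x5C is the Yen sign and 0x7E is the overline.

```cpp
#include <cassert>
#include <string>

// Hypothetical: two ways of interpreting the same 7-bit byte.
enum class Charset { Ascii, JisX0201Roman };

// Returns the UTF-8 spelling of what a 7-bit byte means in the given charset.
// Bytes above 0x7F are outside the scope of this sketch.
std::string display_as(unsigned char byte, Charset cs) {
    if (cs == Charset::JisX0201Roman) {
        if (byte == 0x5C) return "\xC2\xA5";      // U+00A5 YEN SIGN
        if (byte == 0x7E) return "\xE2\x80\xBE";  // U+203E OVERLINE
    }
    // Every other 7-bit position agrees with ASCII.
    return std::string(1, static_cast<char>(byte));
}
```

The byte 0x5C comes back as a backslash under Charset::Ascii and as the Yen
sign under Charset::JisX0201Roman, while 0x40 (@) is the same in both.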

3.3 Source Encoding and Representation

There is a rule that characters in the basic character set may not be
expressed as UCNs, unless inside a character or string literal. For C there
are issues for characters in comments. This is not the case for C++. In
non-comment contexts, these characters are currently not allowed in
portable source, so the spelling of the character is irrelevant.

For extensions that allow, for example, $ in identifiers, no one outside of
compiler test suites is using a UCN to spell it.

This should break no C++ source.
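
As a small sketch of the literal case, which stays valid under the rule above:
inside a string literal the UCN spelling and the direct spelling should
produce the same contents. This assumes a literal encoding that can represent
all three characters (true for ASCII-compatible encodings); the variable names
are illustrative only.

```cpp
#include <cassert>
#include <string_view>

// UCNs are permitted inside string literals, so both spellings are accepted.
constexpr std::string_view spelled_with_ucns = "\u0040\u0024\u0060";
constexpr std::string_view spelled_directly  = "@$`";

// Helper for comparing the two spellings at run time.
inline bool same_spelling() {
    return spelled_with_ucns == spelled_directly;
}
```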

C++ places no constraints on source encoding. The closest we have is the
in-flight requirement that implementations that accept files be required to
accept UTF-8, and UTF-8 encodes these characters.
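
For reference, UTF-8 encodes each of the three characters as one byte, the
same value as in ASCII. A minimal sketch using u8 literals follows; the helper
name is_single_utf8_byte is hypothetical, and the template is written so it
compiles whether u8 literals have type char (C++17) or char8_t (C++20).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true when the literal is exactly one code unit
// (plus the terminating null) and that unit has the expected byte value.
template <std::size_t N, typename Char>
constexpr bool is_single_utf8_byte(const Char (&lit)[N], unsigned char expected) {
    return N == 2 && static_cast<unsigned char>(lit[0]) == expected;
}

static_assert(is_single_utf8_byte(u8"@", 0x40), "@ is 0x40 in UTF-8");
static_assert(is_single_utf8_byte(u8"$", 0x24), "$ is 0x24 in UTF-8");
static_assert(is_single_utf8_byte(u8"`", 0x60), "` is 0x60 in UTF-8");
```

By contrast, a character such as ¥ (U+00A5) takes two bytes in UTF-8, so the
same check on it returns false.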

Received on 2022-04-27 04:39:02