ISOCPP sg16 List: Re: Updated D2558 : "Add @, $, and ` to the basic character set"

From: Steve Downey <sdowney_at_[hidden]>
Date: Wed, 27 Apr 2022 15:25:10 -0400

Updated to address various comments and issues.
https://isocpp.org/files/papers/D2558R1.html

On Wed, Apr 27, 2022 at 12:38 AM Steve Downey <sdowney_at_[hidden]> wrote:

> Uploaded : https://isocpp.org/files/papers/D2558R1.html
>
> New section with implications and consequences.
> Please ignore the {add} green below; I've given up fighting between
> markdown, html, the paper system and gmail for the evening.
>
> 3 Implications and Consequences
> <https://isocpp.org/files/papers/D2558R1.html#implications-and-consequences>
>
> Because this proposal is not making these characters available for
> syntactic purposes, the changes are limited to how these characters are
> encoded today or represented in source.
>
> 3.1 Literal Encoding
> <https://isocpp.org/files/papers/D2558R1.html#literal-encoding>
>
> Adding these characters to the basic character set means they will have
> to be encoded in a single byte, with a positive value when used as a char.
> This is true for all POSIX encoded character sets, as @, $, and ` are part
> of the portable character set. This also implies they are available in all
> POSIX locales, and in particular the “POSIX” locale, which is equivalent to
> the “C” locale. [POSIX
> <https://isocpp.org/files/papers/D2558R1.html#ref-POSIX>] See 6.
> Character Set
> <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html>
>
> 3.2 Runtime Encoding
> <https://isocpp.org/files/papers/D2558R1.html#runtime-encoding>
>
> A locale that does not provide for these characters would be
> non-conforming. Interpreting the literal encoding in any encoded character
> set, including the “C” LC_CTYPE character set if it does not match the
> literal encoding, is already at best unspecified. Substitution ciphers are
> apparently conforming, although misleading. There is a long history of
> interpreting the Yen sign, ¥, as a path separator on Windows exactly
> because of these encoding aliasing issues.
>
> 3.3 Source Encoding and Representation
> <https://isocpp.org/files/papers/D2558R1.html#source-encoding-and-representation>
>
> There is a rule that characters in the basic character set may not be
> expressed as UCNs, unless inside a character or string literal. For C there
> are issues for characters in comments. This is not the case for C++. In
> non-comment contexts, these characters are currently not allowed in
> portable source, so the spelling of the character is irrelevant.
>
> For extensions that allow, for example, $ in identifiers, no one outside
> of compiler test suites is using a UCN to spell it.
>
> This should break no C++ source.
>
> C++ places no constraints on source encoding. The closest we have is the
> in-flight requirement that implementations that accept files be required to
> accept UTF-8, and UTF-8 encodes these characters.
>
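
The literal-encoding and UCN points in sections 3.1 and 3.3 of the quoted
text can be checked at compile time. What follows is a minimal C++17 sketch,
not taken from the paper. The helper name positive_as_char and the variable
names direct and via_ucn are hypothetical, and the checks assume a literal
encoding (ASCII or UTF-8, for example) in which each of the three characters
is one positive code unit.

```cpp
#include <string_view>

// Hypothetical helper: true when c has a positive value as a char.
constexpr bool positive_as_char(char c) { return c > 0; }

// Section 3.1: each character has a positive value when used as a char.
static_assert(positive_as_char('@'));
static_assert(positive_as_char('$'));
static_assert(positive_as_char('`'));

// Three characters plus the terminating null: one code unit each.
static_assert(sizeof("@$`") == 4);

// Section 3.3: inside a string literal, the UCN spellings denote the
// same characters as the direct spellings.
constexpr std::string_view direct = "@$`";
constexpr std::string_view via_ucn = "\u0040\u0024\u0060";
static_assert(direct == via_ucn);
```

If an implementation's literal encoding did not satisfy these properties,
the static_asserts would fail at compile time rather than at run time.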

Received on 2022-04-27 19:25:23