Subject: Re: P2194R0 The character set of C++ source code is Unicode
From: Alisdair Meredith (alisdairm_at_[hidden])
Date: 2020-08-24 11:54:15
Looking into the wording more deeply, you are going to have to address
the notion of characters in identifiers if you are no longer mapping into
UCNs, since we can no longer rely on non-digit handling that mapping for us.
This is probably best addressed in terms of
rather than trying for a different fix that will roll into that later.
Note that I was looking into a much more limited requirement: merely
requiring support for UTF-8 encoded source files as a well-specified
mapping, rather than switching to Unicode as the replacement for the
basic source character set. Once P1945 is adopted, I will have an
easier time buying into this paper, but in isolation it raises too many
tricky questions for me about Unicode text in surprising places. I
suspect we will need a deeper review of the grammar around identifiers
and other tokens to flush all the issues out - which is largely what P1945
> On Aug 24, 2020, at 12:32, Peter Brett <pbrett_at_[hidden]> wrote:
> Hi Alisdair,
> Thank you for the feedback; that's a very good suggestion. It ties into the suggested change to processing of UCNs that we've discussed a few times.
> When you have a u8"" literal, the associated literal encoding is UTF-8. When you have a 'plain' "" string literal, the associated literal encoding is implementation-defined.
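A minimal sketch of the distinction described above, not taken from the paper. It assumes a C++17 or later compiler whose ordinary literal encoding can represent U+00E9; the function names are hypothetical. A universal-character-name is used so the result does not depend on how the source file itself is encoded.

```cpp
#include <cassert>
#include <cstddef>

// Code units in the u8 literal for U+00E9, excluding the terminator.
// UTF-8 encodes U+00E9 as the two bytes 0xC3 0xA9 on every implementation.
constexpr std::size_t utf8_units() {
    return sizeof(u8"\u00E9") - 1;
}

// The same character in an ordinary ("plain") literal. Its encoding, and
// therefore its length, is implementation-defined: for example one code unit
// under Windows-1252 and two under UTF-8.
constexpr std::size_t ordinary_units() {
    return sizeof("\u00E9") - 1;
}

// Hypothetical helper: only the u8 length is portable, so only that one
// is checked for an exact value.
inline void check_literal_sizes() {
    static_assert(utf8_units() == 2, "u8 literals are always UTF-8");
    assert(ordinary_units() >= 1);
}
```

Calling `check_literal_sizes()` from `main` exercises both functions; printing `ordinary_units()` shows which ordinary literal encoding a given compiler invocation selected.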
> Best regards,
>> -----Original Message-----
>> From: Alisdair Meredith <alisdairm_at_[hidden] <mailto:alisdairm_at_[hidden]>>
>> Sent: 24 August 2020 17:29
>> To: SG16 <sg16_at_[hidden] <mailto:sg16_at_[hidden]>>
>> Cc: Peter Brett <pbrett_at_[hidden] <mailto:pbrett_at_[hidden]>>; Corentin <corentin.jabot_at_[hidden] <mailto:corentin.jabot_at_[hidden]>>
>> Subject: Re: [SG16] P2194R0 The character set of C++ source code is Unicode
>> Minor suggestion on the wording,
>> You strike the mapping of non-basic source code characters to
>> universal-character-name, including the cross-reference to such
>> mappings reverting in raw string literals (5.4). I suggest making
>> a matching edit to strike the reference in (5.4)p3 as well, so that
>> the only thing reverted is line splicing in phase 2.
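To illustrate what "reverted" means in a raw string literal, here is a small sketch under current (C++17/C++20) rules; it is an illustration, not wording from the paper, and the variable and function names are hypothetical. Inside a raw string literal, a backslash followed by a newline is kept rather than spliced, and `\u00E9` stays as six literal characters rather than naming U+00E9.

```cpp
#include <cassert>
#include <cstddef>

// Ordinary literal: phase 2 splices the backslash-newline, leaving "ab".
constexpr std::size_t spliced_len = sizeof("a\
b") - 1;

// Raw literal: splicing is reverted, so the backslash and the newline
// are both part of the string: 'a', '\\', '\n', 'b'.
constexpr std::size_t raw_len = sizeof(R"(a\
b)") - 1;

// A universal-character-name is interpreted in an ordinary u8 literal
// (two UTF-8 code units) but kept verbatim in a raw one (six characters).
constexpr std::size_t ucn_len     = sizeof(u8"\u00E9") - 1;
constexpr std::size_t raw_ucn_len = sizeof(u8R"(\u00E9)") - 1;

static_assert(spliced_len == 2, "line splice removed");
static_assert(raw_len == 4, "line splice preserved in raw literal");
static_assert(ucn_len == 2, "UCN interpreted");
static_assert(raw_ucn_len == 6, "UCN spelling preserved in raw literal");

// Hypothetical helper that repeats the checks at run time.
inline bool raw_literals_revert_phases() {
    return spliced_len == 2 && raw_len == 4 && ucn_len == 2 && raw_ucn_len == 6;
}
```

The continuation line `b` must start in the first column; any leading indentation there would become part of both strings and change the lengths.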
>> That said, with these changes, I am curious what the difference
>> is between a u8 string literal and a plain 'char' string literal, as
>> the contents of that literal are now going to be Unicode source
>> text (rather than requesting a mapping from source to Unicode
>> of the literal's contents)?
>>> On Aug 24, 2020, at 08:31, Peter Brett via SG16 <sg16_at_[hidden]> wrote:
>>> Hi all,
>>> In this week's meeting, we are going to discuss the remaining
>>> proposals from P2178R1 "Misc lexing and string handling improvements".
>>> In particular, we will discuss proposal 9:
>>> Proposal 9: Reaffirming Unicode as the character set of the
>>> internal representation
>>> In anticipation of a lively discussion, Corentin and I have written a
>>> short new paper which will be appearing in the September mailing.
>>> P2194R0 The character set of C++ source code is Unicode
>>> https://isocpp.org/files/papers/P2194R0.pdf
>>> We hope that the study group finds this contribution helpful and
>>> Best regards,