sg16: Re: [SG16] P2194R0 The character set of C++ source code is Unicode

From: Steve Downey <sdowney_at_[hidden]>
Date: Mon, 24 Aug 2020 13:02:26 -0400

While I agree that UCNs are logically a Unicode Transformation Format
(UTF), that is, an encoding of Unicode code points, I don't believe we
can remove the translation to UCNs without fixing the rest of the
grammar, where many productions depend on matching UCNs.
The lexer assumes that there are only basic source characters, and
that anything outside that set is in UCN form.
This should be fixable, since we are not changing any valid C++, but
it will be delicate.
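
To make the point concrete, here is a rough sketch of the kind of
translation the lexer currently relies on: every character outside the
basic set is rewritten as a UCN before the grammar sees it. The names
is_basic, to_ucn and to_ucn_form are hypothetical, and treating all of
ASCII as "basic" is a simplifying assumption (the real basic source
character set is smaller). It is an illustration, not how any
particular compiler does it.

```cpp
#include <cstdio>
#include <string>

// Assumption for this sketch only: treat every ASCII code point as
// "basic". The real basic source character set is a subset of ASCII.
bool is_basic(char32_t c) { return c < 0x80; }

// Spell one code point as a UCN: \uXXXX when it fits in four hex
// digits, \UXXXXXXXX otherwise.
std::string to_ucn(char32_t c) {
    char buf[11];  // backslash + 'U' + 8 hex digits + terminating NUL
    if (c <= 0xFFFF)
        std::snprintf(buf, sizeof buf, "\\u%04lX",
                      static_cast<unsigned long>(c));
    else
        std::snprintf(buf, sizeof buf, "\\U%08lX",
                      static_cast<unsigned long>(c));
    return buf;
}

// Rewrite decoded source text so that only basic characters and UCNs
// remain, which is the form the grammar productions match against.
std::string to_ucn_form(const std::u32string& src) {
    std::string out;
    for (char32_t c : src) {
        if (is_basic(c))
            out += static_cast<char>(c);
        else
            out += to_ucn(c);
    }
    return out;
}
```

Removing this step means every production that matches a UCN today
would instead have to accept the code point itself.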

On Mon, Aug 24, 2020 at 8:31 AM Peter Brett via SG16
<sg16_at_[hidden]> wrote:
>
> Hi all,
>
> In this week's meeting, we are going to discuss the remaining
> proposals from P2178R1 "Misc lexing and string handling improvements".
> In particular, we will discuss proposal 9:
>
> Proposal 9: Reaffirming Unicode as the character set of the
> internal representation
>
> In anticipation of a lively discussion, Corentin and I have written a
> short new paper which will be appearing in the September mailing.
>
> P2194R0 The character set of C++ source code is Unicode
> https://isocpp.org/files/papers/P2194R0.pdf
>
> We hope that the study group finds this contribution helpful and
> informative.
>
> Best regards,
>
> Peter

Received on 2020-08-24 12:06:08