
sg16


Re: [SG16] Multiple combining characters and P1949R3: C++ Identifier Syntax using Unicode Standard Annex 31

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Tue, 5 May 2020 15:55:32 -0400
On Tue, May 5, 2020 at 3:17 PM Tom Honermann via SG16 <sg16_at_[hidden]> wrote:

> Agreed for that example. But for the other example I provided, the
> resulting identifier (if lexed such that \u0300\u0327 produces a single
> preprocessor token) is in NFC since there is no precomposed character for a
> capital letter A with grave and cedilla.
>
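According to the implementation provided by
https://minaret.info/test/normalize.msp (and also GCC), the NFC form is
\u00c0\u0327: the A and the grave accent compose to U+00C0 LATIN CAPITAL
LETTER A WITH GRAVE, leaving U+0327 COMBINING CEDILLA as a separate
combining mark.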


> Do we believe that that example should be well-formed?
>
The rationale for not allowing stray combining characters is that they may
combine, as far as text applications are concerned, with characters from
the basic source character set in a way that disagrees with how C++
tokenization works. This is why the wording makes the program ill-formed
as soon as the \u0300 is encountered and formed into a preprocessor
token.

Received on 2020-05-05 14:59:39