Re: [SG16] Multiple combining characters and P1949R3: C++ Identifier Syntax using Unicode Standard Annex 31

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Wed, 6 May 2020 12:59:58 -0400
On Wed, May 6, 2020 at 6:01 AM Corentin Jabot via SG16 <
sg16_at_[hidden]> wrote:

> We know that while arbitrary concatenation of NFC sequences may not be
> NFC, concatenating NFC identifiers is always NFC because XID_Start
> characters are never combining.
The entire Hangul Jamo block is XID_Start:
1100..1248 ; XID_Start # Lo [329] HANGUL CHOSEONG KIYEOK..ETHIOPIC

The table in the section (https://unicode.org/reports/tr15/#Concatenation)
that you referenced has an example requiring only characters from that
block:
NFC: {U+1100}
NFC: {U+1161}{U+11A8}
Not NFC: {U+1100}{U+1161}{U+11A8}
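
For anyone who wants to reproduce the three rows above, here is a minimal
sketch using Python's standard unicodedata module. The helper name is_nfc
and the variable names are mine (purely illustrative, not from UAX #15 or
P1949), and str.isidentifier() is used only as a convenient stand-in for an
XID_Start/XID_Continue check; it is not the C++ grammar.

```python
import unicodedata


def is_nfc(s: str) -> bool:
    """Return True if s is unchanged by NFC normalization (hypothetical helper)."""
    return unicodedata.normalize("NFC", s) == s


# The two pieces from the UAX #15 concatenation table.
lead = "\u1100"        # HANGUL CHOSEONG KIYEOK (leading consonant)
tail = "\u1161\u11A8"  # HANGUL JUNGSEONG A + HANGUL JONGSEONG KIYEOK
joined = lead + tail

for label, s in (("lead", lead), ("tail", tail), ("joined", joined)):
    code_points = " ".join(f"U+{ord(c):04X}" for c in s)
    print(f"{label}: {code_points}: {'NFC' if is_nfc(s) else 'not NFC'}")

# Each piece on its own passes Python's XID-based identifier check.
print(lead.isidentifier(), tail.isidentifier())

# Normalizing the concatenation composes the three jamo into one syllable.
composed = unicodedata.normalize("NFC", joined)
print(" ".join(f"U+{ord(c):04X}" for c in composed))
```

Each piece is in NFC by itself, but the concatenation is not, since the
leading consonant composes with the vowel and trailing consonant that
follow it.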

> [ ... ] Notably there should not be isolated universal-character-name that
> are not well-formed identifiers.
These are technically required for the description of tokenization to be
clear. That's all we use these for. Their presence immediately makes the
program ill-formed (CWG folks would insist that I indicate "unless if UB
exists anyway").

Received on 2020-05-06 12:03:18