Date: Wed, 6 May 2020 12:59:58 -0400
On Wed, May 6, 2020 at 6:01 AM Corentin Jabot via SG16 <
sg16_at_[hidden]> wrote:
> We know that while arbitrary concatenation of NFC sequences may not be
> NFC, concatenating NFC identifiers is always NFC because XID_Start
> characters are never combining.
>
The entire Hangul Jamo block is XID_Start:
1100..1248 ; XID_Start # Lo [329] HANGUL CHOSEONG KIYEOK..ETHIOPIC SYLLABLE QWA
The table in the section (https://unicode.org/reports/tr15/#Concatenation)
that you referenced has an example requiring only characters from that
block:
NFC: {U+1100}
NFC: {U+1161}{U+11A8}
Not NFC: {U+1100}{U+1161}{U+11A8}
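The three rows above can be checked mechanically. What follows is only a minimal sketch, assuming Python's standard unicodedata module is an acceptable stand-in for the UAX #15 algorithm; the helper name is_nfc and the variable names are mine, not taken from any paper or from the standard. The isidentifier() calls use Python's own identifier rule, which is based on XID_Start and XID_Continue, so they are at best an approximation of what C++ would accept.

```python
import unicodedata


def is_nfc(s: str) -> bool:
    """Return True if s is unchanged by NFC normalization (hypothetical helper)."""
    return unicodedata.normalize("NFC", s) == s


# The two pieces from the UAX #15 concatenation table.
first = "\u1100"          # HANGUL CHOSEONG KIYEOK (leading consonant)
second = "\u1161\u11A8"   # HANGUL JUNGSEONG A + HANGUL JONGSEONG KIYEOK
joined = first + second

# Each piece is checked on its own, then the concatenation.
print("first is NFC: ", is_nfc(first))
print("second is NFC:", is_nfc(second))
print("joined is NFC:", is_nfc(joined))

# What the concatenation normalizes to: a single precomposed syllable.
composed = unicodedata.normalize("NFC", joined)
print("NFC(joined) =", " ".join(f"U+{ord(c):04X}" for c in composed))

# Python's identifier check, as a rough proxy for XID_Start/XID_Continue.
print("identifier-like:", first.isidentifier(), second.isidentifier())
```

The point of the sketch is that both pieces pass the NFC check individually while their concatenation does not, because the leading consonant composes with the vowel and trailing consonant that follow it.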
> [ ... ] Notably there should not be isolated universal-character-name that
> are not well-formed identifiers.
>
These are technically required for the description of tokenization to be
clear. That's all we use these for. Their presence immediately makes the
program ill-formed (CWG folks would insist that I indicate "unless if UB
exists anyway").
Received on 2020-05-06 12:03:18