On 5/27/20 3:12 PM, Tom Honermann via SG16 wrote:
On 5/27/20 1:55 PM, Steve Downey via SG16 wrote:
Most of the work in detecting text that is not in NFC is looking up code points in a fairly small table. For identifiers, some additional optimization looks possible, since many characters are already excluded.
Adding:
Detection of un-normalized text is fairly straight-forward, and GCC 10 already produces a warning. Unicode Standard Annex #15, Unicode Normalization Forms, provides a quick check algorithm to test whether a string is in one of the normalization forms, driven by tables in the Unicode Character Database. See [Detecting_Normalization_Forms](https://unicode.org/reports/tr15/#Detecting_Normalization_Forms) in [@UAX15]. The tables are available at [DerivedNormalizationProps.txt](http://www.unicode.org/Public/UCD/latest/ucd/DerivedNormalizationProps.txt). The check algorithm will sometimes need to normalize short ranges of text, in the cases where a single code point cannot be classified as YES or NO on its own.
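
To make the quick check above concrete, here is a minimal Python sketch of it. Treat it as an illustration under stated assumptions, not as what GCC or the paper does. The names `NFC_QC_SUBSET`, `nfc_quick_check`, and `is_nfc` are made up for this example, and the table is a tiny hand-picked subset of NFC_Quick_Check values. A real implementation would generate the full table from DerivedNormalizationProps.txt. Canonical combining classes come from Python's standard `unicodedata` module.

```python
import unicodedata

YES, NO, MAYBE = "YES", "NO", "MAYBE"

# ASSUMPTION: a tiny illustrative subset of NFC_Quick_Check values.
# Code points not listed here are treated as YES, which is only correct
# for a complete table generated from DerivedNormalizationProps.txt.
NFC_QC_SUBSET = {
    0x0300: MAYBE,  # COMBINING GRAVE ACCENT
    0x0301: MAYBE,  # COMBINING ACUTE ACCENT
    0x0323: MAYBE,  # COMBINING DOT BELOW
    0x0340: NO,     # COMBINING GRAVE TONE MARK
    0x0341: NO,     # COMBINING ACUTE TONE MARK
    0x2126: NO,     # OHM SIGN
    0x212A: NO,     # KELVIN SIGN
    0x212B: NO,     # ANGSTROM SIGN
}


def nfc_quick_check(text):
    """Return YES, NO, or MAYBE, following the UAX #15 quick check loop."""
    last_ccc = 0
    result = YES
    for ch in text:
        ccc = unicodedata.combining(ch)  # canonical combining class
        # Combining marks out of canonical order can never be normalized.
        if ccc != 0 and last_ccc > ccc:
            return NO
        qc = NFC_QC_SUBSET.get(ord(ch), YES)
        if qc == NO:
            return NO
        if qc == MAYBE:
            result = MAYBE
        last_ccc = ccc
    return result


def is_nfc(text):
    """Resolve MAYBE by normalizing; only YES and NO are decided by lookup."""
    result = nfc_quick_check(text)
    if result == MAYBE:
        # Simplification: normalizes the whole string instead of only the
        # short range around the MAYBE code point.
        return unicodedata.normalize("NFC", text) == text
    return result == YES
```

With this subset, `nfc_quick_check("e\u0301")` reports MAYBE, and `is_nfc` then has to normalize to find that the sequence composes to U+00E9. A string such as `"a\u0301\u0323"` is rejected by the combining-class order check alone.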

The added "Detecting_Normalization_Forms" link doesn't work for me; a local href is generated.

Actually, I think that only happens with the GitHub-rendered preview. The last rendered version sent to the mailing list links properly. Ignore.

Tom.



On Wed, May 27, 2020 at 1:33 PM Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:
On 5/27/20 1:03 PM, Zach Laine via SG16 wrote:
> On Wed, May 27, 2020 at 12:01 PM Jens Maurer via SG16
> <sg16@lists.isocpp.org> wrote:
>> On 26/05/2020 22.51, Steve Downey via SG16 wrote:
>>> Find attached a draft of the UAX31 paper for discussion.
>>> Viewable at http://htmlpreview.github.io/?https://github.com/steve-downey/papers/blob/master/generated/p1949.html
>>> Source at https://github.com/steve-downey/papers/blob/master/p1949.md
>> I had asked earlier for some prose-text statement on the difficulty
>> of checking NFC.
>>
>> I can only find
>>
>> "Detection of un-normalized text is fairly straight-forward, and GCC 10 already produces a warning. Normalizing to NFC is not much more difficult."
>>
>> which is lacking a bit of depth.
>>
>> What exactly do I have to do to check for NFC?  Check some bits in the code points?
>> Consult some Unicode tables?  Something else?
> You have to look up each adjacent pair of code points in a table, and
> verify that they form a valid NFC sequence.

I think Jens' point is that the paper doesn't state that (and that it
should).

Tom.

>
> Zach


--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16