Re: [SG16] [isocpp-ext] Virtual evolution meeting on Unicode identifiers, Thursday June 18th @ 10AM Pacific

From: Steve Downey <sdowney_at_[hidden]>
Date: Thu, 18 Jun 2020 11:35:18 -0400
C's machinery is very different. Annex D applies to
universal-character-names (UCNs), and C does not convert characters outside
the basic source character set to UCNs. From the C working draft: "An
implementation may allow multibyte characters that are not part of the
basic source character set to appear in identifiers; which characters and
their correspondence to universal character names is
implementation-defined." And from Annex D: "This clause lists the hexadecimal
code values that are valid in universal character names in identifiers." It
appears that the list in Annex D is the same as the one currently in C++, and
is a superset of the UAX #31 identifiers.
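
To make the "list of hexadecimal code values" concrete, here is a minimal sketch of a range check of that kind. The table is only a small sample of ranges recalled from C11 Annex D.1, not the full list, and should be verified against the standard; the names `CodePointRange`, `kAnnexDSample`, and `in_sample_ranges` are hypothetical.

```cpp
// A code point range, inclusive on both ends.
struct CodePointRange {
    char32_t first;
    char32_t last;
};

// ASSUMPTION: a partial sample of the allowed ranges, quoted from memory of
// C11 Annex D.1. It is not the complete list.
constexpr CodePointRange kAnnexDSample[] = {
    {0x00C0, 0x00D6},    // Latin-1 letters before the multiplication sign
    {0x00D8, 0x00F6},    // Latin-1 letters after the multiplication sign
    {0x2460, 0x24FF},    // enclosed alphanumerics, e.g. circled letters
    {0x10000, 0x1FFFD},  // all of plane 1, which includes most emoji
};

// Returns true if cp falls inside one of the sample ranges above.
inline bool in_sample_ranges(char32_t cp) {
    for (const CodePointRange& r : kAnnexDSample) {
        if (cp >= r.first && cp <= r.last) {
            return true;
        }
    }
    return false;
}
```

With this sample, U+24BA (Ⓔ) and U+1F600 (an emoji) fall inside an allowed range, while U+00D7 (the multiplication sign) does not. A range list like this accepts whole blocks of code points regardless of whether they are letters.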

On Thu, Jun 18, 2020 at 9:36 AM Alisdair Meredith via Ext <
ext_at_[hidden]> wrote:

> Late question looking over the paper.
>
> How does the restriction of identifiers to follow the Unicode
> specification compare to the C standard Annex D?
>
> I ask, as I am wondering if we want some more wording
> for C++ Annex C.5 [diff.iso] on which identifiers may (or may
> not) be used in shared code.
>
> AlisdairM
>
> On Jun 8, 2020, at 19:14, JF Bastien via Ext <ext_at_[hidden]> wrote:
>
> Hello Ⓔⓥⓞⓛⓤⓣⓘⓞⓝ,
>
> Next week on Thursday the 18th at 10AM Pacific we'll be discussing Unicode
> identifiers. It was on our "tentatively ready" list as of Prague, but
> received some feedback and has been updated as detailed by Steve below. I'd
> like us to discuss the changes, and tentatively leave it on the tentatively
> ready list, so that the next time we can make decisions, we reaffirm that
> we're forwarding it to Core (as per our process <http://wg21.link/p1999>).
>
> Updated paper:
>
> https://isocpp.org/files/papers/P1949R4.html
>
>
> Here's the GitHub issue:
>
> https://github.com/cplusplus/papers/issues/688
>
>
> Meeting information:
>
> Zoom Meeting ID 735059607
> Zoom Meeting Password template
> Zoom Meeting Room
> https://iso.zoom.us/j/735059607?pwd=d2tzRkZrTGY1c241R2prOVIrVnNXdz09
> Zoom Meeting Automatic Phone-In US: +16699006833,,735059607# or
> +14086380968,,735059607#
> Zoom Meeting Phone Number US: +1 669 900 6833
> International numbers available https://iso.zoom.us/u/acPWjSNM0
>
>
> See you then!
>
> JF
>
> p.s. this is valid C++:
>
> int Ⓔⓥⓞⓛⓤⓣⓘⓞⓝ;
>
>
>
> On Fri, Jun 5, 2020 at 1:37 PM Steve Downey via SG16 <
> sg16_at_[hidden]> wrote:
>
>>
>> Last week SG16 (Text) approved forwarding this paper to EWG for
>> consideration. It addresses fixing the state of allowed identifiers in C++.
>>
>> https://isocpp.org/files/papers/P1949R4.html (also attached as
>> d1949.html)
>>
>> Summary <https://isocpp.org/files/papers/D1949R4.html#summary>
>>
>> The allowed Unicode code points in identifiers include many that are
>> unassigned or unnecessary, and others that are actually counter-productive.
>> By adopting the recommendations of UAX #31, Unicode Identifier and Pattern
>> Syntax, C++ will be easier to work with in international environments and
>> less prone to accidental problems.
>>
>> This proposal does not address some potential security concerns—so called
>> homoglyph attacks—where letters that appear the same may be treated as
>> distinct. Methods of defense against such attacks are complex and evolving,
>> and requiring mitigation strategies would impose substantial implementation
>> burden.
>>
>> This proposal also recommends adoption of Unicode normalization form C
>> (NFC) for identifiers to ensure that when compared, identifiers intended to
>> be the same will compare as equal. Legacy encodings are generally naturally
>> in NFC when converted to Unicode. Most tools will, by default, produce NFC
>> text.
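
As an illustration of why normalization matters for comparing identifiers, the sketch below spells "é" in two ways that render identically but differ byte for byte. It is not taken from the paper and does not perform normalization, which standard C++ has no facility for. The helper names `precomposed_e_acute`, `decomposed_e_acute`, and `same_spelling` are hypothetical.

```cpp
#include <string>

// "é" as the single precomposed code point U+00E9, encoded as UTF-8.
// This is the NFC form.
inline std::string precomposed_e_acute() {
    return "\xC3\xA9";
}

// "é" as 'e' followed by U+0301 COMBINING ACUTE ACCENT, encoded as UTF-8.
// This is the decomposed (NFD) form.
inline std::string decomposed_e_acute() {
    return "e\xCC\x81";
}

// Byte-wise comparison, which is all a tool sees if it does not normalize.
inline bool same_spelling(const std::string& a, const std::string& b) {
    return a == b;
}
```

Here `same_spelling(precomposed_e_acute(), decomposed_e_acute())` is false even though both strings display as "é". Requiring NFC means only the first spelling can appear in an identifier, so the mismatch cannot arise.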
>>
>> Some unusual scripts require the use of characters as joiners that are
>> not allowed by UAX #31; these will no longer be available in identifiers in
>> C++.
>>
>> As a side-effect of adopting the identifier characters from UAX #31,
>> using emoji in or as identifiers becomes ill-formed.
>>
>> See also
>> https://unicode.org/reports/tr31/ Unicode® Standard Annex #31 UNICODE
>> IDENTIFIER AND PATTERN SYNTAX
>>
>>
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>
> _______________________________________________
> Ext mailing list
> Ext_at_[hidden]
> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/ext
> Link to this post: http://lists.isocpp.org/ext/2020/06/14104.php
>
>
> _______________________________________________
> Ext mailing list
> Ext_at_[hidden]
> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/ext
> Link to this post: http://lists.isocpp.org/ext/2020/06/14240.php
>

Received on 2020-06-18 10:41:04