Date: Thu, 16 May 2019 11:13:23 -0400
Cameron forwarded the following to me after our meeting yesterday.
C#'s grammar for identifiers is defined at the following link.
Basically, identifiers match UAX#31 with a few additions (most of which
make sense for C++ as well).
- https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/language-specification/lexical-structure#identifiers
A fun blog post exploring C# Unicode identifiers and the possibility of
character classification changing over time resides at the following link.
- https://codeblog.jonskeet.uk/2014/12/01/when-is-an-identifier-not-an-identifier-attack-of-the-mongolian-vowel-separator/
Tom.
On 5/15/19 1:39 PM, Tom Honermann wrote:
> Thanks for bringing this to our attention. I agree there are
> opportunities for improvement here. I filed a new SG16 issue to track
> this.
>
> https://github.com/sg16-unicode/sg16/issues/48
>
> I encourage anyone interested in this to sign up to write a paper or
> provide additional background material in the issue (e.g., more
> history about the current list of ranges, an analysis of UAX#31 and
> its applicability to C++, etc...).
>
> Tom.
>
> On 5/10/19 12:43 PM, JF Bastien wrote:
>> Hi C++ પกٱƈѻɗﻉ ḟäṅṡ 👋!
>>
>> The current list of valid identifier characters is pretty silly (see
>> [lex.name] 5.10 Identifiers or cppreference
>> summary <https://en.cppreference.com/w/cpp/language/identifiers>). It
>> allows characters such as zero-width joiner and zero-width space
>> among a few silly things (see how bad this can get
>> <https://godbolt.org/z/sBJk1k>, h/t Richard Kogelnig).
>>
>> I asked where it came from, and IIUC John looked at Unicode and
>> cobbled the list of valid ranges manually. That ain't great.
>>
>> Is this group interested in fixing things?
>>
>> There's already an existing standard for this, maybe it's a thing we
>> can adopt as-is or use as a starting point:
>>
>> https://unicode.org/reports/tr31/
>>
>>
>> Further, the tooling group was just talking about module names. I
>> think we should allow any valid identifier name as module name, and
>> look at how this could map to file names for a tooling TR's purpose.
>>
>> Thanks,
>>
>> J̙̘̗̘̟͐̀̎F͚̜͈̖͉̗̘̊
>>
>> _______________________________________________
>> SG16 Unicode mailing list
>> Unicode_at_[hidden]
>> http://www.open-std.org/mailman/listinfo/unicode
Received on 2019-05-16 17:13:26