Re: [SG16-Unicode] [isocpp-core] Fwd: New Core Issue: [lex.name]/3.2 under-specifies "uppercase letter"

From: Gabriel Dos Reis <gdr_at_[hidden]>
Date: Mon, 28 Oct 2019 19:01:22 +0000
+1.
________________________________
From: Core <core-bounces_at_[hidden]> on behalf of Billy O'Neal (VC LIBS) via Core <core_at_[hidden]>
Sent: Monday, October 28, 2019 11:59:33 AM
To: core_at_[hidden] <core_at_[hidden]>; JF Bastien <cxx_at_[hidden]>
Cc: Billy O'Neal (VC LIBS) <bion_at_[hidden]>; C++ Core Language Working Group <core_at_[hidden]>; Corentin <corentin.jabot_at_[hidden]>; SG16 <unicode_at_[hidden]>; Mathias Stearn <redbeard0531+isocpp_at_[hidden]>; wmm_at_[hidden] <wmm_at_[hidden]>
Subject: Re: [isocpp-core] [SG16-Unicode] Fwd: New Core Issue: [lex.name]/3.2 under-specifies "uppercase letter"


It’s also different in different locales, the classic example being “i”, which becomes “İ” in some locales. See https://en.wikipedia.org/wiki/Dotted_and_dotless_I



I think limiting it to A-Z is reasonable (as this PR does).



Billy3



From: Corentin via Core<mailto:core_at_[hidden]>
Sent: Monday, October 28, 2019 10:56 AM
To: JF Bastien<mailto:cxx_at_[hidden]>
Cc: Corentin<mailto:corentin.jabot_at_[hidden]>; wmm_at_[hidden]<mailto:wmm_at_[hidden]>; SG16<mailto:unicode_at_[hidden]>; Mathias Stearn<mailto:redbeard0531+isocpp_at_[hidden]>; C++ Core Language Working Group<mailto:core_at_[hidden]>
Subject: Re: [isocpp-core] [SG16-Unicode] Fwd: New Core Issue: [lex.name]/3.2 under-specifies "uppercase letter"



I would like to point out that afaik, although a rare event, the uppercase property of codepoints is not guaranteed to be stable and can change in either direction from one Unicode version to the next.



On Mon, Oct 28, 2019, 18:32 JF Bastien <cxx_at_[hidden]<mailto:cxx_at_[hidden]>> wrote:

I’d like to have a stronger motivation than this. Do we ever intend to use non-ASCII as reserved names? If so, we should wait to resolve TR31 and not make any change, because doing what you propose closes a door. If not (i.e. we’ll only ever use A-Z to start reserved names), then your change is exactly what we’ll want.



On Mon, Oct 28, 2019 at 9:39 AM Mathias Stearn <redbeard0531+isocpp_at_[hidden]<mailto:redbeard0531%2Bisocpp_at_[hidden]>> wrote:

Is it just uppercase letters in the basic source character set, or anything considered an uppercase letter in the universal character set after phase 1 transcoding and universal-character-name resolution? Or is there some other definition of uppercase?



I have a slight preference for restricting to just A-Z so that it doesn't require humans or tools to consult the Unicode data tables to decide if an identifier is safe to use.



Proposed resolution:



Replace [lex.name]/3.2 with:



Each identifier that contains a double underscore __ or begins with an underscore followed by an uppercase <del>letter</del><ins>nondigit</ins> is reserved to the implementation for any use.





Alternatively we could either create a new grammar production for uppercase nondigits, or just say something like "one of the universal characters in the range 0041-005A (A-Z)"
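
For illustration, the A-Z reading of the rule could be checked mechanically along these lines. This is only a sketch, not wording or tooling from the thread: the names is_ascii_upper and is_reserved_identifier are hypothetical, and the range comparison assumes an execution character set in which 'A'..'Z' are contiguous (true for ASCII and UTF-8, not guaranteed by the standard in general).

```cpp
#include <string_view>

// Hypothetical helper: true only for the 26 letters A-Z.
// Assumes 'A'..'Z' are contiguous in the execution character set.
constexpr bool is_ascii_upper(char c) {
    return c >= 'A' && c <= 'Z';
}

// Hypothetical checker for the A-Z reading of the rule:
// an identifier is reserved if it contains "__" anywhere, or
// begins with '_' followed by one of A-Z.
constexpr bool is_reserved_identifier(std::string_view id) {
    if (id.find("__") != std::string_view::npos) {
        return true;
    }
    return id.size() >= 2 && id[0] == '_' && is_ascii_upper(id[1]);
}

// Compile-time usage examples.
static_assert(is_reserved_identifier("_Foo"));
static_assert(is_reserved_identifier("my__name"));
static_assert(!is_reserved_identifier("_foo"));
static_assert(!is_reserved_identifier("Foo_"));
```

Under this reading no Unicode data table and no locale is consulted, so the answer for a given identifier cannot change between Unicode versions or environments.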





_______________________________________________
SG16 Unicode mailing list
Unicode_at_[hidden]<mailto:Unicode_at_[hidden]>
http://www.open-std.org/mailman/listinfo/unicode




Received on 2019-10-28 20:01:29