On Mon, Oct 28, 2019 at 9:39 AM Mathias Stearn via Core <core@lists.isocpp.org> wrote:

Is it just uppercase letters in the basic source character set, or anything considered an uppercase letter in the universal character set after phase 1 transcoding and universal-character-name resolution? Or is there some other definition of uppercase?

My interpretation:

* We don't resolve universal-character-names; rather, we *form* them. (Eg, int façade; is converted into int fa\u00e7ade;) So for example _Ç becomes _\u00c7, which doesn't start with an underscore followed by an uppercase letter (it's an underscore followed by a slash).

* Unicode (to which we have a normative reference) defines uppercase, and we follow that, but we happen to only ever apply it to the basic source character set because of the above rewriting.

I have a slight preference for restricting to just A-Z so that it doesn't require humans or tools to consult the unicode data tables to decide if an identifier is safe to use.

Regardless of how we express the rule, I agree with this direction.

Proposed resolution:

Replace [lex.names]/3.2 with:

Each identifier that contains a double underscore __ or begins with an underscore followed by an uppercase <del>letter</del><ins>nondigit</ins> is reserved to the implementation for any use.

... and I think this is a fine wording improvement, whether or not we think it's formally necessary.

Alternatively we could either create a new grammar production for uppercase nondigits, or just say something like "one of the universal characters in the range 0041-005A (A-Z)"

_______________________________________________
Core mailing list
Core@lists.isocpp.org
Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
Link to this post: http://lists.isocpp.org/core/2019/10/7541.php