sg16: Re: [SG16] [isocpp-core] Renaming universal-character-name

From: Tom Honermann <tom_at_[hidden]>
Date: Sun, 1 Mar 2020 13:47:01 -0500

On 3/1/20 3:46 AM, Corentin Jabot wrote:
>
>
> On Sat, 29 Feb 2020 at 23:44, Gabriel Dos Reis <gdr_at_[hidden]
> <mailto:gdr_at_[hidden]>> wrote:
>
> Why?
>
>
> It is not used, and for good reason: the only possible use case is
> legacy code, and legacy code is even less likely to use code points
> outside the basic source character set. It is weirdly opinionated
> (foo\u0065 is ill-formed, foo\u0165 is well-formed), and we are talking
> about extending it?

We don't have means to measure its uses precisely, but regardless, I
don't agree that its only possible use case is in legacy code; it exists
as both a specification and compatibility mechanism for source file
encodings that support characters outside the basic source character set.
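
As a rough illustration of the compatibility use, here is a minimal
sketch of a converter that rewrites text so it fits in a source file
whose encoding can only represent ASCII. The function name
escape_non_ascii is hypothetical, and the sketch assumes the input is
already decoded to code points; it is not how any particular compiler
or tool does this.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: replace every code point above U+007F with the
// equivalent \uXXXX or \UXXXXXXXX universal-character-name spelling.
// ASCII code points are copied through unchanged.
std::string escape_non_ascii(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else {
            char buf[11];  // backslash + 'U' + 8 hex digits + NUL
            if (cp <= 0xFFFF) {
                std::snprintf(buf, sizeof buf, "\\u%04lX",
                              static_cast<unsigned long>(cp));
            } else {
                std::snprintf(buf, sizeof buf, "\\U%08lX",
                              static_cast<unsigned long>(cp));
            }
            out += buf;
        }
    }
    return out;
}
```

Given the declaration of Ā from the quoted examples, this produces the
\u0100 spelling, which can then be stored in a file whose encoding has
no representation for Ā.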

The weirdly opinionated prohibition on code points that name members of
the basic source character set exists so that implementations don't have
to distinguish between "class" and "cl\u0061ss" when parsing source
code. That prohibition seems quite reasonable to me.
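
The rule can be sketched as a predicate. This is only an illustration
of the wording as I read it in C++11 through C++20 [lex.charset]; the
function names are hypothetical, the sketch assumes an ASCII-compatible
execution character set, and it does not model the separate
restrictions on surrogate code points or on which characters may
appear in identifiers.

```cpp
// Hypothetical helper: true if `cp` is one of the 96 members of the
// basic source character set (C++11 through C++20 [lex.charset]).
// Assumes char values in the table equal their Unicode code points,
// which holds for ASCII-compatible execution character sets.
bool is_basic_source_character(char32_t cp) {
    static const char basic[] =
        " \t\v\f\n"
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789"
        "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";
    for (const char* p = basic; *p != '\0'; ++p) {
        if (static_cast<char32_t>(static_cast<unsigned char>(*p)) == cp) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: true if a universal-character-name with value
// `cp` is ill-formed outside character and string literals because it
// names a control character or a basic source character.
bool ucn_ill_formed_outside_literal(char32_t cp) {
    const bool control = cp <= 0x1F || (cp >= 0x7F && cp <= 0x9F);
    return control || is_basic_source_character(cp);
}
```

Under this sketch, 0x0061 (as in "cl\u0061ss") and 0x0065 (as in
foo\u0065) are rejected, while 0x0165 and 0x0100 are not.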

Tom.

>
> -- Gaby
> ------------------------------------------------------------------------
> *From:* Core <core-bounces_at_[hidden]
> <mailto:core-bounces_at_[hidden]>> on behalf of Corentin
> Jabot via Core <core_at_[hidden] <mailto:core_at_[hidden]>>
> *Sent:* Saturday, February 29, 2020 1:05:45 PM
> *To:* Tony V E <tvaneerd_at_[hidden] <mailto:tvaneerd_at_[hidden]>>
> *Cc:* Corentin Jabot <corentinjabot_at_[hidden]
> <mailto:corentinjabot_at_[hidden]>>; sg16_at_[hidden]
> <mailto:sg16_at_[hidden]> <sg16_at_[hidden]
> <mailto:sg16_at_[hidden]>>; Tom Honermann <tom_at_[hidden]
> <mailto:tom_at_[hidden]>>; C++ Core Language Working Group
> <core_at_[hidden] <mailto:core_at_[hidden]>>
> *Subject:* Re: [isocpp-core] [SG16] Renaming universal-character-name
>
>
> On Sat, Feb 29, 2020, 21:13 Tony V E <tvaneerd_at_[hidden]
> <mailto:tvaneerd_at_[hidden]>> wrote:
>
> I don't know of anyone using anything but ASCII.
>
> So there's no problem, right?
>
>
> My point is maybe we should deprecate escaped identifiers rather
> than add more.
>
>
> Sent from my BlackBerry portable Babbage Device
> *From: *Corentin Jabot
> *Sent: *Saturday, February 29, 2020 2:06 PM
> *To: *Tony V E
> *Cc: *Richard Smith; C++ Core Language Working Group; Tom
> Honermann; sg16_at_[hidden] <mailto:sg16_at_[hidden]>
> *Subject: *Re: [isocpp-core] [SG16] Renaming
> universal-character-name
>
>
>
>
> On Sat, Feb 29, 2020, 19:34 Tony V E <tvaneerd_at_[hidden]
> <mailto:tvaneerd_at_[hidden]>> wrote:
>
> > Independent of anything else, I think that P1097 should
> allow \N{...} in identifiers, for consistency with \u / \U
> -- I find it very hard to see a reason why the two should
> be permitted in a different set of contexts.
>
> +1
>
> There may only be weak reasoning to allow \N identifiers
> but I think it buys more than disallowing it.
>
>
> int \N{LATIN CAPITAL LETTER A WITH MACRON} = 1;
>
> vs:
>
> int \u0100 = 1;
>
> vs:
>
> int Ā = 1;
>
>
> At least the \N one gives me a hint what is going on.
>
> If Ā were the name of an external library function and I
> needed to call it, but couldn't type Ā, the \N form gives a
> hint. (I would probably hide it behind an inline function
> either way.)
>
> inline Bar AWithMacron(Foo foo)
> {
>     return \N{LATIN CAPITAL LETTER A WITH MACRON}(foo);
> }
>
>
> Do we know of people using \u in identifiers?
>
>
>
> Sent from my BlackBerry portable Babbage Device
> *From: *Tom Honermann via Core
> *Sent: *Thursday, February 27, 2020 12:01 PM
> *To: *Richard Smith; C++ Core Language Working Group
> *Reply To: *core_at_[hidden]
> <mailto:core_at_[hidden]>
> *Cc: *Tom Honermann; sg16_at_[hidden]
> <mailto:sg16_at_[hidden]>; Corentin Jabot
> *Subject: *Re: [isocpp-core] [SG16] Renaming
> universal-character-name
>
>
> SG16 happened to be meeting and discussing this topic
> concurrently with Richard's email. I'll have minutes
> posted to
> https://github.com/sg16-unicode/sg16-meetings#february-26th-2020
> in the next couple of days.
>
> In that meeting, we had general consensus (we didn't poll)
> for renaming /universal-character-name/ to
> /unicode-code-point/ while keeping /named-escape-sequence/
> as is with direction that I update P2071
> <https://wg21.link/p2071>
> (the successor to P1097
> <https://wg21.link/p1097>)
> to provide editorial direction for the rename.
>
> I don't have strong opinions on the rename. I was under
> the impression that /universal-character-name/ was
> introduced in C11/C++11, but I see I was mistaken as it is
> present in C++98. In retrospect, I don't know why I had
> that impression.
>
> The fact that this term has been around since C99 and
> C++98 does give me pause. I'll refrain from proposing the
> rename in P2071 pending further discussion.
>
> With regard to allowing \N{...} in identifiers, P2071R0
> does mention the possibility of such allowance as a future
> extension
> <https://wg21.link/p2071#future>,
> but without discussion. I'll update the paper to discuss
> this. I recall some discussions about allowing these
> escapes in identifiers, but I don't think those
> discussions were in minuted meetings and it hasn't been
> polled in SG16 or EWG(I). Personally, I don't see
> sufficient motivation for allowing one to type:
>
> int \N{LATIN CAPITAL LETTER A WITH MACRON} = 1;
>
> vs:
>
> int \u0100 = 1;
>
> precisely because there is little motivation to be able to
> type the latter one. In my mind, use of
> /universal-character-name/ escapes outside of literals
> exists as a mechanism to support source character
> encodings that support characters outside the basic source
> character set. Virtually all programmers are going to type
> the following instead:
>
> int Ā = 1;
>
> Motivation to be able to type the form containing the
> /universal-character-name/ exists so that identifiers that
> can't otherwise be represented in the source encoding of a
> particular source file can still be represented. I'm not
> sure that motivation extends to being able to type the
> /named-escape-sequence/ variant.
>
> Tom.
>
> On 2/26/20 4:25 PM, Richard Smith wrote:
>> Well, "universal character name" / UCN is established
>> terminology in C and C++ that has been around for more
>> than 20 years, and does not appear to be used for any
>> other purpose. If we rename it, a lot of reference
>> material (for example) will need to be updated. Given
>> that, it's unclear to me that renaming it will be a net
>> improvement, although removing any possible confusion
>> with the "na" property of the character would certainly
>> be a good thing. Also, a UCN is not a codepoint per se --
>> rather, it is a specific syntax for referring to (naming)
>> a codepoint -- and "character codepoint" seems a bit
>> redundant. If we're going to rename it, something like
>> /unicode-escape-sequence/ would seem more fitting.
>>
>> Independent of anything else, I think that P1097 should
>> allow \N{...} in identifiers, for consistency with \u /
>> \U -- I find it very hard to see a reason why the two
>> should be permitted in a different set of contexts. And
>> if we do that, then we can just add productions to the
>> existing /universal-character-name/ nonterminal, and not
>> need to rename anything. So I think renaming the grammar
>> production is at least premature. If we do it at all, it
>> should be done by P1097.
>>
>> On Wed, Feb 26, 2020 at 12:35 PM Tom Honermann via Core
>> <core_at_[hidden] <mailto:core_at_[hidden]>> wrote:
>>
>> On 2/26/20 5:15 AM, Corentin Jabot via SG16 wrote:
>>> +sg16
>>>
>>> On Wed, 26 Feb 2020 at 11:12, Corentin
>>> <corentin.jabot_at_[hidden]
>>> <mailto:corentin.jabot_at_[hidden]>> wrote:
>>>
>>> Hello,
>>> To use terminology more aligned with Unicode and
>>> to avoid confusion with character names - which
>>> are for example used by P1097R2 - Named
>>> character escapes, I would like to rename
>>> mechanically *universal-character-name* to
>>> *universal-character-codepoint*
>>>
>> I was tempted to do this as part of P2029 (Proposed
>> resolution for core issues 411, 1656, and 2333;
>> numeric and universal character escapes in character
>> and string literals), but decided it was too much to
>> bite off as part of that effort. I do think a rename
>> is in order.
>>
>> Tom.
>>
>>> Is that something Core would be willing to do?
>>> If so, what would be the best way to do it?
>>> Paper targeting core?
>>>
>>> Regards,
>>>
>>> Corentin
>>>
>>>
>>
>> _______________________________________________
>> Core mailing list
>> Core_at_[hidden] <mailto:Core_at_[hidden]>
>> Subscription:
>> https://lists.isocpp.org/mailman/listinfo.cgi/core
>> Link to this post:
>> http://lists.isocpp.org/core/2020/02/8561.php
>>
>
>
>

Received on 2020-03-01 12:49:48