Re: [SG16] [isocpp-core] Renaming universal-character-name

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Sat, 29 Feb 2020 22:05:45 +0100
On Sat, Feb 29, 2020, 21:13 Tony V E <tvaneerd_at_[hidden]> wrote:

> I don't know of anyone using anything but ASCII.
>
> So there's no problem, right?
>

My point is maybe we should deprecate escaped identifiers rather than add
more.

>
> Sent from my BlackBerry portable Babbage Device
> *From: *Corentin Jabot
> *Sent: *Saturday, February 29, 2020 2:06 PM
> *To: *Tony V E
> *Cc: *Richard Smith; C++ Core Language Working Group; Tom Honermann;
> sg16_at_[hidden]
> *Subject: *Re: [isocpp-core] [SG16] Renaming universal-character-name
>
>
>
> On Sat, Feb 29, 2020, 19:34 Tony V E <tvaneerd_at_[hidden]> wrote:
>
>> > Independent of anything else, I think that P1097 should allow \N{...}
>> in identifiers, for consistency with \u / \U -- I find it very hard to see
>> a reason why the two should be permitted in a different set of contexts.
>>
>> +1
>>
>> There may be only weak reasons to allow \N in identifiers, but I think
>> allowing it buys more than disallowing it.
>>
>>
>> int \N{LATIN CAPITAL LETTER A WITH MACRON} = 1;
>>
>> vs:
>>
>> int \u0100 = 1;
>>
>> vs:
>>
>> int Ā = 1;
>>
>>
>> At least the \N one gives me a hint what is going on.
>>
>> If Ā was part of an external function of a library and I needed to call
>> it, but couldn't type Ā, the \N form gives a hint. I would probably hide
>> it behind an inline function either way:
>>
>> inline Bar AWithMacron(Foo foo)
>> {
>>     return \N{LATIN CAPITAL LETTER A WITH MACRON}(foo);
>> }
>>
>
> Do we know of people using \u in identifiers?
>
>>
>>
>> Sent from my BlackBerry portable Babbage Device
>> *From: *Tom Honermann via Core
>> *Sent: *Thursday, February 27, 2020 12:01 PM
>> *To: *Richard Smith; C++ Core Language Working Group
>> *Reply To: *core_at_[hidden]
>> *Cc: *Tom Honermann; sg16_at_[hidden]; Corentin Jabot
>> *Subject: *Re: [isocpp-core] [SG16] Renaming universal-character-name
>>
>> SG16 happened to be meeting and discussing this topic concurrently with
>> Richard's email. I'll have minutes posted to
>> https://github.com/sg16-unicode/sg16-meetings#february-26th-2020 in the
>> next couple of days.
>>
>> In that meeting, we had general consensus (we didn't poll) for renaming
>> *universal-character-name* to *unicode-code-point* while keeping
>> *named-escape-sequence* as is with direction that I update P2071
>> <https://wg21.link/p2071> (the successor to P1097
>> <https://wg21.link/p1097>) to provide editorial direction for the rename.
>>
>> I don't have strong opinions on the rename. I was under the impression
>> that *universal-character-name* was introduced in C11/C++11, but I see I
>> was mistaken as it is present in C++98. In retrospect, I don't know why I
>> had that impression.
>>
>> The fact that this term has been around since C99 and C++98 does give me
>> pause. I'll refrain from proposing the rename in P2071 pending further
>> discussion.
>>
>> With regard to allowing \N{...} in identifiers, P2071R0 does mention the
>> possibility of such allowance as a future extension
>> <https://wg21.link/p2071#future>, but without discussion. I'll update
>> the paper to discuss this. I recall some discussions about allowing these
>> escapes in identifiers, but I don't think those discussions were in minuted
>> meetings and it hasn't been polled in SG16 or EWG(I). Personally, I don't
>> see sufficient motivation for allowing one to type:
>>
>> int \N{LATIN CAPITAL LETTER A WITH MACRON} = 1;
>>
>> vs:
>>
>> int \u0100 = 1;
>>
>> precisely because there is little motivation to be able to type the
>> latter one. In my mind, use of *universal-character-name* escapes
>> outside of literals exists as a mechanism to support source character
>> encodings that support characters outside the basic source character set.
>> Virtually all programmers are going to type the following instead:
>>
>> int Ā = 1;
>>
>> Motivation to be able to type the form containing the
>> *universal-character-name* exists so that identifiers that can't
>> otherwise be represented in the source encoding of a particular source file
>> can still be represented. I'm not sure that motivation extends to being
>> able to type the *named-escape-sequence* variant.
>>
>> Tom.
>>
>> On 2/26/20 4:25 PM, Richard Smith wrote:
>>
>> Well, "universal character name" / UCN is established terminology in C
>> and C++ that has been around for more than 20 years, and does not appear to
>> be used for any other purpose. If we rename it, a lot of reference material
>> (for example) will need to be updated. Given that, it's unclear to me that
>> renaming it will be a net improvement, although removing any possible
>> confusion with the "na" property of the character would certainly be a good
>> thing. Also, a UCN is not a codepoint per se -- rather, it is a specific
>> syntax for referring to (naming) a codepoint -- and "character codepoint"
>> seems a bit redundant. If we're going to rename it, something like
>> *unicode-escape-sequence* would seem more fitting.
>>
>> Independent of anything else, I think that P1097 should allow \N{...} in
>> identifiers, for consistency with \u / \U -- I find it very hard to see a
>> reason why the two should be permitted in a different set of contexts. And
>> if we do that, then we can just add productions to the existing
>> *universal-character-name* nonterminal, and not need to rename anything.
>> So I think renaming the grammar production is at least premature. If we do
>> it at all, it should be done by P1097.
>>
>> On Wed, Feb 26, 2020 at 12:35 PM Tom Honermann via Core <
>> core_at_[hidden]> wrote:
>>
>>> On 2/26/20 5:15 AM, Corentin Jabot via SG16 wrote:
>>>
>>> +sg16
>>>
>>> On Wed, 26 Feb 2020 at 11:12, Corentin <corentin.jabot_at_[hidden]> wrote:
>>>
>>>> Hello,
>>>> To use terminology more aligned with Unicode and to avoid confusion
>>>> with character names (which are used, for example, by P1097R2, "Named
>>>> character escapes"), I would like to mechanically rename
>>>> *universal-character-name* to *universal-character-codepoint*.
>>>>
>>> I was tempted to do this as part of P2029 (Proposed resolution for core
>>> issues 411, 1656, and 2333; numeric and universal character escapes in
>>> character and string literals), but decided it was too much to bite off as
>>> part of that effort. I do think a rename is in order.
>>>
>>> Tom.
>>>
>>>> Is that something core would be willing to do? If so, what would be
>>>> the best way to do it? Paper targeting core?
>>>>
>>>> Regards,
>>>>
>>>> Corentin
>>>>
>>>
>>>
>>> _______________________________________________
>>> Core mailing list
>>> Core_at_[hidden]
>>> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
>>> Link to this post: http://lists.isocpp.org/core/2020/02/8561.php
>>>
>>
>>
>>
>
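
For readers following the thread, here is a minimal sketch of the existing \u form discussed above, written so the source file stays pure ASCII. It assumes a compiler that accepts universal-character-names in identifiers (recent GCC and Clang do). The names AWithMacron, kAWithMacron and ucn_forms_agree are hypothetical, chosen only for illustration. The \N{...} spelling is deliberately left out, since allowing it in identifiers is exactly what is still under discussion.

```cpp
#include <cassert>

// U+0100 is LATIN CAPITAL LETTER A WITH MACRON. Spelling the identifier
// with a universal-character-name keeps this file representable in ASCII.
constexpr int \u0100 = 1;

// Hypothetical ASCII-named wrapper, in the spirit of the AWithMacron
// example quoted in the thread.
constexpr int AWithMacron() { return \u0100; }

// The same escape inside a character literal designates the same code point.
constexpr char32_t kAWithMacron = U'\u0100';

static_assert(AWithMacron() == 1, "identifier spelled with a UCN is usable");
static_assert(kAWithMacron == 0x0100, "UCN in a literal yields U+0100");

// Runtime counterpart of the checks above (illustrative only).
inline bool ucn_forms_agree()
{
    assert(AWithMacron() == 1);
    return kAWithMacron == 0x0100;
}
```

In a source file whose encoding can represent the character directly, the identifier could presumably be typed as Ā instead, which is the case Tom expects virtually all programmers to prefer.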

Received on 2020-02-29 15:08:41