Re: [SG16] [isocpp-core] Renaming universal-character-name

From: Tom Honermann <tom_at_[hidden]>
Date: Thu, 27 Feb 2020 12:01:32 -0500

SG16 happened to be meeting and discussing this topic concurrently with
Richard's email. I'll have minutes posted to
https://github.com/sg16-unicode/sg16-meetings#february-26th-2020 in the
next couple of days.

In that meeting, we had general consensus (we didn't poll) for renaming
/universal-character-name/ to /unicode-code-point/ while keeping
/named-escape-sequence/ as is with direction that I update P2071
<https://wg21.link/p2071> (the successor to P1097
<https://wg21.link/p1097>) to provide editorial direction for the rename.

I don't have strong opinions on the rename. I was under the impression
that /universal-character-name/ was introduced in C11/C++11, but I see I
was mistaken as it is present in C++98. In retrospect, I don't know why
I had that impression.

The fact that this term has been around since C99 and C++98 does give me
pause. I'll refrain from proposing the rename in P2071 pending further
discussion.

With regard to allowing \N{...} in identifiers, P2071R0 does mention the
possibility of such allowance as a future extension
<https://wg21.link/p2071#future>, but without discussion. I'll update
the paper to discuss this. I recall some discussions about allowing
these escapes in identifiers, but I don't think those discussions were
in minuted meetings and it hasn't been polled in SG16 or EWG(I).
Personally, I don't see sufficient motivation for allowing one to type:

    int \N{LATIN CAPITAL LETTER A WITH MACRON} = 1;

as an alternative to:

    int \u0100 = 1;

precisely because there is little motivation to be able to type the
latter one. In my mind, use of /universal-character-name/ escapes
outside of literals exists as a mechanism to support source character
encodings that support characters outside the basic source character
set. Virtually all programmers are going to type the following instead:

    int Ā = 1;

Motivation to be able to type the form containing the
/universal-character-name/ exists so that identifiers that can't
otherwise be represented in the source encoding of a particular source
file can still be represented. I'm not sure that motivation extends to
being able to type the /named-escape-sequence/ variant.
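
To make the point above concrete, here is a minimal sketch (not taken from P2071 or from the standard's wording) of a /universal-character-name/ used both as an identifier and inside a literal. The names read_identifier, macron_a, and self_check are hypothetical and exist only for illustration. The source is kept pure ASCII, which is the situation the escape is meant to support.

```cpp
#include <cassert>

// Identifier spelled with a universal-character-name for U+0100
// (LATIN CAPITAL LETTER A WITH MACRON). In a source file whose encoding
// can represent that character, the same identifier could instead be
// typed directly as the character itself.
int \u0100 = 1;

// Inside a literal, the same escape designates the code point U+0100.
constexpr char32_t macron_a = U'\u0100';
static_assert(macron_a == 0x100, "the UCN designates code point U+0100");

// Hypothetical helper: reads the variable through its UCN spelling.
int read_identifier() {
    return \u0100;
}

// Hypothetical helper: run-time check of both uses.
void self_check() {
    assert(read_identifier() == 1);
    assert(macron_a == 0x100);
}
```

The sketch assumes a compiler that accepts extended identifiers, which recent GCC and Clang releases do by default.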


On 2/26/20 4:25 PM, Richard Smith wrote:
> Well, "universal character name" / UCN is established terminology in C
> and C++ that has been around for more than 20 years, and does not
> appear to be used for any other purpose. If we rename it, a lot of
> reference material (for example) will need to be updated. Given that,
> it's unclear to me that renaming it will be a net improvement,
> although removing any possible confusion with the "na" property of the
> character would certainly be a good thing. Also, a UCN is not a
> codepoint per se -- rather, it is a specific syntax for referring to
> (naming) a codepoint -- and "character codepoint" seems a bit
> redundant. If we're going to rename it, something like
> /unicode-escape-sequence/ would seem more fitting.
> Independent of anything else, I think that P1097 should allow \N{...}
> in identifiers, for consistency with \u / \U -- I find it very hard to
> see a reason why the two should be permitted in a different set of
> contexts. And if we do that, then we can just add productions to the
> existing /universal-character-name/ nonterminal, and not need to
> rename anything. So I think renaming the grammar production is at
> least premature. If we do it at all, it should be done by P1097.
> On Wed, Feb 26, 2020 at 12:35 PM Tom Honermann via Core
> <core_at_[hidden] <mailto:core_at_[hidden]>> wrote:
> On 2/26/20 5:15 AM, Corentin Jabot via SG16 wrote:
>> +sg16
>> On Wed, 26 Feb 2020 at 11:12, Corentin <corentin.jabot_at_[hidden]
>> <mailto:corentin.jabot_at_[hidden]>> wrote:
>> Hello,
>> To use terminology more aligned with Unicode and to avoid
>> confusion with character names - which are for example used
>> by P1097R2 - Named character escapes, I would like to rename
>> mechanically *universal-character-name* to
>> *universal-character-codepoint*
> I was tempted to do this as part of P2029 (Proposed resolution for
> core issues 411, 1656, and 2333; numeric and universal character
> escapes in character and string literals), but decided it was too
> much to bite off as part of that effort. I do think a rename is
> in order.
> Tom.
>> Is that something core would be willing to do? If so, what
>> would be the best way to do it? Paper targeting core?
>> Regards,
>> Corentin
