
sg16


Re: [SG16] [isocpp-core] Renaming universal-character-name

From: Tony V E <tvaneerd_at_[hidden]>
Date: Sat, 29 Feb 2020 13:34:33 -0500
> Independent of anything else, I think that P1097 should allow \N{...} in identifiers, for consistency with \u / \U -- I find it very hard to see a reason why the two should be permitted in a different set of contexts. 

+1

There may only be weak reasoning for allowing \N in identifiers, but I think it buys more than disallowing it.


int \N{LATIN CAPITAL LETTER A WITH MACRON} = 1;
vs:
int \u0100 = 1;
vs:
int Ā = 1;


At least the \N one gives me a hint what is going on. 

If Ā were the name of an external library function that I needed to call, but I couldn't type Ā, the \N form gives a hint. I would probably hide it behind an inline function either way:

inline Bar AWithMacron(Foo foo)
{
    return \N{LATIN CAPITAL LETTER A WITH MACRON}(foo);
}
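A minimal, self-contained sketch of the wrapper idea above, assuming a compiler that accepts both UTF-8 and UCN-spelled identifiers (recent GCC and Clang do). `Foo`, `Bar`, their `value` member, and the body of the library function `Ā` are hypothetical stand-ins. The call is spelled with `\u0100` rather than `\N{...}`, since named escapes were still only a proposal (P1097/P2071) at the time of this thread. A universal-character-name in an identifier designates the same character as typing it directly, so both spellings name one function.

```cpp
// Hypothetical library types standing in for Foo and Bar above.
struct Foo { int value; };
struct Bar { int value; };

// Hypothetical external library function whose name is U+0100 (Ā).
// constexpr (which implies inline) so the results can be checked at compile time.
constexpr Bar Ā(Foo foo)
{
    return Bar{foo.value + 1};
}

// Wrapper with an ASCII-only name. The call spells the same identifier
// with a universal-character-name, so this function body needs no non-ASCII characters.
constexpr Bar AWithMacron(Foo foo)
{
    return \u0100(foo);
}
```

Callers then only ever need to type `AWithMacron`, whichever spelling the wrapper itself uses internally.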


From: Tom Honermann via Core
Sent: Thursday, February 27, 2020 12:01 PM
To: Richard Smith; C++ Core Language Working Group
Reply To: core_at_[hidden]
Cc: Tom Honermann; sg16_at_[hidden]; Corentin Jabot
Subject: Re: [isocpp-core] [SG16] Renaming universal-character-name

SG16 happened to be meeting and discussing this topic concurrently with Richard's email.  I'll have minutes posted to https://github.com/sg16-unicode/sg16-meetings#february-26th-2020 in the next couple of days.

In that meeting, we had general consensus (we didn't poll) for renaming universal-character-name to unicode-code-point while keeping named-escape-sequence as is, with direction for me to update P2071 (the successor to P1097) to provide editorial instructions for the rename.

I don't have strong opinions on the rename.  I was under the impression that universal-character-name was introduced in C11/C++11, but I see I was mistaken as it is present in C++98.  In retrospect, I don't know why I had that impression.

The fact that this term has been around since C99 and C++98 does give me pause.  I'll refrain from proposing the rename in P2071 pending further discussion.

With regard to allowing \N{...} in identifiers, P2071R0 does mention the possibility of such allowance as a future extension, but without discussion.  I'll update the paper to discuss this.  I recall some discussions about allowing these escapes in identifiers, but I don't think those discussions were in minuted meetings and it hasn't been polled in SG16 or EWG(I).  Personally, I don't see sufficient motivation for allowing one to type:
int \N{LATIN CAPITAL LETTER A WITH MACRON} = 1;
vs:
int \u0100 = 1;
precisely because there is little motivation to be able to type the latter one.  In my mind, use of universal-character-name escapes outside of literals exists as a mechanism to support source character encodings that support characters outside the basic source character set.  Virtually all programmers are going to type the following instead:
int Ā = 1;
Motivation to be able to type the form containing the universal-character-name exists so that identifiers that can't otherwise be represented in the source encoding of a particular source file can still be represented.  I'm not sure that motivation extends to being able to type the named-escape-sequence variant.

Tom.

On 2/26/20 4:25 PM, Richard Smith wrote:
Well, "universal character name" / UCN is established terminology in C and C++ that has been around for more than 20 years, and does not appear to be used for any other purpose. If we rename it, a lot of reference material (for example) will need to be updated. Given that, it's unclear to me that renaming it will be a net improvement, although removing any possible confusion with the "na" property of the character would certainly be a good thing. Also, a UCN is not a codepoint per se -- rather, it is a specific syntax for referring to (naming) a codepoint -- and "character codepoint" seems a bit redundant. If we're going to rename it, something like unicode-escape-sequence would seem more fitting.
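To illustrate Richard's distinction between a UCN and the code point it names, the following sketch (assuming a UTF-8 source file, the default for GCC and Clang) shows three spellings of the same code point, U+0100: the short and long universal-character-name forms, and the character typed directly. The variable names are illustrative only.

```cpp
// Each char32_t literal designates U+0100 LATIN CAPITAL LETTER A WITH MACRON.
constexpr char32_t short_ucn = U'\u0100';      // \u followed by four hex digits
constexpr char32_t long_ucn  = U'\U00000100';  // \U followed by eight hex digits
constexpr char32_t typed     = U'Ā';           // the character itself (UTF-8 source)

// Three different spellings, one code point value.
static_assert(short_ucn == 0x100, "short form designates U+0100");
static_assert(long_ucn == short_ucn, "long form designates the same code point");
static_assert(typed == short_ucn, "typed form designates the same code point");
```

The escapes are purely syntax; once translated, nothing distinguishes them from the typed character.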

Independent of anything else, I think that P1097 should allow \N{...} in identifiers, for consistency with \u / \U -- I find it very hard to see a reason why the two should be permitted in a different set of contexts. And if we do that, then we can just add productions to the existing universal-character-name nonterminal, and not need to rename anything. So I think renaming the grammar production is at least premature. If we do it at all, it should be done by P1097.

On Wed, Feb 26, 2020 at 12:35 PM Tom Honermann via Core <core_at_[hidden]> wrote:
On 2/26/20 5:15 AM, Corentin Jabot via SG16 wrote:
+sg16

On Wed, 26 Feb 2020 at 11:12, Corentin <corentin.jabot_at_[hidden]> wrote:
Hello,
To use terminology more aligned with Unicode and to avoid confusion with character names (which are used, for example, by P1097R2, Named character escapes), I would like to mechanically rename universal-character-name to universal-character-codepoint.

I was tempted to do this as part of P2029 (Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals), but decided it was too much to bite off as part of that effort.  I do think a rename is in order.

Tom.

Is that something Core would be willing to do? If so, what would be the best way to do it? A paper targeting Core?

Regards,

Corentin


_______________________________________________
Core mailing list
Core_at_[hidden]
Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
Link to this post: http://lists.isocpp.org/core/2020/02/8561.php



Received on 2020-02-29 12:37:18