On Sat, 29 Feb 2020 at 23:44, Gabriel Dos Reis <gdr@microsoft.com> wrote:

Why?

It is not used (for good reason: the only possible use case is legacy code, and legacy code is even less likely to use code points outside the basic source character set), it is weirdly opinionated (foo\u0065 is ill-formed, foo\u0165 is well-formed), and we are talking about extending it?

-- Gaby

From: Core <core-bounces@lists.isocpp.org> on behalf of Corentin Jabot via Core <core@lists.isocpp.org>
Sent: Saturday, February 29, 2020 1:05:45 PM
To: Tony V E <tvaneerd@gmail.com>
Cc: Corentin Jabot <corentinjabot@gmail.com>; sg16@lists.isocpp.org <sg16@lists.isocpp.org>; Tom Honermann <tom@honermann.net>; C++ Core Language Working Group <core@lists.isocpp.org>
Subject: Re: [isocpp-core] [SG16] Renaming universal-character-name

On Sat, Feb 29, 2020, 21:13 Tony V E <tvaneerd@gmail.com> wrote:

I don't know of anyone using anything but ASCII.

So there's no problem, right?

My point is maybe we should deprecate escaped identifiers rather than add more.

Sent from my BlackBerry portable Babbage Device

From: Corentin Jabot
Sent: Saturday, February 29, 2020 2:06 PM
To: Tony V E
Cc: Richard Smith; C++ Core Language Working Group; Tom Honermann; sg16@lists.isocpp.org
Subject: Re: [isocpp-core] [SG16] Renaming universal-character-name

On Sat, Feb 29, 2020, 19:34 Tony V E <tvaneerd@gmail.com> wrote:

> Independent of anything else, I think that P1097 should allow \N{...} in identifiers, for consistency with \u / \U -- I find it very hard to see a reason why the two should be permitted in a different set of contexts.

+1

There may only be weak reasoning to allow \N identifiers but I think it buys more than disallowing it.

int \N{LATIN CAPITAL LETTER A WITH MACRON} = 1;

vs:

int \u0100 = 1;

vs:

int Ā = 1;

At least the \N one gives me a hint what is going on.

If Ā was part of an external function of a library and I needed to call it, but couldn't type Ā, the \N form gives a hint. I would probably hide it behind an inline function either way:

inline Bar AWithMacron(Foo foo)
{
    return \N{LATIN CAPITAL LETTER A WITH MACRON}(foo);
}

Do we know of people using \u in identifiers?

Sent from my BlackBerry portable Babbage Device

From: Tom Honermann via Core
Sent: Thursday, February 27, 2020 12:01 PM
To: Richard Smith; C++ Core Language Working Group
Reply To: core@lists.isocpp.org
Cc: Tom Honermann; sg16@lists.isocpp.org; Corentin Jabot
Subject: Re: [isocpp-core] [SG16] Renaming universal-character-name

SG16 happened to be meeting and discussing this topic concurrently with Richard's email. I'll have minutes posted to https://github.com/sg16-unicode/sg16-meetings#february-26th-2020 in the next couple of days.

In that meeting, we had general consensus (we didn't poll) for renaming universal-character-name to unicode-code-point while keeping named-escape-sequence as is with direction that I update P2071 (the successor to P1097) to provide editorial direction for the rename.

I don't have strong opinions on the rename. I was under the impression that universal-character-name was introduced in C11/C++11, but I see I was mistaken as it is present in C++98. In retrospect, I don't know why I had that impression.

The fact that this term has been around since C99 and C++98 does give me pause. I'll refrain from proposing the rename in P2071 pending further discussion.

With regard to allowing \N{...} in identifiers, P2071R0 does mention the possibility of such allowance as a future extension, but without discussion. I'll update the paper to discuss this. I recall some discussions about allowing these escapes in identifiers, but I don't think those discussions were in minuted meetings and it hasn't been polled in SG16 or EWG(I). Personally, I don't see sufficient motivation for allowing one to type:

int \N{LATIN CAPITAL LETTER A WITH MACRON} = 1;

vs:

int \u0100 = 1;

precisely because there is little motivation to be able to type the latter one. In my mind, use of universal-character-name escapes outside of literals exists as a mechanism to support source character encodings that support characters outside the basic source character set. Virtually all programmers are going to type the following instead:

int Ā = 1;

Motivation to be able to type the form containing the universal-character-name exists so that identifiers that can't otherwise be represented in the source encoding of a particular source file can still be represented. I'm not sure that motivation extends to being able to type the named-escape-sequence variant.
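A minimal sketch of the escaped-identifier case described above, assuming a compiler that accepts universal-character-names in identifiers (recent GCC and Clang do). The variable and the helper `bump` are purely illustrative. The `\U00000100` spelling designates the same code point as `\u0100`, so it is expected to name the same variable.

```cpp
#include <cassert>

// Hypothetical variable whose name is U+0100 (LATIN CAPITAL LETTER A WITH MACRON),
// spelled with a universal-character-name so the file itself stays pure ASCII.
int \u0100 = 1;

// Hypothetical helper: the eight-digit \U form refers to the same identifier
// as the four-digit \u form above.
int bump() {
    \U00000100 += 1;
    return \u0100;
}
```

In a source file whose encoding can represent the character directly, the same variable could simply be written as Ā, which is the form most programmers would be expected to type.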

Tom.

On 2/26/20 4:25 PM, Richard Smith wrote:

Well, "universal character name" / UCN is established terminology in C and C++ that has been around for more than 20 years, and does not appear to be used for any other purpose. If we rename it, a lot of reference material (for example) will need to be updated. Given that, it's unclear to me that renaming it will be a net improvement, although removing any possible confusion with the "na" property of the character would certainly be a good thing. Also, a UCN is not a codepoint per se -- rather, it is a specific syntax for referring to (naming) a codepoint -- and "character codepoint" seems a bit redundant. If we're going to rename it, something like unicode-escape-sequence would seem more fitting.
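To illustrate the distinction between the syntax and the code point it refers to, here is a small sketch. It assumes only standard C++11 `char32_t` literals; the helper name `a_with_macron` is hypothetical. Both escapes are different spellings that designate the same code point, U+0100.

```cpp
#include <cassert>

// \u takes four hexadecimal digits and \U takes eight; both literals below
// designate the code point U+0100.
static_assert(U'\u0100' == 0x0100, "\\u0100 designates code point U+0100");
static_assert(U'\U00000100' == U'\u0100', "\\U and \\u spell the same code point");

// Hypothetical helper returning the code point designated by the escape.
constexpr char32_t a_with_macron() { return U'\u0100'; }
```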

Independent of anything else, I think that P1097 should allow \N{...} in identifiers, for consistency with \u / \U -- I find it very hard to see a reason why the two should be permitted in a different set of contexts. And if we do that, then we can just add productions to the existing universal-character-name nonterminal, and not need to rename anything. So I think renaming the grammar production is at least premature. If we do it at all, it should be done by P1097.

On Wed, Feb 26, 2020 at 12:35 PM Tom Honermann via Core <core@lists.isocpp.org> wrote:

On 2/26/20 5:15 AM, Corentin Jabot via SG16 wrote:

+sg16

On Wed, 26 Feb 2020 at 11:12, Corentin <corentin.jabot@gmail.com> wrote:

Hello,
To use terminology more aligned with Unicode and to avoid confusion with character names (which are used, for example, by P1097R2 - Named character escapes), I would like to mechanically rename universal-character-name to universal-character-codepoint.

I was tempted to do this as part of P2029 (Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals), but decided it was too much to bite off as part of that effort. I do think a rename is in order.

Tom.

Is that something Core would be willing to do? If so, what would be the best way to do it? A paper targeting Core?

Regards,

Corentin

_______________________________________________
Core mailing list
Core@lists.isocpp.org
Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
Link to this post: http://lists.isocpp.org/core/2020/02/8561.php