Date: Sat, 29 Feb 2020 13:34:33 -0500
> Independent of anything else, I think that P1097 should allow \N{...} in identifiers, for consistency with \u / \U -- I find it very hard to see a reason why the two should be permitted in a different set of contexts.
+1
The case for allowing \N{...} in identifiers may be weak, but I think allowing it buys more than disallowing it.
int \N{LATIN CAPITAL LETTER A WITH MACRON} = 1;
vs:
int \u0100 = 1;
vs:
int Ā = 1;
At least the \N one gives me a hint what is going on.
If Ā were part of an external function of a library and I needed to call it but couldn't type Ā, the \N form would at least give a hint. I would probably hide it behind an inline function either way:
inline Bar AWithMacron(Foo foo)
{
    return \N{LATIN CAPITAL LETTER A WITH MACRON}(foo);
}
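For comparison, here is a minimal sketch of the same wrapper idea using the \u spelling, which is already permitted in identifiers. Foo and Bar are placeholder types, and the function named U+0100 with its body is a hypothetical stand-in for the library function, not anything from a real library.

```cpp
#include <cassert>

// Placeholder types, assumed purely for illustration.
struct Foo { int value; };
struct Bar { int value; };

// Hypothetical stand-in for the library function whose name is the single
// character U+0100, spelled here with a universal-character-name.
inline Bar \u0100(Foo foo)
{
    return Bar{foo.value + 1};
}

// ASCII-only wrapper, so callers never have to type the character
// or its escape.
inline Bar AWithMacron(Foo foo)
{
    return \u0100(foo);
}
```

Callers would then write AWithMacron(foo) and only the wrapper would carry the escape.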
From: Tom Honermann via Core
Sent: Thursday, February 27, 2020 12:01 PM
To: Richard Smith; C++ Core Language Working Group
Reply To: core_at_[hidden]
Cc: Tom Honermann; sg16_at_[hidden]; Corentin Jabot
Subject: Re: [isocpp-core] [SG16] Renaming universal-character-name
SG16 happened to be meeting and
discussing this topic concurrently with Richard's email. I'll
have minutes posted to https://github.com/sg16-unicode/sg16-meetings#february-26th-2020
in the next couple of days.
In that meeting, we had general
consensus (we didn't poll) for renaming universal-character-name
to unicode-code-point while keeping named-escape-sequence
as is with direction that I update P2071 (the successor to P1097)
to provide editorial direction for the rename.
I don't have strong opinions on the
rename. I was under the impression that universal-character-name
was introduced in C11/C++11, but I see I was mistaken as it is
present in C++98. In retrospect, I don't know why I had that
impression.
The fact that this term has been around
since C99 and C++98 does give me pause. I'll refrain from
proposing the rename in P2071 pending further discussion.
With regard to allowing \N{...} in
identifiers, P2071R0 does mention the
possibility of such allowance as a future extension, but
without discussion. I'll update the paper to discuss this. I
recall some discussions about allowing these escapes in
identifiers, but I don't think those discussions were in minuted
meetings and it hasn't been polled in SG16 or EWG(I). Personally,
I don't see sufficient motivation for allowing one to type:
int \N{LATIN CAPITAL LETTER A WITH MACRON} = 1;
vs:
int \u0100 = 1;
precisely because there is little
motivation to be able to type the latter one. In my mind, use of
universal-character-name escapes outside of literals exists
as a mechanism to support source character encodings that support
characters outside the basic source character set. Virtually all
programmers are going to type the following instead:
int Ā = 1;
Motivation to be able to type the form
containing the universal-character-name exists so that
identifiers that can't otherwise be represented in the source
encoding of a particular source file can still be represented.
I'm not sure that motivation extends to being able to type the named-escape-sequence
variant.
Tom.
On 2/26/20 4:25 PM, Richard Smith
wrote:
Well, "universal character name" / UCN is established terminology in C and C++ that has been around for more than 20 years, and does not appear to be used for any other purpose. If we rename it, a lot of reference material (for example) will need to be updated. Given that, it's unclear to me that renaming it will be a net improvement, although removing any possible confusion with the "na" property of the character would certainly be a good thing. Also, a UCN is not a codepoint per se -- rather, it is a specific syntax for referring to (naming) a codepoint -- and "character codepoint" seems a bit redundant. If we're going to rename it, something like unicode-escape-sequence would seem more fitting.
Independent of anything else, I think that P1097 should allow \N{...} in identifiers, for consistency with \u / \U -- I find it very hard to see a reason why the two should be permitted in a different set of contexts. And if we do that, then we can just add productions to the existing universal-character-name nonterminal, and not need to rename anything. So I think renaming the grammar production is at least premature. If we do it at all, it should be done by P1097.
On Wed, Feb 26, 2020 at 12:35 PM Tom Honermann via Core <core_at_[hidden]> wrote:
On 2/26/20 5:15 AM, Corentin Jabot via SG16 wrote:
+sg16
On Wed, 26 Feb 2020 at 11:12, Corentin <corentin.jabot_at_[hidden]> wrote:
Hello,
To use terminology more aligned with Unicode and to avoid confusion with character names (which are used, for example, by P1097R2, "Named character escapes"), I would like to mechanically rename universal-character-name to universal-character-codepoint.
I was tempted to do this as part of P2029 (Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals), but decided it was too much to bite off as part of that effort. I do think a rename is in order.
Tom.
Is that something Core would be willing to do? If so, what would be the best way to do it? A paper targeting Core?
Regards,
Corentin
Core mailing list
Core_at_[hidden]
Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
Link to this post: http://lists.isocpp.org/core/2020/02/8561.php
Received on 2020-02-29 12:37:18