
SG16


Subject: Re: [isocpp-core] Renaming universal-character-name
From: Corentin Jabot (corentinjabot_at_[hidden])
Date: 2020-03-01 02:46:22


On Sat, 29 Feb 2020 at 23:44, Gabriel Dos Reis <gdr_at_[hidden]> wrote:

> Why?
>

It is not used, and for good reason: the only possible use case is legacy
code, and legacy code is even less likely to use code points outside the
basic source character set. It is weirdly opinionated (foo\u0065 is
ill-formed, foo\u0165 is well-formed). And we are talking about extending
it?
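
To make the rule behind those two spellings concrete, here is a minimal sketch. It assumes the C++11 through C++20 wording of [lex.charset], under which a universal-character-name outside a character or string literal is ill-formed if it designates a control character or a member of the basic source character set. The helper name is hypothetical, and it models only that one rule, not whether the code point is otherwise permitted in an identifier.

```cpp
#include <cassert>

// Hypothetical helper (not from the standard or any library): returns true
// when a universal-character-name designating code point `cp` is NOT rejected
// by the rule that forbids, outside character and string literals, UCNs for
// control characters and for basic source character set members.
// Assumption: it does not check whether `cp` is valid in an identifier.
constexpr bool ucn_passes_basic_charset_rule(char32_t cp) {
    // Control characters: U+0000..U+001F and U+007F..U+009F.
    if (cp <= 0x1F || (cp >= 0x7F && cp <= 0x9F)) {
        return false;
    }
    // Printable ASCII belongs to the basic source character set except for
    // '$' (U+0024), '@' (U+0040) and '`' (U+0060).
    if (cp >= 0x20 && cp <= 0x7E) {
        return cp == 0x24 || cp == 0x40 || cp == 0x60;
    }
    return true;
}

// U+0065 is 'e', a basic source character, so the escaped spelling is rejected.
static_assert(!ucn_passes_basic_charset_rule(0x0065), "U+0065 is rejected");
// U+0165 lies outside the basic source character set, so the rule allows it.
static_assert(ucn_passes_basic_charset_rule(0x0165), "U+0165 is accepted");

// Runtime spot-check of the same two code points.
inline void check_examples() {
    assert(!ucn_passes_basic_charset_rule(0x0065));
    assert(ucn_passes_basic_charset_rule(0x0165));
}
```

Under that reading, the asymmetry comes entirely from whether the escaped code point already has a spelling in the basic source character set.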

>
> -- Gaby
> ------------------------------
> *From:* Core <core-bounces_at_[hidden]> on behalf of Corentin Jabot
> via Core <core_at_[hidden]>
> *Sent:* Saturday, February 29, 2020 1:05:45 PM
> *To:* Tony V E <tvaneerd_at_[hidden]>
> *Cc:* Corentin Jabot <corentinjabot_at_[hidden]>; sg16_at_[hidden] <
> sg16_at_[hidden]>; Tom Honermann <tom_at_[hidden]>; C++ Core
> Language Working Group <core_at_[hidden]>
> *Subject:* Re: [isocpp-core] [SG16] Renaming universal-character-name
>
>
>
> On Sat, Feb 29, 2020, 21:13 Tony V E <tvaneerd_at_[hidden]> wrote:
>
> I don't know of anyone using anything but ASCII.
>
> So there's no problem, right?
>
>
> My point is maybe we should deprecate escaped identifiers rather than add
> more.
>
>
> Sent from my BlackBerry portable Babbage Device
> *From: *Corentin Jabot
> *Sent: *Saturday, February 29, 2020 2:06 PM
> *To: *Tony V E
> *Cc: *Richard Smith; C++ Core Language Working Group; Tom Honermann;
> sg16_at_[hidden]
> *Subject: *Re: [isocpp-core] [SG16] Renaming universal-character-name
>
>
>
> On Sat, Feb 29, 2020, 19:34 Tony V E <tvaneerd_at_[hidden]> wrote:
>
> > Independent of anything else, I think that P1097 should allow \N{...}
> in identifiers, for consistency with \u / \U -- I find it very hard to see
> a reason why the two should be permitted in a different set of contexts.
>
> +1
>
> There may only be weak reasoning to allow \N identifiers but I think it
> buys more than disallowing it.
>
>
> int \N{LATIN CAPITAL LETTER A WITH MACRON} = 1;
>
> vs:
>
> int \u0100 = 1;
>
> vs:
>
> int Ā = 1;
>
>
> At least the \N one gives me a hint what is going on.
>
> If Ā was part of an external function of a library and I needed to call
> it, but can't type Ā, the \N form gives a hint. I would probably hide
> it behind an inline function either way:
>
> inline Bar AWithMacron(Foo foo)
> {
>     return \N{LATIN CAPITAL LETTER A WITH MACRON}(foo);
> }
>
>
> Do we know of people using \u in identifiers?
>
>
>
> Sent from my BlackBerry portable Babbage Device
> *From: *Tom Honermann via Core
> *Sent: *Thursday, February 27, 2020 12:01 PM
> *To: *Richard Smith; C++ Core Language Working Group
> *Reply To: *core_at_[hidden]
> *Cc: *Tom Honermann; sg16_at_[hidden]; Corentin Jabot
> *Subject: *Re: [isocpp-core] [SG16] Renaming universal-character-name
>
> SG16 happened to be meeting and discussing this topic concurrently with
> Richard's email. I'll have minutes posted to
> https://github.com/sg16-unicode/sg16-meetings#february-26th-2020
> in the next couple of days.
>
> In that meeting, we had general consensus (we didn't poll) for renaming
> *universal-character-name* to *unicode-code-point* while keeping
> *named-escape-sequence* as is with direction that I update P2071
> <https://wg21.link/p2071> (the successor to P1097
> <https://wg21.link/p1097>) to provide editorial direction for the rename.
>
> I don't have strong opinions on the rename. I was under the impression
> that *universal-character-name* was introduced in C11/C++11, but I see I
> was mistaken as it is present in C++98. In retrospect, I don't know why I
> had that impression.
>
> The fact that this term has been around since C99 and C++98 does give me
> pause. I'll refrain from proposing the rename in P2071 pending further
> discussion.
>
> With regard to allowing \N{...} in identifiers, P2071R0 does mention the
> possibility of such allowance as a future extension
> <https://wg21.link/p2071#future>,
> but without discussion. I'll update the paper to discuss this. I recall
> some discussions about allowing these escapes in identifiers, but I don't
> think those discussions were in minuted meetings and it hasn't been polled
> in SG16 or EWG(I). Personally, I don't see sufficient motivation for
> allowing one to type:
>
> int \N{LATIN CAPITAL LETTER A WITH MACRON} = 1;
>
> vs:
>
> int \u0100 = 1;
>
> precisely because there is little motivation to be able to type the latter
> one. In my mind, use of *universal-character-name* escapes outside of
> literals exists as a mechanism to support source character encodings that
> support characters outside the basic source character set. Virtually all
> programmers are going to type the following instead:
>
> int Ā = 1;
>
> Motivation to be able to type the form containing the
> *universal-character-name* exists so that identifiers that can't
> otherwise be represented in the source encoding of a particular source file
> can still be represented. I'm not sure that motivation extends to being
> able to type the *named-escape-sequence* variant.
>
> Tom.
>
> On 2/26/20 4:25 PM, Richard Smith wrote:
>
> Well, "universal character name" / UCN is established terminology in C and
> C++ that has been around for more than 20 years, and does not appear to be
> used for any other purpose. If we rename it, a lot of reference material
> (for example) will need to be updated. Given that, it's unclear to me that
> renaming it will be a net improvement, although removing any possible
> confusion with the "na" property of the character would certainly be a good
> thing. Also, a UCN is not a codepoint per se -- rather, it is a specific
> syntax for referring to (naming) a codepoint -- and "character codepoint"
> seems a bit redundant. If we're going to rename it, something like
> *unicode-escape-sequence* would seem more fitting.
>
> Independent of anything else, I think that P1097 should allow \N{...} in
> identifiers, for consistency with \u / \U -- I find it very hard to see a
> reason why the two should be permitted in a different set of contexts. And
> if we do that, then we can just add productions to the existing
> *universal-character-name* nonterminal, and not need to rename anything.
> So I think renaming the grammar production is at least premature. If we do
> it at all, it should be done by P1097.
>
> On Wed, Feb 26, 2020 at 12:35 PM Tom Honermann via Core <
> core_at_[hidden]> wrote:
>
> On 2/26/20 5:15 AM, Corentin Jabot via SG16 wrote:
>
> +sg16
>
> On Wed, 26 Feb 2020 at 11:12, Corentin <corentin.jabot_at_[hidden]> wrote:
>
> Hello,
> To use terminology more aligned with Unicode and to avoid confusion with
> character names (which are used, for example, by P1097R2, "Named character
> escapes"), I would like to mechanically rename *universal-character-name*
> to *universal-character-codepoint*.
>
> I was tempted to do this as part of P2029 (Proposed resolution for core
> issues 411, 1656, and 2333; numeric and universal character escapes in
> character and string literals), but decided it was too much to bite off as
> part of that effort. I do think a rename is in order.
>
> Tom.
>
> Is that something core would be willing to do? If so, what would be the
> best way to do it? A paper targeting core?
>
> Regards,
>
> Corentin
>
>
>
> _______________________________________________
> Core mailing list
> Core_at_[hidden]
> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
> Link to this post: http://lists.isocpp.org/core/2020/02/8561.php
>
>
>
>
>



SG16 list run by herb.sutter at gmail.com