SG16 happened to be meeting and discussing this topic concurrently with Richard's email.  I'll have minutes posted in the next couple of days.

In that meeting, we had general consensus (we didn't poll) for renaming universal-character-name to unicode-code-point while keeping named-escape-sequence as is with direction that I update P2071 (the successor to P1097) to provide editorial direction for the rename.

I don't have strong opinions on the rename.  I was under the impression that universal-character-name was introduced in C11/C++11, but I see I was mistaken as it is present in C++98.  In retrospect, I don't know why I had that impression.

The fact that this term has been around since C99 and C++98 does give me pause.  I'll refrain from proposing the rename in P2071 pending further discussion.

With regard to allowing \N{...} in identifiers, P2071R0 does mention the possibility of such allowance as a future extension, but without discussion.  I'll update the paper to discuss this.  I recall some discussions about allowing these escapes in identifiers, but I don't think those discussions were in minuted meetings and it hasn't been polled in SG16 or EWG(I).  Personally, I don't see sufficient motivation for allowing one to type a named-escape-sequence in an identifier in lieu of:
int \u0100 = 1;
precisely because there is little motivation to be able to type even that form.  In my mind, use of universal-character-name escapes outside of literals exists as a mechanism to support source character encodings that cannot represent characters outside the basic source character set.  Virtually all programmers are going to type the following instead:
int Ā = 1;
Motivation to be able to type the form containing the universal-character-name exists so that identifiers that can't otherwise be represented in the source encoding of a particular source file can still be represented.  I'm not sure that motivation extends to being able to type the named-escape-sequence variant.
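
To make the equivalence concrete, here is a minimal sketch.  It assumes the source file is saved as UTF-8 and that the compiler accepts both UCNs and directly typed extended characters in identifiers; the function names are hypothetical and exist only for illustration.

```cpp
#include <cassert>

// Declared with a universal-character-name; \u0100 designates U+0100.
int \u0100 = 1;

// Refers to the variable through the UCN spelling.
int read_via_ucn() { return \u0100; }

// Refers to the same variable through the directly typed character
// (assumes a UTF-8 source file and a compiler that accepts it).
int read_via_utf8() { return Ā; }

// Both spellings denote the same code point in literals as well.
static_assert(U'\u0100' == U'Ā', "same code point");
static_assert(U'\u0100' == 0x100, "U+0100");

void demo() {
    Ā = 2;                        // write through one spelling
    assert(read_via_ucn() == 2);  // observe through the other
    \u0100 = 1;                   // restore the original value
}
```

Both functions read the same variable, so the UCN form only earns its keep when the file's encoding cannot hold the character itself.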


On 2/26/20 4:25 PM, Richard Smith wrote:
Well, "universal character name" / UCN is established terminology in C and C++ that has been around for more than 20 years, and does not appear to be used for any other purpose. If we rename it, a lot of reference material (for example) will need to be updated. Given that, it's unclear to me that renaming it will be a net improvement, although removing any possible confusion with the "na" property of the character would certainly be a good thing. Also, a UCN is not a codepoint per se -- rather, it is a specific syntax for referring to (naming) a codepoint -- and "character codepoint" seems a bit redundant. If we're going to rename it, something like unicode-escape-sequence would seem more fitting.

Independent of anything else, I think that P1097 should allow \N{...} in identifiers, for consistency with \u / \U -- I find it very hard to see a reason why the two should be permitted in a different set of contexts. And if we do that, then we can just add productions to the existing universal-character-name nonterminal, and not need to rename anything. So I think renaming the grammar production is at least premature. If we do it at all, it should be done by P1097.

On Wed, Feb 26, 2020 at 12:35 PM Tom Honermann via Core wrote:
On 2/26/20 5:15 AM, Corentin Jabot via SG16 wrote:

On Wed, 26 Feb 2020 at 11:12, Corentin wrote:
To use terminology more aligned with Unicode and to avoid confusion with character names (which are used, for example, by P1097R2, "Named character escapes"), I would like to mechanically rename universal-character-name to universal-character-codepoint.

I was tempted to do this as part of P2029 (Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals), but decided it was too much to bite off as part of that effort.  I do think a rename is in order.


Is that something Core would be willing to do? If so, what would be the best way to do it? A paper targeting Core?


