On 07/06/2018 05:37 PM, Hubert Tong wrote:

On Fri, Jul 6, 2018 at 5:31 PM, Tom Honermann <tom@honermann.net> wrote:

On 07/06/2018 05:16 PM, Hubert Tong wrote:

I am wondering whether accepting U+ followed by 4 to 6 hex digits in \N{...}, as Perl does, could be considered.
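To make the suggested spelling concrete, here is a rough sketch of how the contents of \N{...} might be recognized when they use the Perl-style U+ form. The function name parse_u_plus is hypothetical. Accepting only an uppercase "U+" and rejecting surrogates and values above U+10FFFF are my own assumptions, chosen to mirror the existing limits on \u and \U.

```cpp
#include <cctype>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper: given the text between the braces of \N{...},
// return the code point if it is spelled "U+" followed by 4 to 6 hex
// digits, or std::nullopt otherwise (e.g. for a character name).
std::optional<std::uint32_t> parse_u_plus(std::string_view s) {
    if (s.size() < 2 || s[0] != 'U' || s[1] != '+') return std::nullopt;
    std::string_view digits = s.substr(2);
    if (digits.size() < 4 || digits.size() > 6) return std::nullopt;

    std::uint32_t value = 0;
    for (char c : digits) {
        unsigned char uc = static_cast<unsigned char>(c);
        if (!std::isxdigit(uc)) return std::nullopt;
        int d = (c >= '0' && c <= '9') ? c - '0'
                                       : std::tolower(uc) - 'a' + 10;
        value = value * 16 + static_cast<std::uint32_t>(d);
    }

    // Assumption: reject surrogates and anything beyond the Unicode
    // codespace, as is already done for \u and \U.
    if (value > 0x10FFFF || (value >= 0xD800 && value <= 0xDFFF))
        return std::nullopt;
    return value;
}
```

For example, parse_u_plus("U+1F600") yields 0x1F600, while parse_u_plus("LATIN SMALL LETTER A") yields std::nullopt and would fall through to ordinary name lookup.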

It certainly can be, but what is the motivation given that we already have \u and \U? Why is supporting both \u1234 and \N{U+1234} helpful?

Do stylistic choices count? I happen to like naming Unicode characters as U+NNNN.

Certainly! Getting everyone to agree on a stylistic choice is always fun though ;)

There is also a possible semantic difference to explore between \u/\U and \N{U+...}:

The \N form should certainly require that the named character be assigned in Unicode; however, I think assigning a more "raw" meaning to \u/\U could make sense.

I think you might be on to something here. Martinho was recently lamenting the following wording from [lex.ccon]p9 (http://eel.is/c++draft/lex.ccon#9):

> A universal-character-name is translated to the encoding, in the appropriate execution character set, of the character named. If there is no such encoding, the universal-character-name is translated to an implementation-defined encoding. ...

Specifically, he observed that translation to some implementation-defined representation (presumably some replacement character) is actively harmful. Making such mappings ill-formed would catch problems that can, and should, be diagnosed at compile time. We could, of course, consider changing the wording above, but that would have a backward compatibility impact. Your suggestion of different semantics could allow us to retain the current implementation-defined behavior for \u1234, but make \N{U+1234} ill-formed if the target encoding can't represent U+1234. Good justification for your stylistic preference? ;)
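A small model may help show the difference in behavior. This is only a sketch under stated assumptions: the execution character set is taken to be ISO-8859-1 (so only U+0000..U+00FF is representable), '?' stands in for whatever implementation-defined encoding a compiler picks today, and the function names are hypothetical. The "ill-formed" rule is approximated with a constexpr function that throws, which the compiler rejects when it is evaluated in a constant expression.

```cpp
#include <cstdint>
#include <stdexcept>

// Assumed execution character set: ISO-8859-1, which covers U+0000..U+00FF.
constexpr bool representable(std::uint32_t cp) { return cp <= 0xFF; }

// Current rule for \u/\U ([lex.ccon]p9): an unrepresentable character is
// translated to an implementation-defined encoding; '?' models that choice.
constexpr char translate_u_escape(std::uint32_t cp) {
    return representable(cp) ? static_cast<char>(cp) : '?';
}

// Suggested rule for \N{U+...}: an unrepresentable character is an error.
// Because a throw cannot be evaluated during constant evaluation,
//     constexpr char c = translate_n_escape(0x1234);
// fails to compile, while representable input is accepted.
constexpr char translate_n_escape(std::uint32_t cp) {
    if (!representable(cp))
        throw std::invalid_argument("character not representable");
    return static_cast<char>(cp);
}

static_assert(translate_u_escape(0x1234) == '?', "silently replaced");
static_assert(translate_n_escape(0x41) == 'A', "representable: accepted");
```

Under this model, code that relies on today's \u1234 behavior keeps compiling, while anyone who opts into the \N{U+1234} spelling gets a diagnostic instead of a silent substitution.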

Tom.