On 07/06/2018 05:37 PM, Hubert Tong wrote:
> On Fri, Jul 6, 2018 at 5:31 PM, Tom Honermann <tom@honermann.net> wrote:
>> On 07/06/2018 05:16 PM, Hubert Tong wrote:
>>> I am wondering if accepting U+(4-6 hex digits) in \N{...} as Perl does can be considered.
>>
>> It certainly can be, but what is the motivation given that we already have \u and \U?  Why is supporting both \u1234 and \N{U+1234} helpful?
> Do stylistic choices count? I happen to like naming Unicode characters as U+NNNN.

Certainly!  Getting everyone to agree on a stylistic choice is always fun though ;)

> There is also a possible semantic difference to explore between \u/\U and \N{U+...}:
> The \N form should certainly require that a character is assigned in Unicode; however, I think assigning a more "raw" meaning to \u/\U could make sense.

I think you might be on to something here.  Martinho was recently lamenting the following wording from [lex.ccon]p9 (http://eel.is/c++draft/lex.ccon#9):

> A universal-character-name is translated to the encoding, in the appropriate execution character set, of the character named.  If there is no such encoding, the universal-character-name is translated to an implementation-defined encoding. ...

Specifically, he observed that translation to some implementation-defined representation (presumably some replacement character) is actively harmful.  Making such mappings ill-formed would catch problems that can, and should, be diagnosed at compile time.  We could, of course, consider a change to the wording above, but that would have backward compatibility impact.  Your suggestion of different semantics could allow us to retain the current implementation-defined behavior for \u1234, but make \N{U+1234} ill-formed if the target encoding can't represent U+1234.  Good justification for your stylistic preference? ;)
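
To make the difference concrete, here is a rough sketch that models the two behaviors in ordinary C++14.  Nothing in it is proposed wording or existing compiler behavior: the Latin-1 target encoding, the '?' replacement, and the names representable_in_latin1, lenient_encode, and strict_encode are all assumptions made up for illustration.  The lenient function stands in for today's \u/\U (an unrepresentable character silently becomes something else); the strict one stands in for the suggested \N{U+...} (an unrepresentable character is rejected during constant evaluation).

```cpp
#include <stdexcept>

// Assumed target encoding for this sketch: ISO-8859-1, which can represent
// exactly the code points U+0000 through U+00FF.
constexpr bool representable_in_latin1(char32_t cp) {
    return cp <= 0xFF;
}

// Models the current \u/\U behavior: an unrepresentable code point is
// silently replaced. The '?' is an assumption; the real replacement is
// implementation-defined.
constexpr unsigned char lenient_encode(char32_t cp) {
    return representable_in_latin1(cp) ? static_cast<unsigned char>(cp)
                                       : static_cast<unsigned char>('?');
}

// Models the suggested \N{U+...} behavior: an unrepresentable code point is
// an error. Reaching the throw during constant evaluation makes the
// program ill-formed, so the problem is diagnosed at compile time.
constexpr unsigned char strict_encode(char32_t cp) {
    if (!representable_in_latin1(cp))
        throw std::domain_error("code point not representable in target encoding");
    return static_cast<unsigned char>(cp);
}

static_assert(lenient_encode(U'\u00E9') == 0xE9, "U+00E9 maps to itself");
static_assert(lenient_encode(U'\u1234') == '?', "U+1234 is silently replaced");
static_assert(strict_encode(U'\u00E9') == 0xE9, "U+00E9 is accepted");

// Uncommenting the next line is a compile-time error, which is the point:
// constexpr unsigned char bad = strict_encode(U'\u1234');
```

The sketch only covers the "can the target encoding represent it" half of the idea; checking that a code point is actually assigned in Unicode would need the character database and is left out.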

Tom.