Re: [SG16-Unicode] Feedback on P1097R1: U+NNNNNN syntax

From: Tom Honermann <tom_at_[hidden]>
Date: Fri, 6 Jul 2018 20:33:54 -0400
On 07/06/2018 05:37 PM, Hubert Tong wrote:
> On Fri, Jul 6, 2018 at 5:31 PM, Tom Honermann <tom_at_[hidden]
> <mailto:tom_at_[hidden]>> wrote:
>
>> On 07/06/2018 05:16 PM, Hubert Tong wrote:
>>
>>> I am wondering if accepting U+(4-6 hex digits) in \N{...} as
>>> Perl does can be considered.
>>
>> It certainly can be, but what is the motivation given that we
>> already have \u and \U? Why is supporting both \u1234 and
>> \N{U+1234} helpful?
>
> Do stylistic choices count? I happen to like naming Unicode characters
> as U+NNNN.

Certainly! Getting everyone to agree on a stylistic choice is always
fun though ;)

>
> There is also a possible semantic difference to explore between \u/\U
> and \N{U+...}:
> The \N form should certainly require that a character is assigned in
> Unicode; however, I think assigning a more "raw" meaning to \u/\U
> could make sense.

I think you might be on to something here. Martinho was recently
lamenting the following wording from [lex.ccon]p9
(http://eel.is/c++draft/lex.ccon#9):

> A /universal-character-name/ is translated to the encoding, in the
> appropriate execution character set, of the character named. *If there
> is no such encoding, the /universal-character-name/ is translated to
> an implementation-defined encoding.* ...

Specifically, he observed that translation to some implementation-defined
representation (presumably some replacement character) is
actively harmful. Making such mappings ill-formed would catch problems
that can, and should, be diagnosed at compile-time. We could, of
course, consider a change to the wording above, but that would have
backward compatibility impact. Your suggestion of different semantics
could allow us to retain the current implementation-defined behavior for
\u1234, but make \N{U+1234} ill-formed if the target encoding can't
represent U+1234. Good justification for your stylistic preference? ;)
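
To make the distinction concrete, here is a minimal sketch of the two
semantics in ordinary C++. Everything in it is an assumption made for
illustration: the target encoding is taken to be 7-bit ASCII, '?' stands in
for whatever implementation-defined encoding a compiler might choose, and
the names representable_in_target, lenient_encode and strict_encode are
hypothetical. No compiler implements \N{U+...}; the template only models
the "ill-formed if unrepresentable" idea with a static_assert.

```cpp
// Assumption: the target (execution) encoding is 7-bit ASCII.
constexpr bool representable_in_target(char32_t cp) {
    return cp <= 0x7F;
}

// Models the current \u/\U wording: an unrepresentable code point is
// silently mapped to some implementation-defined encoding ('?' here,
// chosen purely for illustration).
constexpr char lenient_encode(char32_t cp) {
    return representable_in_target(cp) ? static_cast<char>(cp) : '?';
}

// Models the suggested \N{U+NNNN} semantics: an unrepresentable code
// point is rejected at compile time instead of being substituted.
template <char32_t CP>
constexpr char strict_encode() {
    static_assert(representable_in_target(CP),
                  "code point is not representable in the target encoding");
    return static_cast<char>(CP);
}

// U+0041 is representable, so both forms agree.
static_assert(lenient_encode(0x0041) == 'A', "");
static_assert(strict_encode<0x0041>() == 'A', "");

// U+1234 is not representable: the lenient form substitutes silently...
static_assert(lenient_encode(0x1234) == '?', "");
// ...while the strict form would fail to compile if uncommented:
// constexpr char c = strict_encode<0x1234>();
```

The point of the sketch is only that the strict form turns a silent
substitution into a compile-time diagnostic, which is the behavior being
suggested for \N{U+1234}.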

Tom.

Received on 2018-07-07 02:33:58