
SG16


Subject: Re: P2194R0 The character set of C++ source code is Unicode
From: Alisdair Meredith (alisdairm_at_[hidden])
Date: 2020-08-24 15:38:24


> On Aug 24, 2020, at 16:23, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
> On 24/08/2020 21.44, Alisdair Meredith via SG16 wrote:
>> Got another good corner case for you!
>>
>> In the template form of user-defined literals, the template parameter pack
>> is instantiated with characters corresponding to the source text, currently
>> mapping non-basic characters to UCNs, so that the template parser can
>> assume all characters are members of the basic source character set:
>>
>> See [lex.ext] 5.13.8p3/4
>>
>> By no longer mapping to UCNs, we break any UDL parsers that work with
>> UCNs today. I don’t know how many there are in production, possibly zero,
>> but it is a risk to address, and it warrants an entry in compatibility Annex C.
>
> UCNs may only be introduced for characters not in the basic source
> character set. Could you please point out which of the characters allowed
> in a user-defined-integer-literal or user-defined-floating-point-literal
> are not in the basic source character set?

I can’t find the part of the spec that restricts the contents of the token
passed to a numeric literal operator to the subset of characters that is
meaningful to the existing parsers built into the language. I only see that
the eventual result must be of an appropriate integral or floating-point
type.

While I have no examples of users doing this in the wild, I see nothing
in the current spec that forbids it. For example, base-36 literals would
meaningfully parse all 26 letters in addition to the 10 digits. Why can
this not be extended (other than by common sense) to use extended
characters that map to UCNs in phase 1?
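
For reference, here is a minimal sketch of the template form under discussion. It only illustrates the mechanism: the literal operator template receives the source characters of the literal, without the suffix, as a pack of char. The suffixes _len and _dec are hypothetical names made up for this example, and the code assumes C++14 or later.

```cpp
#include <cstddef>

// Hypothetical suffix _len: yields how many source characters of the
// literal (excluding the suffix) were passed in the parameter pack.
template <char... Cs>
constexpr std::size_t operator""_len() {
    return sizeof...(Cs);
}

// Hypothetical suffix _dec: a tiny user-written parser that reads the
// pack as base-10 digits. It assumes every character is '0'..'9'.
template <char... Cs>
constexpr unsigned long long operator""_dec() {
    const char digits[] = {Cs..., '\0'};
    unsigned long long value = 0;
    for (std::size_t i = 0; i < sizeof...(Cs); ++i) {
        value = value * 10 + static_cast<unsigned long long>(digits[i] - '0');
    }
    return value;
}

// 123_len instantiates operator""_len<'1', '2', '3'>().
static_assert(123_len == 3, "three source characters");
// 0x1F_len instantiates operator""_len<'0', 'x', '1', 'F'>().
static_assert(0x1F_len == 4, "prefix and letters are passed through");
static_assert(42_dec == 42, "digits parsed by user code");
```

A parser written like _dec sees only plain char values, so the open question is which characters can ever reach the pack, and in what spelling.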

AlisdairM



SG16 list run by sg16-owner@lists.isocpp.org