Subject: Re: P2194R0 The character set of C++ source code is Unicode
From: Alisdair Meredith (alisdairm_at_[hidden])
Date: 2020-08-24 15:50:49
Never mind - got it. The UDL grammar for integer/floating-point literals
comes from the grammar for decimal-literal/octal-literal/etc., so we are
not free to add our own characters to this grammar, and should expand
our repertoire of characters only as the fundamental grammar expands,
such as when decimal separators and hex-floats were added.
So agreed - I am worrying about a non-issue here.
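
To make that concrete, here is a minimal sketch (assuming C++17) of a
literal operator template that checks every character it receives. The
names _len and is_numeric_literal_char are hypothetical, made up for this
illustration, and the allowed set is my reading of the numeric literal
grammar rather than wording from the standard:

```cpp
#include <cstddef>

// Hypothetical helper: true if c is one of the characters that the
// integer/floating-point literal grammar can produce before a ud-suffix
// (digits, hex digits, x/X, b/B, e/E, p/P, '.', digit separator, sign).
constexpr bool is_numeric_literal_char(char c) {
    constexpr char allowed[] = "0123456789abcdefABCDEFxXpP.'+-";
    for (std::size_t i = 0; i + 1 < sizeof(allowed); ++i) {
        if (allowed[i] == c) return true;
    }
    return false;
}

// Hypothetical literal operator template: the pack Cs holds the source
// characters of the numeric literal, without the ud-suffix.
template <char... Cs>
constexpr std::size_t operator""_len() {
    static_assert((is_numeric_literal_char(Cs) && ...),
                  "unexpected character in numeric literal");
    return sizeof...(Cs);  // number of characters in the literal
}

// Compile-time usage checks.
static_assert(42_len == 2, "two digits");
static_assert(0x1F_len == 4, "prefix plus two hex digits");
static_assert(1.5e+3_len == 6, "digits, point, exponent, sign");
```

Something like 1z_b36 would not reach a hypothetical _b36 operator with
'z' in the pack: the literal part is just 1 and the ud-suffix is z_b36,
since the suffix is an ordinary identifier.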
> On Aug 24, 2020, at 16:38, Alisdair Meredith via SG16 <sg16_at_[hidden]> wrote:
>> On Aug 24, 2020, at 16:23, Jens Maurer <Jens.Maurer_at_[hidden] <mailto:Jens.Maurer_at_[hidden]>> wrote:
>> On 24/08/2020 21.44, Alisdair Meredith via SG16 wrote:
>>> Got another good corner case for you!
>>> In the template form of user defined literals, the template parameter pack
>>> is instantiated with characters corresponding to the source text, currently
>>> mapping non-basic characters to UCNs, so that the template parser can
>>> assume all characters are members of the basic source character set:
>>> See [lex.ext] 5.13.8p3/4
>>> By no longer mapping to UCNs, we break any UDL parsers that work with
>>> UCNs today. I don't know how many there are in production, possibly zero,
>>> but it is a risk to address, with an entry in the compatibility Annex C.
>> UCNs may only be introduced for characters not in the basic source
>> character set. Could you please point out which of the characters allowed
>> in a user-defined-integer-literal or user-defined-floating-point-literal
>> are not in the basic source character set?
> I don't find the part of the spec that restricts the contents of the token
> being passed to a numeric literal operator to some restricted
> subset of characters that are meaningful to existing parsers built into
> the language - only that the eventual result must be either an appropriate
> integral or floating point type.
> While I have no examples of users doing this in the wild, I see nothing
> in the current spec that forbids such things - for example base36 literals
> will meaningfully parse all 26 letters in addition to the 10 digits - why can
> this not be extended (other than common sense) to use extended
> characters that map to UCNs in phase 1?