


Re: [SG16] Is it an error to encounter a character without a valid UCN?

From: Tom Honermann <tom_at_[hidden]>
Date: Tue, 2 Jun 2020 22:55:53 -0400
On 6/2/20 7:34 AM, Alisdair Meredith via SG16 wrote:
> Translation phase 1 maps source code to either a member of the
> basic character set, or a UCN corresponding to that character.
> What if there is no such UCN? Is that undefined behavior, or is
> the program ill-formed? I can find nothing on this in [lex.phases]
> where we describe processing the source through an implementation-
> defined character mapping.
> When we get to [lex.charset], we can see it is clearly ill-formed if
> the produced UCN is invalid. Is that supposed to be the resolution
> here? Source must always map to a UCN, but the UCN need not
> be valid, so we get an error parsing the (implied) UCN in a later
> phase?

This seems worthy of a core issue. I did some searching to see if one
might already exist, but did not find one. I was a little surprised to
find just 4 active core issues that mention [lex.phases] (they are 1901,
1335, 578, and 1698) and 4 that mention [lex.charset] (they are 1332,
1655, 578 (again), and 1403). (All of these issues look like all kinds
of fun!).
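
For reference, the [lex.charset] rules that Alisdair's suggested resolution would rely on can be sketched roughly as follows. This is a minimal, hypothetical model of the C++17/C++20 wording as I read it, not code from the standard or from any implementation. The helper names are invented for illustration. The boolean inside_literal stands in for "within the c-char-sequence, s-char-sequence, or r-char-sequence of a character or string literal". "Control character" is assumed to mean Unicode general category Cc. Note that it only covers the checks applied to a UCN once one exists; it says nothing about the case where phase 1 finds no UCN at all, which is the gap the question is about.

```cpp
#include <string_view>

// Hypothetical helper: true if c is one of the 96 members of the
// C++17/C++20 basic source character set.
constexpr bool is_basic_source_char(char32_t c) {
    // The five whitespace members: space, horizontal tab, vertical tab,
    // form feed, and new-line.
    if (c == U' ' || c == U'\t' || c == U'\v' || c == U'\f' || c == U'\n')
        return true;
    // The 62 letters and digits.
    if ((c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z') ||
        (c >= U'0' && c <= U'9'))
        return true;
    // The remaining 29 graphic characters.
    constexpr std::u32string_view punct = U"_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";
    return punct.find(c) != std::u32string_view::npos;
}

// Hypothetical helper, assuming "control character" means Unicode
// general category Cc: U+0000..U+001F and U+007F..U+009F.
constexpr bool is_control(char32_t c) {
    return c <= 0x1F || (c >= 0x7F && c <= 0x9F);
}

// Sketch of the [lex.charset] checks for a UCN whose hexadecimal value is v.
constexpr bool ucn_is_ill_formed(char32_t v, bool inside_literal) {
    // A UCN naming a surrogate code point is ill-formed everywhere.
    if (v >= 0xD800 && v <= 0xDFFF)
        return true;
    // Outside literals, UCNs naming control characters or members of the
    // basic source character set are also ill-formed.
    if (!inside_literal && (is_control(v) || is_basic_source_char(v)))
        return true;
    return false;
}
```

For example, \u0041 ('A') would be ill-formed in an identifier but fine inside a string literal, while \uD800 would be ill-formed in either position.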


> AlisdairM

Received on 2020-06-02 21:59:01