SG16

Subject: Re: Is it an error to encounter a character without a valid UCN?
From: Tom Honermann (tom_at_[hidden])
Date: 2020-06-02 21:55:53


On 6/2/20 7:34 AM, Alisdair Meredith via SG16 wrote:
> Translation phase 1 maps source code to either a member of the
> basic character set, or a UCN corresponding to that character.
> What if there is no such UCN? Is that undefined behavior, or is
> the program ill-formed? I can find nothing on this in [lex.phases]
> where we describe processing the source through an implementation
> defined character mapping.
>
> When we get to [lex.charset] we can see it is clearly ill-formed if
> the produced UCN is invalid - is that supposed to be the resolution
> here? Source must always map to a UCN, but the UCN need not
> be valid, so we get an error parsing the (implied) UCN in a later
> phase?
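
To make the two cases in the question concrete, here is a minimal sketch of
phase 1 for a single source character. It is only an illustrative model, not
wording from the standard or how any compiler does it: the names
Phase1Status, Phase1Result, is_basic, and phase1_map are hypothetical, and an
empty optional stands in for "the implementation-defined mapping found no
Unicode equivalent". The only invalid-UCN check modeled is the surrogate
range that [lex.charset] calls out.

```cpp
#include <cstdio>
#include <optional>
#include <string>

// Hypothetical outcome of mapping one source character in phase 1.
enum class Phase1Status { Basic, Ucn, InvalidUcn, NoMapping };

struct Phase1Result {
    Phase1Status status;
    std::string text;  // the basic character or the spelled-out UCN
};

// True if c is one of the 96 basic source characters.
bool is_basic(char32_t c) {
    static const std::u32string basic =
        U"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
        U"_{}[]#()<>%:;.?*+-/^&|~!=,\\\"' \t\v\f\n";
    return basic.find(c) != std::u32string::npos;
}

// Assumption: an empty optional means the source character has no
// Unicode code point at all, which is the case the question asks about.
Phase1Result phase1_map(std::optional<char32_t> cp) {
    if (!cp) {
        // No UCN can be produced; the wording does not say what happens.
        return {Phase1Status::NoMapping, ""};
    }
    if (is_basic(*cp)) {
        return {Phase1Status::Basic, std::string(1, static_cast<char>(*cp))};
    }
    // Spell the UCN as \uXXXX or \UXXXXXXXX.
    char buf[16];
    const unsigned long v = static_cast<unsigned long>(*cp);
    if (v <= 0xFFFFul) {
        std::snprintf(buf, sizeof buf, "\\u%04lX", v);
    } else {
        std::snprintf(buf, sizeof buf, "\\U%08lX", v);
    }
    // A UCN naming a surrogate code point is ill-formed per [lex.charset].
    // Values above 0x10FFFF are not checked in this sketch.
    if (v >= 0xD800ul && v <= 0xDFFFul) {
        return {Phase1Status::InvalidUcn, buf};
    }
    return {Phase1Status::Ucn, buf};
}
```

In this model the InvalidUcn case is diagnosed in a later phase, while the
NoMapping case never produces a UCN to diagnose, which is the gap being
asked about.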

This seems worthy of a core issue.  I did some searching to see if one
might already exist, but did not find one.  I was a little surprised to
find just 4 active core issues that mention [lex.phases] (they are 1901,
1335, 578, and 1698) and 4 that mention [lex.charset] (they are 1332,
1655, 578 (again), and 1403).  (All of these issues look like all kinds
of fun!)

Tom.

>
> AlisdairM


SG16 list run by sg16-owner@lists.isocpp.org