Subject: Re: Is it an error to encounter a character without a valid UCN?
From: Corentin Jabot (corentinjabot_at_[hidden])
Date: 2020-06-03 12:29:32
On Wed, Jun 3, 2020, 19:23 Jens Maurer via SG16 <sg16_at_[hidden]> wrote:
> On 03/06/2020 03.38, Hubert Tong via SG16 wrote:
> > I'm not sure where we are expecting this diagnostic to come into play.
> > If a vendor is dealing with an encoding that has such characters and it is
> > both the source and assumed execution character set, then I doubt they are
> > interested in telling their users that their strings have been outlawed by
> > the committee.
> I'm reading [lex.phases] p1.1 as uttering an implied
> assumption that any character not in the basic source
> character set can be represented as a UCN.
> In particular, it seems to prescribe that the implementation
> translate any character (except those from the basic source
> character set) to UCN. If a valid UCN doesn't exist for that
> character, presumably you can't translate that program.
> Note that "valid UCN" means "value between 0 and 0x10ffff except
> surrogate code points", but does not mean "has an assignment in
> Unicode", so an implementation could use values from 0x10ffff
> downward (hopefully unassigned by Unicode) for their special
> characters. Yup, seems that would work:
> So, there is a distinction between "Unicode character set"
> and "representable as UCN", and the latter appears to offer
> enough of an escape hatch to make non-Unicode environments
> happy. We shouldn't needlessly stomp on that happiness.
As Hubert pointed out, the PUA (Private Use Area) may be used by the program.
That an implementation currently can use the PUA doesn't mean it should.
It is a lot more reasonable to leave it free for programs to use.
Many users, such as Bloomberg, have a use for it.
There is nothing being stomped on here.
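
To make the ranges concrete, here is a rough sketch of the two predicates
in play: "valid UCN" in the sense quoted above (0 to 0x10ffff, minus
surrogates) and "is in a Private Use Area". The helper names are made up
for illustration and are not from the standard or any library. The PUA
ranges are the ones Unicode defines (U+E000..U+F8FF, U+F0000..U+FFFFD,
U+100000..U+10FFFD).

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only.

// "Valid UCN" as described in the quoted text: a value in [0, 0x10FFFF]
// that is not a surrogate code point (0xD800..0xDFFF).
constexpr bool is_valid_ucn_value(std::uint32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Unicode Private Use Areas: the BMP block plus planes 15 and 16,
// excluding the two noncharacters at the end of each plane.
constexpr bool is_private_use(std::uint32_t cp) {
    return (cp >= 0xE000 && cp <= 0xF8FF)
        || (cp >= 0xF0000 && cp <= 0xFFFFD)
        || (cp >= 0x100000 && cp <= 0x10FFFD);
}

// Counting "from 0x10ffff downward": the first two values are
// noncharacters, and everything after that is already PUA.
static_assert(is_valid_ucn_value(0x10FFFF) && !is_private_use(0x10FFFF));
static_assert(is_valid_ucn_value(0x10FFFD) && is_private_use(0x10FFFD));
static_assert(!is_valid_ucn_value(0xD800));
static_assert(!is_valid_ucn_value(0x110000));
```

So values taken from the top of the range are not merely "unassigned":
after the two noncharacters they fall straight into plane 16 private use,
which is the space programs may already be relying on.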