sg16: Re: [SG16] Is it an error to encounter a character without a valid UCN?

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Wed, 3 Jun 2020 19:29:32 +0200

On Wed, Jun 3, 2020, 19:23 Jens Maurer via SG16 <sg16_at_[hidden]>
wrote:

> On 03/06/2020 03.38, Hubert Tong via SG16 wrote:
> > I'm not sure where we are expecting this diagnostic to come into play.
> > If a vendor is dealing with an encoding that has such characters and it is
> > both the source and assumed execution character set, then I doubt they are
> > interested in telling their users that their strings have been outlawed by
> > the committee.
>
> I'm reading [lex.phases] p1.1 as uttering an implied
> assumption that any character not in the basic source
> character set can be represented as a UCN.
>
> In particular, it seems to prescribe that the implementation
> translate any character (except those from the basic source
> character set) to UCN. If a valid UCN doesn't exist for that
> character, presumably you can't translate that program.
>
> Note that "valid UCN" means "value between 0 and 0x10ffff except
> surrogate code points", but does not mean "has an assignment in
> Unicode", so an implementation could use values from 0x10ffff
> downward (hopefully unassigned by Unicode) for their special
> characters. Yup, seems that would work:
> https://en.wikipedia.org/wiki/Private_Use_Areas
>
> So, there is a distinction between "Unicode character set"
> and "representable as UCN", and the latter appears to offer
> enough of an escape hatch to make non-Unicode environments
> happy. We shouldn't needlessly stomp on that happiness.
>
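For reference, a minimal C++ sketch of the ranges discussed above, taking
"valid UCN" exactly as Jens describes it (0 through 0x10ffff, excluding
surrogates). The function names are hypothetical and purely illustrative;
the standard's additional restrictions on UCNs outside of literals are not
modelled. The private use ranges are the ones Unicode defines.

```cpp
#include <cstdint>

// Hypothetical helpers, not taken from the standard or any implementation.

// Surrogate code points: U+D800..U+DFFF.
constexpr bool is_surrogate(std::uint32_t cp) {
    return cp >= 0xD800 && cp <= 0xDFFF;
}

// "Valid UCN" in the sense used above: any value up to 0x10FFFF that is not
// a surrogate, whether or not Unicode has assigned it.
constexpr bool is_ucn_representable(std::uint32_t cp) {
    return cp <= 0x10FFFF && !is_surrogate(cp);
}

// Unicode Private Use Areas: the BMP block plus planes 15 and 16.
constexpr bool is_private_use(std::uint32_t cp) {
    return (cp >= 0xE000 && cp <= 0xF8FF) ||
           (cp >= 0xF0000 && cp <= 0xFFFFD) ||
           (cp >= 0x100000 && cp <= 0x10FFFD);
}

// 0x10FFFF is representable as a UCN but is a noncharacter, not a PUA value.
static_assert(is_ucn_representable(0x10FFFF), "top of the code space");
static_assert(!is_private_use(0x10FFFF), "noncharacter, not private use");
// The highest private use code point is 0x10FFFD.
static_assert(is_private_use(0x10FFFD), "end of plane 16 private use");
```

Under these assumptions, values counted "from 0x10ffff downward" only land in
the PUA from 0x10fffd on, and anything placed there may collide with code
points a program already uses for its own purposes.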

As Hubert pointed out, the PUA may be used by the program.
That an implementation currently can use the PUA doesn't mean it should.
It is a lot more reasonable to leave it free for programs to use.

Many users, such as Bloomberg, have a use for it.

There is nothing being stomped on here.

>
> Jens
>

Received on 2020-06-03 12:32:52