Date: Mon, 19 Apr 2021 22:40:53 +0200
On 19/04/2021 21.14, Jens Gustedt via Liaison wrote:
> Tom,
>
> on Mon, 19 Apr 2021 11:14:10 -0400 you (Tom Honermann
> <tom_at_[hidden]>) wrote:
>
>> That is technically correct, but surveys have (so far) failed to
>> identify any implementations that do not use UTF-16 and UTF-32 for
>> `u` and `U` prefixed literals respectively (and the intent in the
>> original papers was clear that these are intended for UTF-16 and
>> UTF-32 respectively). SG16 does intend to bring a paper to WG14 to
>> provide this guarantee as well. In fact, my records
>> <https://github.com/sg16-unicode/sg16/issues/54> state that JeanHeyd
>> already submitted such a paper, but I don't see it in the WG14
>> document log.
>
> it is excellent news that we will soon see such a paper, thanks to
> JeanHeyd for doing this.
>
> In all of this, is there any chance that someday we let `\u2264` or
> similar non-identifier universal characters survive tokenization (as a
> single token) and leave it to phase 7 to decide if they can do
> something with it (or not)?
Do you have a particular use case in mind for this?
As far as I know, a lone UCN that isn't part of one of the other
main token categories is currently just garbage and thus
(eventually) ill-formed.
Jens
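
To make the two points in this exchange concrete, here is a minimal C11
sketch (not taken from the thread). It assumes an implementation that
ships <uchar.h> and defines __STDC_UTF_16__ and __STDC_UTF_32__, the
macros C11 uses to signal that char16_t and char32_t values are UTF-16
and UTF-32 encoded. The function names are hypothetical and chosen only
for illustration. Inside a `u` or `U` prefixed literal, `\u2264` is part
of a string-literal token and yields the code unit 0x2264 on such an
implementation; the commented-out line shows the lone-UCN case that is
described above as (eventually) ill-formed.

```c
#include <assert.h>
#include <uchar.h>

/* First code unit of a u"" literal holding U+2264 (LESS-THAN OR EQUAL TO). */
static unsigned long first_unit_u16(void)
{
    static const char16_t s[] = u"\u2264";
    return (unsigned long)s[0];
}

/* First code unit of a U"" literal holding the same character. */
static unsigned long first_unit_u32(void)
{
    static const char32_t s[] = U"\u2264";
    return (unsigned long)s[0];
}

/* Hypothetical self-check: the encoding is only asserted where the
   implementation itself claims UTF-16 / UTF-32 via the C11 macros. */
static void check_utf_literals(void)
{
#if defined(__STDC_UTF_16__)
    assert(first_unit_u16() == 0x2264UL);
#endif
#if defined(__STDC_UTF_32__)
    assert(first_unit_u32() == 0x2264UL);
#endif
}

/* Outside a literal or identifier the same UCN is a stray token, so a
   line such as the following is expected to be rejected:

       int bad = 1 \u2264 2;
*/
```

Calling check_utf_literals() from main is enough to exercise the sketch.
Guarding the assertions with the macros keeps it honest on an
implementation that does not make the UTF-16 / UTF-32 promise, which is
the guarantee the paper mentioned above is meant to provide.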
Received on 2021-04-19 15:41:02