Date: Sat, 6 Jan 2024 15:05:39 +0100
On 06/01/2024 11.56, Corentin via SG16 wrote:
> ZWJ and variation selectors seem to have been added to XID_Continue as of Unicode 15.1,
Uh, interesting.
> despite UAX31 recognizing that "The use of default-ignorable characters in identifiers is problematic".
https://unicode.org/reports/tr31/#Acknowledgments
says
"Acknowledgments
The attendees of the Source Code Working Group meetings assisted with the substantial changes made in Versions 15.0 and 15.1:
... Tom Honermann, ..., Robin Leroy, ..., Richard Smith."
These names sound familiar to me, and at least two of them are
probably on this e-mail reflector.
Could one of them say something about what happened?
> The solution to that is to define that we implement the Default-Ignorable Exclusion Profile <https://unicode.org/reports/tr31/#Default_Ignorable_Exclusion_Profile>,
> which I believe restores the Unicode 15 / C++23 / SG16 consensus on identifier grammar behavior.
The whole point of deferring to Unicode for the details of identifier
characters was to never deal with any details again at the C++ level.
So, if Unicode decided that ZWJ and ZWNJ are now fine in identifiers,
why are we suddenly deciding to have second thoughts on that?
> It is very surprising to me that ZWJ is opt-out rather than opt-in, given the security implications and the fact that supporting it requires implementing TR39 3.1.1 <https://www.unicode.org/reports/tr39/#Joining_Controls>.
Yes. In particular since the previous approach was opt-in, according to our Annex E.2.2.
Turning around like that in a minor revision of a standard is a bit discomforting.
Oh, and our Annex E needs updating for the new revision of UAX#31.
Jens
> I suppose this was done to better support Sanskrit?
>
> Anyway, it might be useful to either:
> - Mandate the Default-Ignorable Exclusion Profile
> - Mandate some form of TR39 conformance (which is going to be quite the burden for implementers given our bandwidth)
> - Or allow/recommend either of the above options.
>
> Thanks,
>
> Corentin
>
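
For illustration only, the first option in the list above (excluding default-ignorable characters from identifiers) could be sketched roughly as follows. This is not the normative definition: UAX #31 defines the profile in terms of the Default_Ignorable_Code_Point property, whereas this sketch hand-lists only the code points discussed in this thread (ZWNJ, ZWJ, and the two main variation selector ranges). The function names are hypothetical.

```python
# Hand-listed subset of default-ignorable code points (an assumption for
# illustration; the real profile uses the Default_Ignorable_Code_Point property).
ZWNJ = 0x200C  # ZERO WIDTH NON-JOINER
ZWJ = 0x200D   # ZERO WIDTH JOINER


def is_listed_default_ignorable(cp: int) -> bool:
    """Return True for ZWNJ, ZWJ, and variation selectors VS1..VS256."""
    return (
        cp in (ZWNJ, ZWJ)
        or 0xFE00 <= cp <= 0xFE0F    # VARIATION SELECTOR-1..16
        or 0xE0100 <= cp <= 0xE01EF  # VARIATION SELECTOR-17..256
    )


def passes_exclusion_sketch(identifier: str) -> bool:
    """Return True if the identifier contains none of the listed code points.

    This only checks the exclusion step; it does not check XID_Start or
    XID_Continue membership.
    """
    return not any(is_listed_default_ignorable(ord(ch)) for ch in identifier)
```

With this sketch, an identifier such as "a\u200db" (with an embedded ZWJ) would be rejected, while plain "ab" would be accepted.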
Received on 2024-01-06 14:05:44