Re: UAX Profiles

From: Robin Leroy <eggrobin_at_[hidden]>
Date: Sat, 6 Jan 2024 15:56:17 +0100
On Sat, 6 Jan 2024 at 15:05, Jens Maurer via SG16 <sg16_at_[hidden]>
wrote:

> ... Tom Honermann, ..., Robin Leroy, ..., Richard Smith."
>
> These names sound familiar to me, and at least two of them are
> probably on this e-mail reflector.
>
> Could one of them say something about what happened?

It sounds like it is time for the liaison officer to do some liaising :-)

On Sat, 6 Jan 2024 at 11:56, Corentin <corentin.jabot_at_[hidden]> wrote:

> It is very surprising to me that ZWJ is opt-out rather than opt-in given
> the security implications and the fact supporting them requires
> implementation of TR39 3.1.1
> <https://www.unicode.org/reports/tr39/#Joining_Controls>.
> I suppose this was done to better support Sanskrit?
>
This is because ZWJ and ZWNJ are needed orthographically in modern
languages (Persian being one of them, see the example in Section 5.1.3 of
UTS #55 <https://www.unicode.org/reports/tr55/#General-Security-Profile>),
and because this does not actually change the picture as far as security
implications are concerned: the 260 variation selectors
<https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BVariation_Selector%7D&g=&i=xid_continue>
have always been allowed (and must always remain allowed
<https://www.unicode.org/policies/stability_policy.html#Identifier>), and
those even more rarely have a visible effect. Besides, the security issue
of visually indistinguishable identifiers is not usefully mitigated by
prohibiting invisible characters: you still have good old confusables,
e.g., НТТР vs. HTTP, and bidi confusables, aא1 vs. a1א.
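To make the last point concrete, here is a minimal C++ sketch. The function names are hypothetical, and the code point ranges are hard-coded from the Variation_Selector property as I understand it in recent Unicode versions; a real implementation would read them from the Unicode Character Database. It shows that a check for invisible characters passes both spellings of "HTTP" even though they are different identifiers.

```cpp
#include <cassert>
#include <string>

// Assumption: hard-coded ranges for Variation_Selector=Yes
// (Mongolian free variation selectors, VS1..VS16, VS17..VS256).
inline bool is_variation_selector(char32_t c) {
    return (c >= 0x180B && c <= 0x180D) || c == 0x180F ||
           (c >= 0xFE00 && c <= 0xFE0F) ||
           (c >= 0xE0100 && c <= 0xE01EF);
}

// ZWNJ (U+200C) and ZWJ (U+200D).
inline bool is_join_control(char32_t c) {
    return c == 0x200C || c == 0x200D;
}

// Hypothetical check: does the identifier contain a joiner or a
// variation selector anywhere?
inline bool has_invisible_character(const std::u32string& identifier) {
    for (char32_t c : identifier) {
        if (is_variation_selector(c) || is_join_control(c)) return true;
    }
    return false;
}

// Counts the code points matched by is_variation_selector.
inline int count_variation_selectors() {
    int count = 0;
    for (char32_t c = 0; c <= 0x10FFFF; ++c) {
        if (is_variation_selector(c)) ++count;
    }
    return count;
}
```

With these definitions, U"HTTP" and U"\u041D\u0422\u0422\u0420" (the Cyrillic look-alike) compare unequal, yet has_invisible_character returns false for both, so prohibiting invisible characters would not have caught the pair.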

> Anyway, it might be useful to either:
> - Mandate the Default-Ignorable Exclusion Profile
> - Mandate some form of TR39 conformance (which is going to be quite the
> burden for implementers given our bandwidth)
>
I would not recommend mandating either of these things.
It might be useful to review the new UTS #55, Unicode Source Code Handling
<https://www.unicode.org/reports/tr55/>.

The gist of that document is that legality under a language specification,
which requires some level of stability, is the wrong tool for mitigating
the security issues that arise from invisible characters in identifiers
(and, more broadly, from visually indistinguishable identifiers).
Instead, we encourage implementations to deal with these issues using
linters, compiler warnings, and various editor squiggles (which are
important things, but not within the scope of what the standard mandates).
See https://www.unicode.org/reports/tr55/#Display.
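
As a rough illustration of "diagnose, do not reject", here is a small C++ sketch of a lint pass. The names Diagnostic and lint_join_controls are hypothetical, and flagging every joiner is my simplification: it is cruder than the diagnostics UTS #55 describes and makes no attempt at contextual checks. The identifier stays legal; the caller decides how to surface the warnings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical warning record: where the code point is and which one it is.
struct Diagnostic {
    std::size_t index;
    char32_t code_point;
};

// Reports every ZWNJ (U+200C) and ZWJ (U+200D) in an identifier.
// The identifier is never rejected; an empty result means no warnings.
inline std::vector<Diagnostic> lint_join_controls(
        const std::u32string& identifier) {
    std::vector<Diagnostic> warnings;
    for (std::size_t i = 0; i < identifier.size(); ++i) {
        const char32_t c = identifier[i];
        if (c == 0x200C || c == 0x200D) {
            warnings.push_back(Diagnostic{i, c});
        }
    }
    return warnings;
}
```

An editor or compiler front end could turn each Diagnostic into a squiggle or a warning at the given position, without any change to what the language accepts.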

For confusability (which deals with the issue of invisible characters among
many others), see https://www.unicode.org/reports/tr55/#Confusability.
Note that UTS #55 recommends other diagnostics that affect default
ignorable code points in general and ZWJ and ZWNJ in particular, but many
of them address usability rather than security issues.

Best regards,

Robin Leroy

Received on 2024-01-06 14:56:36