It is very surprising to me that ZWJ is opt-out rather than opt-in given the security implications and the fact that supporting them requires implementation of TR39 3.1.1.
I suppose this was done to better support Sanskrit?
This is because ZWJ and ZWNJ are needed orthographically in modern languages (Persian being one of them, see the example in Section 5.1.3 of UTS #55), and because this does not actually change the picture as far as security implications are concerned: the 260 variation selectors have always been allowed (and must always remain allowed), and those even more rarely have a visible effect. Besides, the security issue of visually indistinguishable identifiers is not usefully mitigated by prohibiting invisible characters: you still have good old confusables, e.g., НТТР vs. HTTP, and bidi confusables, aא1 vs. a1א.
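To make the last point concrete, here is a small Python sketch (the names `INVISIBLE` and `has_invisible` are made up for this illustration, and the check only looks for ZWJ and ZWNJ). It shows that both confusable pairs consist of distinct identifiers even though neither contains an invisible character, so prohibiting invisible characters would not tell them apart.

```python
import unicodedata

# Hypothetical helper for this illustration: only ZWNJ and ZWJ are checked.
INVISIBLE = {"\u200c", "\u200d"}

def has_invisible(identifier: str) -> bool:
    """True if the identifier contains ZWNJ or ZWJ."""
    return any(ch in INVISIBLE for ch in identifier)

latin = "HTTP"
cyrillic = "\u041d\u0422\u0422\u0420"  # Cyrillic EN, TE, TE, ER: looks like HTTP

# Bidi pair: same three code points in a different logical order.
bidi_a = "a\u05d01"   # a, HEBREW LETTER ALEF, 1
bidi_b = "a1\u05d0"   # a, 1, HEBREW LETTER ALEF

for left, right in [(latin, cyrillic), (bidi_a, bidi_b)]:
    print(
        [f"U+{ord(c):04X}" for c in left],
        [f"U+{ord(c):04X}" for c in right],
        "equal:", left == right,
        "invisible:", has_invisible(left) or has_invisible(right),
    )

# Normalization does not merge the scripts either.
print(unicodedata.normalize("NFKC", cyrillic) == latin)
```

Telling such pairs apart takes confusable detection along the lines of UTS #39, which is a different mechanism from excluding default ignorable code points.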
Anyway, it might be useful to either:
- Mandate the Default-Ignorable Exclusion Profile
- Mandate some form of TR39 conformance (which is going to be quite the burden for implementers given our bandwidth)
I would not recommend mandating either of these things.
The gist of that document is mostly that legality according to a language specification, which requires some level of stability, is the wrong tool for mitigating the security issues arising from invisible characters in identifiers (and, more broadly, the issues arising from visually indistinguishable identifiers).
Instead we encourage implementations to deal with that using linters, compiler warnings, and various editor squiggles (which are important things, but not within the scope of what the standard mandates).
Note that UTS #55 recommends other diagnostics that affect default ignorable code points in general, and ZWJ and ZWNJ in particular, but many of them mostly address usability rather than security issues.
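As a rough illustration of the linter approach, here is a minimal Python sketch of a warning that flags ZWJ, ZWNJ, and variation selectors in an identifier while still accepting it. This is not one of the diagnostics specified in UTS #55; the function names are hypothetical, and a real tool would look at context (for instance, whether the joiner is orthographically justified) rather than warn unconditionally.

```python
ZWNJ = "\u200c"
ZWJ = "\u200d"

def is_variation_selector(ch: str) -> bool:
    """VS1..VS16, VS17..VS256, and the Mongolian free variation selectors."""
    cp = ord(ch)
    return (
        0xFE00 <= cp <= 0xFE0F
        or 0xE0100 <= cp <= 0xE01EF
        or 0x180B <= cp <= 0x180D
        or cp == 0x180F
    )

def invisible_code_points(identifier: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for each ZWJ, ZWNJ, or variation selector."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(identifier)
        if ch in (ZWJ, ZWNJ) or is_variation_selector(ch)
    ]

def lint(identifier: str) -> list[str]:
    """Produce warnings only; the identifier itself remains legal."""
    return [
        f"warning: invisible code point {cp} at index {i} in identifier"
        for i, cp in invisible_code_points(identifier)
    ]

for message in lint("ab\u200dc"):
    print(message)
```

The point of the design is that the check lives outside the grammar: it can be tightened, relaxed, or suppressed per project without changing which programs are well-formed.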
Best regards,
Robin Leroy