sg16

Re: Concerning - C/C++ Identifier Security using Unicode Standard Annex 39

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Wed, 19 Jan 2022 13:40:02 +0100
On 19/01/2022 11.40, Corentin via SG16 wrote:
> Concerning your request here
> https://github.com/sg16-unicode/sg16/issues/48#issuecomment-1014698519
>
> For WG21, please follow the procedure here
> https://isocpp.org/std/standing-documents/sd-7-mailing-procedures-and-how-to-write-papers
>
> Adding Aaron Ballman who can help you with the WG14 side of things.

You forgot to actually copy Aaron. I've done so now.

> ---
> It might be best to synchronize with SG16 (the Unicode study group), as suggested by Tom, before going to the C committee though.
> Note that C++ already accepted P1949 and is reaching design freeze - but your paper can be considered a bug fix if accepted.

My personal opinion here is that the paper constitutes a substantial change
in direction for Unicode identifier support that would be a hard sell as a
bug fix on top of P1949.

> Great paper!

It seems that an existing program that uses header files from different
third parties would likely become ill-formed with this paper, if one
header file declares a class with private member names in one script
and another header file declares another class with private members
named using a script deemed conflicting.

It would be great if the paper could talk about that situation, and
also have a look at whether modules provide sufficient isolation to
overcome the issue (I suspect not).
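
The two-header scenario above can be sketched as follows. This is a hypothetical illustration only: the namespaces `vendor_a` and `vendor_b`, the class names, and the member names are invented here, and the choice of Cyrillic and Greek as the "conflicting" pair is an assumption made for the example, not a statement of the paper's actual rules. The code is well-formed under the P1949 identifier rules; the concern is that a translation unit including both headers could be rejected if mixing those scripts were prohibited.

```cpp
#include <cassert>

// Stand-in for a header shipped by hypothetical third party A.
namespace vendor_a {
class Account {
public:
    explicit Account(int balance) : счет_(balance) {}
    int balance() const { return счет_; }
private:
    int счет_;  // private member named in Cyrillic script
};
}  // namespace vendor_a

// Stand-in for a header shipped by hypothetical third party B.
namespace vendor_b {
class Invoice {
public:
    explicit Invoice(int amount) : ποσο_(amount) {}
    int amount() const { return ποσο_; }
private:
    int ποσο_;  // private member named in Greek script
};
}  // namespace vendor_b

// Client code touches only the ASCII public interfaces, yet both
// private member names still appear in its translation unit.
int total(const vendor_a::Account& a, const vendor_b::Invoice& i) {
    return a.balance() + i.amount();
}

void usage_example() {
    assert(total(vendor_a::Account(40), vendor_b::Invoice(2)) == 42);
}
```

Note that the client never spells either private name itself, which is why the question of whether the restriction applies per translation unit, per scope, or per module matters here.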

> However, I'm afraid security considerations are a moving target, and keeping them as warnings/QoI avoids a huge burden on the committee(s) while granting more flexibility to implementers.
> In particular, the set of allowed/disallowed/recommended scripts is likely to be unstable.
> I expect more research to come out of the Unicode Consortium in the coming months in regard to security/confusable/bidi attacks.

I'd also like to point out that qualitatively differentiating entire
scripts into allowed/disallowed/recommended buckets might be construed
as discriminatory against the people using scripts disfavored by
that categorization.

> A big selling point of UAX31/P1949 is that it avoids a huge burden on two slow-moving standards bodies (which have limited Unicode expertise and even less expertise with the impact on specific affected scripts).

Yes, and (to the best of our knowledge) UAX31 does not exclude any active
script in its entirety (some characters might need work-arounds, similar to
the fact that an English hyphen cannot appear in a C++ identifier).
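
The hyphen analogy can be made concrete with a minimal sketch (the function and variable names are invented for illustration). In C++ the character `-` is always the minus operator, so `well-known` is never a single identifier, and the usual workaround is an underscore.

```cpp
#include <cassert>

// `well-known` is not one identifier: the hyphen is the subtraction operator.
int hyphen_is_minus() {
    int well = 7;
    int known = 2;
    return well-known;  // parsed as `well - known`
}

// The customary workaround: spell the compound word with an underscore.
int underscore_workaround() {
    int well_known = 7;
    return well_known;
}
```

The same kind of local workaround is what the paragraph above suggests may be needed for a few characters in some scripts, as opposed to excluding a script wholesale.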

Jens

Received on 2022-01-19 12:40:19