ISOCPP sg16 List: Re: [isocpp-ssrg] Fwd: Avoiding Source Code Spoofing

From: Tom Honermann <tom_at_[hidden]>
Date: Thu, 3 Mar 2022 14:33:11 -0500

Thank you, JF. Copying SG16 and the author of P2528R0 (C++ Identifier
Security using Unicode Standard Annex 39) <https://wg21.link/p2528r0>.

This effort has implications for future work on P2528R0; we'll want to
ensure that any such efforts are consistent with the guidance and work
eventually produced by this new Unicode Consortium group.

Tom.

On 3/2/22 6:32 PM, JF Bastien via Ssrg wrote:
> Related to the recent paper discussion.
>
> ---------- Forwarded message ---------
> From: *announcements via announcements* <announcements_at_[hidden]>
> Date: Thu, Mar 3, 2022 at 8:03 AM
> Subject: Avoiding Source Code Spoofing
> To: <announcements_at_[hidden]>
> CC: announcements <announcements_at_[hidden]>
>
>
> Unicode
> has convened a group of experts in programming languages, tooling, and
> security to provide guidance and recommendations on how to better
> handle international text in source code, as well as providing code to
> help implementations.
>
> Recent reports have highlighted problems in the review of source code
> containing non-ASCII Unicode characters (the so-called “Trojan Source
> exploit”). A person reviewing a submission of source code could be
> fooled into thinking that the code was okay, when it was actually
> malicious. The basic problem occurs when the actual text is different
> from what the reader perceives it to be, based on what is displayed.
> This can result either from the presence of characters used in
> right-to-left scripts (such as Arabic or Hebrew) that can change the
> visual ordering of text, or from the presence of characters that look
> like others (also known as “confusables”).
>
> The problems here are not solely a security issue: text with different
> writing directions or confusable characters can be hard to work with.
> Finding a solution here is important from both security and usability
> points of view. Developers of source code editors or compilers should
> not be required to have a deep knowledge of Unicode to provide good
> user experience and robust security mitigations.
>
> Unicode’s mission is to allow everyone to use their own languages on
> computers and mobile devices. The above issues are part and parcel of
> a character set that covers all the writing systems of the world – and
> have been documented in the Unicode Standard since its very first
> version in 1991. Unicode’s past efforts have focused on misleading
> URLs and identifiers, and correct visual ordering of plain text. And
> while much of this material is relevant to source code, this group of
> experts will now collect, curate, and supplement that early
> documentation with concrete recommendations to support source code
> editors and compilers.
>
> While it may seem that it is easiest to simply go back to limiting
> source code to only ASCII characters, ASCII-only environments make it
> much harder to write and maintain software that can be used all over
> the world – a fundamental requirement for modern software. Moreover,
> this approach disadvantages software developers who use languages
> other than English.
>
> More details on the source code spoofing issue, the proposed plan, and
> formation of this group are found in document L2/22-007R2
> <https://www.unicode.org/L2/L2022/22007r2-avoiding-spoof.pdf>.
>
> http://blog.unicode.org/2022/03/avoiding-source-code-spoofing.html
>
>

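The quoted announcement attributes part of the problem to characters that
change the visual ordering of text. As a rough illustration only, a tool
could flag the explicit bidirectional formatting characters before review.
The sketch below is an assumption-laden example, not anything proposed by
the Unicode group or by P2528R0: the function name find_bidi_controls and
the choice to flag every such character (rather than only unbalanced or
suspicious ones) are hypothetical. It does not address confusables.

```python
import unicodedata

# Explicit bidirectional formatting characters: embeddings, overrides,
# isolates, and their terminators. Flagging all of them is a deliberately
# blunt policy chosen for this sketch.
BIDI_CONTROLS = {
    "\u202A",  # LEFT-TO-RIGHT EMBEDDING
    "\u202B",  # RIGHT-TO-LEFT EMBEDDING
    "\u202C",  # POP DIRECTIONAL FORMATTING
    "\u202D",  # LEFT-TO-RIGHT OVERRIDE
    "\u202E",  # RIGHT-TO-LEFT OVERRIDE
    "\u2066",  # LEFT-TO-RIGHT ISOLATE
    "\u2067",  # RIGHT-TO-LEFT ISOLATE
    "\u2068",  # FIRST STRONG ISOLATE
    "\u2069",  # POP DIRECTIONAL ISOLATE
}


def find_bidi_controls(source):
    """Return (line, column, character name) for each bidi control found.

    Lines and columns are 1-based; columns count code points, not bytes.
    """
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col_no, unicodedata.name(ch)))
    return hits


if __name__ == "__main__":
    # A comment hiding a right-to-left override, written with an escape so
    # the sample itself contains no invisible characters.
    sample = "int x = 1; /* \u202E } */\nreturn x;"
    for line_no, col_no, name in find_bidi_controls(sample):
        print(f"line {line_no}, column {col_no}: {name}")
```

A real mitigation would likely need to be more selective, since these
characters have legitimate uses in right-to-left text within comments and
string literals.
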
Received on 2022-03-03 19:33:17