Re: [SG16] Redefining Lexing in terms of Unicode

From: Steve Downey <sdowney_at_[hidden]>
Date: Thu, 28 May 2020 10:54:15 -0400
On Thu, May 28, 2020 at 9:49 AM Hubert Tong via SG16 <sg16_at_[hidden]>
wrote:

> On Thu, May 28, 2020 at 4:04 AM Corentin via SG16 <sg16_at_[hidden]>
> wrote:
>
>>
>>
>> - Source character set is redefined as being the Unicode character set
>>
> It seems like we're encouraging homoglyph issues. Do we expect open
> source projects to maintain coding guidelines that restrict characters
> outside the ASCII range?
>

That ship has sailed:
// In many fonts the Greek letter 'Α', the Cyrillic letter 'А' and the
// Latin letter 'A' are visually identical,
// as are the Latin letter 'a' and the Cyrillic letter 'а'.
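int Α = 0;  // U+0391 GREEK CAPITAL LETTER ALPHA
int А = 1;  // U+0410 CYRILLIC CAPITAL LETTER A
int A = 2;  // U+0041 LATIN CAPITAL LETTER A
int a = 3;  // U+0061 LATIN SMALL LETTER A
int а = 4;  // U+0430 CYRILLIC SMALL LETTER A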
This compiles out of the box with current gcc, and with clang since
version 3.x. It is new behavior for gcc. MSVC rejects it without flags,
but accepts it with /utf-8.

 https://godbolt.org/z/WAMqrq
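
To see that these are distinct identifiers rather than redeclarations,
one can decode the UTF-8 spelling of each name to code points. A minimal
sketch, assuming well-formed UTF-8 input; decode_utf8 is a hypothetical
helper written for illustration, not part of any library:

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: decode UTF-8 into Unicode code points.
// Assumes well-formed input; no validation of lead or continuation bytes.
std::vector<char32_t> decode_utf8(std::string_view s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        // Number of continuation bytes implied by the lead byte.
        const std::size_t extra =
            lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        // ASCII keeps all 7 bits; otherwise keep the payload bits of the lead.
        char32_t cp = extra == 0 ? lead : (lead & (0x3F >> extra));
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k <= extra && i + k < s.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

With the names spelled as explicit bytes, decode_utf8("\xCE\x91") yields
U+0391 (Greek), decode_utf8("\xD0\x90") yields U+0410 (Cyrillic), and
decode_utf8("A") yields U+0041 (Latin), so the compiler is right to treat
them as three different names even though they render alike.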

It's one of the reasons addressing identifiers is somewhat urgent, even
though we're only addressing UAX #31, not UTR #36, Unicode Security
Considerations (http://unicode.org/reports/tr36/). The implementation
costs for 36 are very high, though.

Received on 2020-05-28 09:57:33