Re: [SG16-Unicode] NL 029 : Disallow zero-width and control characters

From: Corentin <corentin.jabot_at_[hidden]>
Date: Fri, 25 Oct 2019 08:58:37 +0200
On Fri, Oct 25, 2019, 02:18 Zach Laine <whatwasthataddress_at_[hidden]> wrote:

> Is this a real problem that is biting people right now? Are people using
> these characters in identifiers and causing great upheaval? This seems of
> the lowest possible priority to me, and not at all C++20-related.
>


Completely agree, with both of you.
I would be deeply unsatisfied with a solution that would:

* Not follow TR31 recommendations
* Not address the fact that you can only have Unicode identifiers if the
compiler knows that your file id

>
> Zach
>
> On Thu, Oct 24, 2019 at 5:25 PM Steve Downey <sdowney_at_[hidden]> wrote:
>
>> SG16 has an NB comment to deal with! Tom has already scheduled it for
>> Belfast. It's basically that the list of allowed code points has some
>> interesting control characters like zero width joiners and RTL modifiers.
>>
>> https://github.com/cplusplus/nbballot/issues/28
>>
>> There's also an issue that JF raised earlier:
>> https://github.com/sg16-unicode/sg16/issues/48
>> Improve support for Unicode characters in identifiers
>>
>> Relevant Unicode standard:
>> https://unicode.org/reports/tr31/ UNICODE IDENTIFIER AND PATTERN SYNTAX
>>
>> Which is complicated because it allows things like identifiers written in
>> Farsi, which requires ZWJ for disambiguation, and suggests regex to detect
>> particular allowed identifiers. It's fairly dense, and I haven't digested
>> it yet, but it looks like there might be allowed ways to exclude that.
>>
>> Plus tailoring would be needed because C++ disallows some characters such
>> as '$' which might otherwise be allowed. This is also discussed in TR31.
>>
>>
>> My feeling on the comment is that it's not a new issue for C++20, so it's
>> not clear that it has to be fixed for C++20. I believe it should be fixed,
>> but it ought to be fixed in a principled manner, and that likely means
>> TR31.
>>
>> We would also have to discuss if emoji are allowed in identifiers. TR31
>> does not strictly disallow them. The TonyTable shall be interesting.
>>
>>
>>
>> _______________________________________________
>> SG16 Unicode mailing list
>> Unicode_at_[hidden]
>> http://www.open-std.org/mailman/listinfo/unicode
>>

Received on 2019-10-25 08:58:52