Re: [SG16-Unicode] NL 029 : Disallow zero-width and control characters

From: Corentin <corentin.jabot_at_[hidden]>
Date: Fri, 25 Oct 2019 09:09:56 +0200
On Fri, 25 Oct 2019 at 08:58, Corentin <corentin.jabot_at_[hidden]> wrote:

>
>
> On Fri, Oct 25, 2019, 02:18 Zach Laine <whatwasthataddress_at_[hidden]>
> wrote:
>
>> Is this a real problem that is biting people right now? Are people using
>> these characters in identifiers and causing great upheaval? This seems of
>> the lowest possible priority to me, and not at all C++20-related.
>>
>
>
> Completely agree, with both of you.
> I would be deeply unsatisfied with a solution that would:
>
> * Not follow TR31 recommendations
> * Not address the fact that you can only have Unicode identifiers if the
> compiler knows that your file id
>

I sent the previous mail too fast, sorry about the noise.
As I was saying

Completely agree, with both of you.
I would be deeply unsatisfied with a solution that would:

* Not follow TR31 recommendations
* Not address the fact that you can only have Unicode identifiers if the
compiler knows that your file is UTF encoded (the same issue as with u8
literals; we talked about that in P1880)
* Fail to address concerns related to mangling if a normalization form is
not specified
* Fail to recognize that we will want to reflect on the names of these
things (std::meta::name_of), which would require reflection to be able to
deal with that, both in terms of providing a UTF-8 API AND a specified
normalization form

All of that requires careful consideration.
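
To make the normalization and mangling concern concrete, here is a minimal
sketch. It assumes a purely hypothetical mangler (naive_mangle, a name
invented for this illustration) that copies the identifier's UTF-8 bytes
verbatim behind a length prefix; no real ABI or compiler is being
described. The two helper functions return the same visible identifier,
"é", spelled once in NFC and once in NFD.

```cpp
#include <string>

// The same visible identifier, "é", in two normalization forms.
// NFC: U+00E9 (precomposed), UTF-8 bytes C3 A9.
inline std::string nfc_spelling() { return "\xC3\xA9"; }

// NFD: U+0065 U+0301 (e + combining acute accent), UTF-8 bytes 65 CC 81.
inline std::string nfd_spelling() { return "e\xCC\x81"; }

// Hypothetical stand-in for a mangler: a length prefix followed by the
// identifier bytes, copied verbatim with no normalization applied.
// This is an assumption for illustration, not any real mangling scheme.
inline std::string naive_mangle(const std::string& id) {
    return "_Z" + std::to_string(id.size()) + id;
}
```

Under these assumptions, naive_mangle(nfc_spelling()) and
naive_mangle(nfd_spelling()) are different strings, so two translation
units that look identical in an editor could produce symbols that do not
match unless a normalization form is specified somewhere.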


Corentin




>
>> Zach
>>
>> On Thu, Oct 24, 2019 at 5:25 PM Steve Downey <sdowney_at_[hidden]> wrote:
>>
>>> SG16 has an NB comment to deal with! Tom has already scheduled it for
>>> Belfast. It's basically that the list of allowed code points has some
>>> interesting control characters like zero width joiners and RTL modifiers.
>>>
>>> https://github.com/cplusplus/nbballot/issues/28
>>>
>>> There's also an issue that JF raised earlier:
>>> https://github.com/sg16-unicode/sg16/issues/48
>>> Improve support for Unicode characters in identifiers
>>>
>>> Relevant Unicode standard:
>>> https://unicode.org/reports/tr31/ UNICODE IDENTIFIER AND PATTERN SYNTAX
>>>
>>>
>>> This is complicated because it allows things like identifiers written
>>> in Farsi, which require zwj for disambiguation, and it suggests regexes
>>> to detect particular allowed identifiers. It's fairly dense, and I
>>> haven't digested it yet, but it looks like there might be allowed ways
>>> to exclude that.
>>>
>>> Plus tailoring would be needed because C++ disallows some characters
>>> such as '$' which might otherwise be allowed. This is also discussed in
>>> TR31.
>>>
>>>
>>> My feeling on the comment is that it's not a new issue for C++20, so
>>> it's not clear that it has to be fixed for C++20. I believe it should be
>>> fixed, but it ought to be fixed in a principled manner, and that likely
>>> means TR31.
>>>
>>> We would also have to discuss if emoji are allowed in identifiers. TR31
>>> does not strictly disallow them. The TonyTable shall be interesting.
>>>
>>>
>>>
>>> _______________________________________________
>>> SG16 Unicode mailing list
>>> Unicode_at_[hidden]
>>> http://www.open-std.org/mailman/listinfo/unicode
>>>
>>
>

Received on 2019-10-25 09:10:09