sg16: Re: [SG16] Wording for UAX #31 identifiers

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Thu, 9 Apr 2020 10:19:43 +0200

On Thu, 9 Apr 2020 at 08:15, Jens Maurer via SG16 <sg16_at_[hidden]>
wrote:

>
> See attached.
>
> Note that a universal-character-name never
> represents a member of the basic source
> character set, so we don't have to call out
> underscores specifically.
>
> This makes any sequence involving a universal-character-name
> a pp-identifier (and thus a preprocessing-token), so that
>
> #define accent(x) x ## \u0300
>
> does the right thing.
>
> Did someone check that UAX #31 really is part of ISO 10646?
> There should be a cross-reference to the Annex somewhere in [lex.name].
>
> Further concerns:
>
> We have a generic reference to ISO 10646 in the front matter
> of the standard. That means the most recent version applies,
> implicitly. That's a bit of a moving target, though: Does
> an implementation lose conformance if a new version of ISO 10646
> is issued (because more characters are allowed in identifiers in
> later versions, maybe)?
>
> Should we maybe require an implementation to document which
> revision of ISO 10646 was used for XID_Start and XID_Continue?
> This way, programmers can at least find out about a
> portability pitfall.
>

C++23 programs should remain portable across compilers, so in this case
it seems necessary to specify a particular version of ISO 10646
(compilers can still support newer versions as an extension).

> The paper should spend a section on explaining how expensive
> (code size; maybe performance) a check for NFC is for the compiler.
> Does the compiler need the entire Unicode tables, or are there
> shortcuts (e.g. a few ranges of "bad" code points)?
>

I think both Zach and I should be able to provide data for that.
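
To illustrate the kind of shortcut being asked about (this is not the measured data mentioned above), here is a minimal sketch of a conservative pre-check. It relies on the fact that every code point below U+0300 has NFC_Quick_Check=Yes and canonical combining class 0, so an identifier made only of such code points is already in NFC and needs no table lookup. The names `NfcQuick`, `nfc_quick_check` and `kFirstSuspectCodePoint` are hypothetical, and the sketch assumes the identifier has already been decoded to UTF-32 and that C++17 is available. Anything at or above U+0300 is simply handed to a full check, which is not shown.

```cpp
#include <string_view>

// Outcome of the cheap pre-check. NeedsFullCheck does NOT mean
// "not NFC"; it only means the fast path cannot decide.
enum class NfcQuick { Yes, NeedsFullCheck };

// First code point that can affect NFC: below U+0300 everything has
// NFC_Quick_Check=Yes and canonical combining class 0.
constexpr char32_t kFirstSuspectCodePoint = 0x0300;

// Hypothetical helper: returns Yes when the identifier is certainly
// in NFC, NeedsFullCheck when a table-driven check would be required.
constexpr NfcQuick nfc_quick_check(std::u32string_view ident) {
    for (char32_t cp : ident) {
        if (cp >= kFirstSuspectCodePoint) {
            return NfcQuick::NeedsFullCheck;
        }
    }
    return NfcQuick::Yes;
}
```

With this, a precomposed U"caf\u00E9" takes the fast path, while U"e\u0301" (base letter plus combining acute) falls through to the full check. How much table data that full check needs is the open question the paper would have to answer.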

>
> Jens
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>

Received on 2020-04-09 03:23:12