Re: [isocpp-sg16] Draft of Annex E rebased on Unicode 15.1 - D3727R0

From: Jens Maurer <jens.maurer_at_[hidden]>
Date: Sat, 7 Jun 2025 21:20:45 +0200
On 07.06.25 15:30, Robin Leroy wrote:
> On Wed, 4 Jun 2025 at 08:26, Jens Maurer <jens.maurer_at_[hidden]> wrote:
>
> The previous version of the paper appeared to do the minimum necessary to reflect
> Unicode 16 and kept all the detailed "we don't conform" statements in. Now, the ask
> was to rebase on 15.1. What is the rationale for changing the approach towards the
> "we don't conform" utterances for what is a rather trivial rebase?
>
> That was my suggestion.
>
> The reason is twofold:
>
> 1. We keep adding requirements there, so this is a maintenance burden as the version of Unicode keeps getting bumped up.
> In this instance, we added R3c, which is not part of R3. (I would say that the same problem applies to a lesser extent to the [uaxid.nonobservance] subclause, which is, in fact, missing R3c.)
> 2. The "catalogue of ships" (νεῶν κατάλογος) of requirements not met is neither required of a conformance statement, nor useful to an implementer.
> This is an informative annex (in the standardizing sense), so we can try to make it informative (in the general sense).
> If I am trying to write a tool that interoperates with various implementations that conform to UAX #31 (in Unicode terminology; here the C++ standard itself is one such « implementation »), say because I am writing something that generates C++ code and code in other programming languages, it is useful to know that I can produce default identifiers, just like I can in Java or Python. It is a lot less useful to read a paragraph that summarizes what R3 is about, only to find that I cannot do anything with it anyway. The less said about hashtags the better.
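For the code-generation scenario above, here is a minimal Python sketch of a check that a hypothetical generator might run before emitting a name for both C++ and Python. It assumes that Python's `str.isidentifier()`, which follows Python's XID_Start/XID_Continue-based identifier definition (underscore allowed at the start), is an adequate stand-in for default identifiers as profiled by C++. The Unicode version bundled with Python may differ from the one C++ references, and keywords are not checked. The extra NFC test reflects [lex.name], under which a non-normalized identifier makes the program ill-formed.

```python
import unicodedata


def is_portable_identifier(name: str) -> bool:
    """Approximate check (hypothetical helper) that `name` can be emitted
    as an identifier for both C++ and Python. Keywords are not checked.

    Assumption: str.isidentifier() stands in for the UAX #31 default
    identifier syntax (XID_Start or underscore, then XID_Continue*).
    """
    if not name.isidentifier():
        return False
    # C++ additionally requires the identifier to be in Normalization Form C.
    return unicodedata.is_normalized("NFC", name)


print(is_portable_identifier("caf\u00e9"))   # True: precomposed U+00E9
print(is_portable_identifier("cafe\u0301"))  # False: 'e' + U+0301 is not NFC
print(is_portable_identifier("2fast"))       # False: cannot start with a digit
```

Python accepts the decomposed spelling and treats it as equivalent to the composed one. A C++ compiler must reject it, so a generator targeting both languages has to emit NFC.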

Ok, that changes the thrust of Annex E: It's now just a list of positive
statements, and all the cruft irrelevant for C++ is simply left out.

> From that standpoint, as I had noted to Steve earlier, claiming conformance to R4 may be technically correct—the best kind of correct—/on the space of well-formed programs/, but I find it misleading at best. I would expect that if I am interoperating with implementations that conform to R4, I need to check that I don’t rely on equivalent identifiers (compatibility or canonical, depending on the normalization form used in R4), but that I don’t need to normalize the identifiers I produce, because the normalization-equivalent identifiers are treated as equivalent.

> Thinking about it some more, since R1 does not mention the restriction from [lex.name] §1 <https://eel.is/c++draft/lex.name#1>, I now think the conformance claim to R4 is actually just incorrect: Non-normalized identifiers are identifiers (both by definition in lex.name <https://eel.is/c++draft/lex.name#nt:identifier> and for the purposes of the conformance claim to R1 as worded), but they are not « treated equivalently by the implementation », since they cause the program to be ill-formed, whereas their normalized counterparts do not.

I agree. Reading R4 in the context of UAX #31, the intent is that the implementation
(e.g. the C++ compiler) normalizes identifiers before doing anything with them.
In contrast, C++ requires identifiers to already be normalized in the input to the compiler.
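The contrast can be sketched in a few lines of Python. This is a simplified model, not compiler code. It assumes R4 is applied with NFC (UAX #31 allows NFC or NFKC), and it models "ill-formed" as a raised exception. Both helper names are hypothetical.

```python
import unicodedata


def r4_key(name: str) -> str:
    """R4 model: the implementation normalizes identifiers itself, so
    canonically equivalent spellings map to the same key."""
    return unicodedata.normalize("NFC", name)


def cpp_key(name: str) -> str:
    """C++ model ([lex.name]): the input must already be in NFC;
    otherwise the program is ill-formed (modeled here as ValueError)."""
    if not unicodedata.is_normalized("NFC", name):
        raise ValueError(f"ill-formed: {name!r} is not in NFC")
    return name


composed = "caf\u00e9"     # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(r4_key(composed) == r4_key(decomposed))  # True: treated as equivalent
try:
    cpp_key(decomposed)
except ValueError as err:
    print(err)  # the decomposed spelling is rejected, not matched
```

Under the R4 model a tool could emit either spelling. Under C++ only the NFC spelling is accepted, so the two identifiers are not "treated equivalently".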

So, R4 should go in our Annex E.

Jens

Received on 2025-06-07 19:20:53