ISOCPP sg16 List: Re: [isocpp-sg16] Draft of Annex E rebased on Unicode 15.1

From: Robin Leroy <eggrobin_at_[hidden]>
Date: Sat, 7 Jun 2025 15:30:47 +0200

On Wed, 4 Jun 2025 at 08:26, Jens Maurer <jens.maurer_at_[hidden]> wrote:

> The previous version of the paper appeared to do the minimum necessary to
> reflect Unicode 16 and kept all the detailed "we don't conform" statements
> in. Now, the ask was to rebase on 15.1. What is the rationale for changing
> the approach towards the "we don't conform" utterances for such a rather
> trivial rebase?
>
That was my suggestion.

The reason is twofold:

   1. We keep adding requirements there, so this is a maintenance burden as
   the version of Unicode keeps getting bumped up.
   In this instance, we added R3c, which is not part of R3. (I would say
   that the same problem applies to a lesser extent to the
   [uaxid.nonobservance] subclause, which is, in fact, missing R3c.)
   2. The νεῶν κατάλογος of requirements not met is neither required of a
   conformance statement, nor useful to an implementer.
   This is an informative annex (in the standardizing sense), so we can try
   to make it informative (in the general sense).
   If I am trying to write a tool that interoperates with various
   implementations that conform to UAX #31 (in Unicode terminology; here the
   C++ standard itself is one such « implementation »), say because I am
   writing something that generates C++ code and code in other programming
   languages, it is useful to know that I can produce default identifiers,
   just like I can in Java or Python. It is a lot less useful to read a
   paragraph that summarizes what R3 is about, only to find that I cannot do
   anything with it anyway. The less said about hashtags the better.

   From that standpoint, as I had noted to Steve earlier, claiming
   conformance to R4 may be technically correct—the best kind of correct—*on
   the space of well-formed programs*, but I find it misleading at best. I
   would expect that if I am interoperating with implementations that conform
   to R4, I need to check that I don’t rely on equivalent identifiers
   (compatibility or canonical, depending on the normalization form used in
   R4), but that I don’t need to normalize the identifiers I produce, because
   the normalization-equivalent identifiers are treated as equivalent.

   Thinking about it some more, since R1 does not mention the restriction
   from [lex.name] §1 <https://eel.is/c++draft/lex.name#1>, I now think the
   conformance claim to R4 is actually just incorrect: Non-normalized
   identifiers are identifiers (both by definition in lex.name
   <https://eel.is/c++draft/lex.name#nt:identifier> and for the purposes of
   the conformance claim to R1 as worded), but they are not « treated
   equivalently by the implementation », since they cause the program to be
   ill-formed, whereas their normalized counterparts do not.
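
To illustrate the R4 point above, here is a minimal Python sketch under stated assumptions. It assumes the normalization form in question is NFC, and the helper names `is_nfc` and `equivalent_under_r4` are hypothetical, introduced only for this example. Python's `unicodedata` module uses whichever Unicode version ships with the interpreter, which need not be 15.1. The sketch contrasts what a tool could expect from an implementation that treats normalization-equivalent identifiers as equivalent with the check a code generator has to perform when a non-normalized identifier makes the program ill-formed.

```python
import unicodedata

def is_nfc(identifier: str) -> bool:
    # True if the identifier is already in Normalization Form C.
    # A generator targeting C++ would have to ensure this itself,
    # since a non-NFC identifier makes the program ill-formed.
    return unicodedata.is_normalized("NFC", identifier)

def equivalent_under_r4(a: str, b: str, form: str = "NFC") -> bool:
    # What "treated equivalently" would mean for a tool author:
    # two spellings compare equal once both are normalized.
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

precomposed = "caf\u00e9"   # 'e' with acute as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

# The two spellings are distinct code point sequences...
print(precomposed == decomposed)
# ...but they are canonically equivalent.
print(equivalent_under_r4(precomposed, decomposed))
# Only the precomposed spelling is already in NFC.
print(is_nfc(precomposed), is_nfc(decomposed))
```

Under the reading of R4 described above, a generator could emit either spelling; given the [lex.name] restriction, it instead has to normalize before emitting, for example with `unicodedata.normalize("NFC", name)`.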

Best regards,

Robin Leroy

Received on 2025-06-07 13:31:23