On 1/26/22 11:13 AM, Jens Maurer via SG16 wrote:

On 26/01/2022 07.51, Reini Urban wrote:

On Tue, Jan 25, 2022 at 7:38 PM Jens Maurer via SG16 <sg16@lists.isocpp.org> wrote:

    On 25/01/2022 17.13, Tom Honermann via SG16 wrote:
    > On 1/25/22 3:13 AM, Corentin Jabot via SG16 wrote:
    > The standard could (I think) also provide normative encouragement to implementors to emit a diagnostic for identifiers that are not in line with TR39 guidance. I'm not sure if we already have examples of encouragement for additional diagnostics elsewhere.

    I'm not sure SG16 is the right place to discuss such fundamental matters.

    For example, some people like to compile their code with -Werror, and
    thus a recommended warning that they cannot possibly avoid (because e.g.
    it is inevitably caused by a third-party library) is indistinguishable
    from "ill-formed" for them.


True, but it's still a security issue, not just a style issue. Security concerns should be handled up front, else they leak in,
especially from potentially insecure third-party libraries.

I think it's still up for debate whether that's an issue that fits
the scope of the C++ standard.

    Back to the paper at hand: Its unit of consideration is the
    "translation unit", which might be formed from header files
    from various third-party sources.  In a world permeated by
    Unicode, it seems very reasonable that each third party would
    choose their own script for e.g. identifiers of local
    variables in inline functions, yet that inevitably would
    cause conflicts under Reini's suggestion.


Nope, I made three suggestions for the "context" in which to check, not only the translation unit:

1. before-cpp (really in-cpp)
2. private (lexical contexts)
3. after-cpp (all in one)

before-cpp would do the checks in cpp, with each header file in its own context and with its own scripts.
Only the resulting ABI could cause potential conflicts, and this option is the easiest to check and understand for the user.

after-cpp (i.e. in the compiler lexer, not in the cpp lexer) would be the strictest and most secure implementation.

I'm sorry, but I have trouble mapping the "context" descriptions
in https://rurban.github.io/libu8ident/doc/P2528R0.html to concepts
described in the C++ standard.

Maybe you could rephrase this in terms of grammar non-terminals used
in the C++ standard?

For example, "before-cpp" might mean that the check is on pp-tokens
with script conflicts checked per-source file, i.e. after translation
phase 3 [lex.phases].

If that's what is meant, some pp-tokens would contribute to conflicts even though
they might end up as string literals due to preprocessor stringizing [cpp.stringize],
where (I thought) no TR39 restrictions should apply.

I have no clue what "private" means.

It seems that "after-cpp" means that phase 7 identifiers are subject to
checking, but I don't know how that interacts with reachable identifiers
in modules (which are, by definition, in another translation unit).

Also, encouraging headers to go with their own Unicode identifiers is totally the wrong way to me. Nobody is using them, thankfully.
It was a false start from the beginning, and we should not encourage them even more.

The C++ standard has allowed Unicode characters in identifiers ever
since C++98.  The restrictions at the edges have shifted a bit, but
if you used non-fringe characters in your native language to write
a C++98 program, that program has not become ill-formed with C++20
just because of your use of native characters in identifiers.
That's part of the backward compatibility story.

It only leads to balkanization and diversion, as taken literally from the CIA sabotage field manual. Only very few developers
can identify all the scripts. Did you see a mixed-language Wikipedia encouraging all languages at once? There's none,
only the respective islands.

I fail to comprehend what sabotage (i.e. intentionally causing disruption)
has to do with a technical discussion on permissible C++ identifier
spellings.

    I don't think it's a good use of SG16's time to discuss this
    paper until these concerns are addressed.


These were addressed after you raised these concerns.

Ah, thanks for letting us know.

Neither the "Changes" section in https://rurban.github.io/libu8ident/doc/P2528R0.html
nor the announcement of that link contained any hint.

I can't read stuff afresh over and over again; not enough time.
Please make sure to produce meaningful change logs.

The "Changes" section is expectedly blank given that this is an R0 paper. It looks like Reini has started on an R1 revision. Reini, please update the document header in the R1 revision to state "D2528R1" (change the "P" to a "D" and fix the paper number) until the paper is actually frozen for submission to WG21. Feel free to update it as often as you like (maintaining the "Changes" section) until it is actually submitted. Please note that the paper number is incorrect in the title as well. Finally, please add SG16 to the target audience.

Tom.


Jens