ISOCPP sg16 List: Re: libu8ident 0.1 released

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 26 Jan 2022 13:08:14 -0500

On 1/26/22 11:13 AM, Jens Maurer via SG16 wrote:
> On 26/01/2022 07.51, Reini Urban wrote:
>> On Tue, Jan 25, 2022 at 7:38 PM Jens Maurer via SG16 <sg16_at_[hidden]> wrote:
>>
>> On 25/01/2022 17.13, Tom Honermann via SG16 wrote:
>> > On 1/25/22 3:13 AM, Corentin Jabot via SG16 wrote:
>> > The standard could (I think) also provide normative encouragement to implementors to emit a diagnostic for identifiers that are not in line with TR39 guidance. I'm not sure if we already have examples of encouragement for additional diagnostics elsewhere.
>>
>> I'm not sure SG16 is the right place to discuss such fundamental matters.
>>
>> For example, some people like to compile their code with -Werror, and
>> thus a recommended warning that they cannot possibly avoid (because e.g.
>> it is inevitably caused by a third-party library) is indistinguishable
>> from "ill-formed" for them.
>>
>>
>> true. but it's still a security issue, not just a style issue. security concerns should be handled upfront, else they leak in.
>> esp. potentially insecure third-party libraries.
> I think it's still up for debate whether that's an issue that fits
> the scope of the C++ standard.
>
>> Back to the paper at hand: Its unit of consideration is the
>> "translation unit", which might be formed from header files
>> from various third-party sources. In a world permeated by
>> Unicode, it seems very reasonable that each third party would
>> choose their own script for e.g. identifiers of local
>> variables in inline functions, yet that inevitably would
>> cause conflicts under Reini's suggestion.
>>
>>
>> nope, I made 3 suggestions for the "context" in which to check, not only the translation unit.
>>
>> 1. before-cpp (really in-cpp)
>> 2. private (lexical contexts)
>> 3. after-cpp (all in one)
>>
>> before-cpp would do the checks in cpp, each header file in its own context, with its own scripts,
>> with only the resulting ABI causing potential conflicts, but easiest to check and understand for the user.
>>
>> after-cpp (i.e. in the compiler lexer, not in the cpp lexer) with the strictest and most secure implementation.
> I'm sorry, but I have trouble mapping the "context" descriptions
> in https://rurban.github.io/libu8ident/doc/P2528R0.html to concepts
> described in the C++ standard.
>
> Maybe you could rephrase this in terms of grammar non-terminals used
> in the C++ standard?
>
> For example, "before-cpp" might mean that the check is on pp-tokens
> with script conflicts checked per-source file, i.e. after translation
> phase 3 [lex.phases].
>
> If that's meant, some pp-tokens contribute to conflicts, even though they
> might end up as strings due to preprocessor stringizing [cpp.stringize],
> where no TR39 restrictions should apply (I thought).
>
> I have no clue what "private" means.
>
> It seems that "after-cpp" means that phase 7 identifiers are subject to
> checking, but I don't know how that interacts with reachable identifiers
> in modules (which are, by definition, in another translation unit).
>
>> also encouraging headers to go with their own unicode identifiers is the totally wrong way to me. nobody is using them, thankfully.
>> it was a false start from the beginning, and we should not encourage them even more.
> The C++ standard has allowed Unicode characters in identifiers ever
> since C++98. The restrictions at the edges have shifted a bit, but
> if you used non-fringe characters in your native language to write
> a C++98 program, that program has not become ill-formed with C++20
> just because of your use of native characters in identifiers.
> That's part of the backward compatibility story.
>
>> it only leads to balkanization and diversion, as taken literally from the CIA sabotage field manual. only very few developers
> I fail to comprehend what sabotage (i.e. intentionally causing disruption)
> has to do with a technical discussion on permissible C++ identifier
> spellings.
>
>> can identify all the scripts. did you see a mixed-language wikipedia encouraging all languages at once? there's none,
>> only the respective islands.
>>
>> I don't think it's a good use of SG16's time to discuss this
>> paper until these concerns are addressed.
>>
>>
>> these were addressed, since you added these concerns.
> Ah, thanks for letting us know.
>
> Neither the "Changes" section in https://rurban.github.io/libu8ident/doc/P2528R0.html
> nor the announcement of that link contained any hint.
>
> I can't read stuff afresh over and over again; not enough time.
> Please make sure to produce meaningful change logs.

The "Changes" section is expectedly blank given that this is an R0
paper. It looks like Reini has started on an R1 revision
<https://rurban.github.io/libu8ident/doc/P2528R1.html>. Reini, please
update the document header in the R1 revision to state "D2528R1" (change
the "P" to a "D" and fix the paper number) until the paper is actually
frozen for submission to WG21. Feel free to update it as often as you
like (maintaining the "Changes" section) until it is actually submitted.
Please note that the paper number is incorrect in the title as well.
Finally, please add SG16 to the target audience.

Tom.

>
> Jens

Received on 2022-01-26 18:08:15