Date: Wed, 26 Jan 2022 17:13:02 +0100
On 26/01/2022 07.51, Reini Urban wrote:
>
> On Tue, Jan 25, 2022 at 7:38 PM Jens Maurer via SG16 <sg16_at_[hidden]> wrote:
>
> On 25/01/2022 17.13, Tom Honermann via SG16 wrote:
> > On 1/25/22 3:13 AM, Corentin Jabot via SG16 wrote:
> > The standard could (I think) also provide normative encouragement to implementors to emit a diagnostic for identifiers that are not inline with TR39 guidance. I'm not sure if we already have examples of encouragement for additional diagnostics elsewhere.
>
> I'm not sure SG16 is the right place to discuss such fundamental matters.
>
> For example, some people like to compile their code with -Werror, and
> thus a recommended warning that they cannot possibly avoid (because e.g.
> it is inevitably caused by a third-party library) is indistinguishable
> from "ill-formed" for them.
>
>
> true. but it's still a security issue, not just a style issue. security concerns should be handled upfront, else they leak in.
> esp. potential insecure third-party libraries.
I think it's still up for debate whether that's an issue that fits
the scope of the C++ standard.
> Back to the paper at hand: Its unit of consideration is the
> "translation unit", which might be formed from header files
> from various third-party sources. In a world permeated by
> Unicode, it seems very reasonable that each third party would
> choose their own script for e.g. identifiers of local
> variables in inline functions, yet that inevitably would
> cause conflicts under Reini's suggestion.
>
>
> nope, I made 3 suggestions for the "context" in which to check, not only the translation unit.
>
> 1. before-cpp (really in-cpp)
> 2. private (lexical contexts)
> 3. after-cpp (all in one)
>
> before-cpp would do the checks in cpp, each header file in its own context, with its own scripts,
> with only the resulting ABI causing potential conflicts, but easiest to check and understand for the user.
>
> after-cpp (i.e. in the compiler lexer, not in the cpp lexer) with the strictest and most secure implementation.
I'm sorry, but I have trouble mapping the "context" descriptions
in https://rurban.github.io/libu8ident/doc/P2528R0.html to concepts
described in the C++ standard.
Maybe you could rephrase this in terms of grammar non-terminals used
in the C++ standard?
For example, "before-cpp" might mean that the check is on pp-tokens
with script conflicts checked per-source file, i.e. after translation
phase 3 [lex.phases].
If that's meant, some pp-tokens contribute to conflicts, even though they
might end up as strings due to preprocessor stringizing [cpp.stringize],
where no TR39 restrictions should apply (I thought).
I have no clue what "private" means.
It seems that "after-cpp" means that phase 7 identifiers are subject to
checking, but I don't know how that interacts with reachable identifiers
in modules (which are, by definition, in another translation unit).
> also encouraging headers to go with their own unicode identifiers is the totally wrong way to me. nobody is using them, thankfully.
> it was a false start from the beginning, and we should not encourage them even more.
The C++ standard has allowed Unicode characters in identifiers ever
since C++98. The restrictions at the edges have shifted a bit, but
if you used non-fringe characters in your native language to write
a C++98 program, that program has not become ill-formed with C++20
just because of your use of native characters in identifiers.
That's part of the backward compatibility story.
> it only leads to balkanization and diversion, as taken literally from the CIA sabotage field manual. only very few developers
I fail to comprehend what sabotage (i.e. intentionally causing disruption)
has to do with a technical discussion on permissible C++ identifier
spellings.
> can identify all the scripts. did you see a mixed-language wikipedia encouraging all languages at once? there's none,
> only the respective islands.
>
> I don't think it's a good use of SG16's time to discuss this
> paper until these concerns are addressed.
>
>
> these were addressed, since you added these concerns.
Ah, thanks for letting us know.
Neither the "Changes" section in https://rurban.github.io/libu8ident/doc/P2528R0.html
nor the announcement of that link contained any hint.
I can't read stuff afresh over and over again; not enough time.
Please make sure to produce meaningful change logs.
Jens
Received on 2022-01-26 16:13:08