Re: libu8ident 0.1 released

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Wed, 26 Jan 2022 21:37:16 +0100
On Wed, Jan 26, 2022 at 5:13 PM Jens Maurer via SG16 <sg16_at_[hidden]>
wrote:

> On 26/01/2022 07.51, Reini Urban wrote:
> >
> > On Tue, Jan 25, 2022 at 7:38 PM Jens Maurer via SG16 <
> sg16_at_[hidden] <mailto:sg16_at_[hidden]>> wrote:
> >
> > On 25/01/2022 17.13, Tom Honermann via SG16 wrote:
> > > On 1/25/22 3:13 AM, Corentin Jabot via SG16 wrote:
> > > The standard could (I think) also provide normative encouragement
> to implementors to emit a diagnostic for identifiers that are not inline
> with TR39 guidance. I'm not sure if we already have examples of
> encouragement for additional diagnostics elsewhere.
> >
> > I'm not sure SG16 is the right place to discuss such fundamental
> matters.
> >
> > For example, some people like to compile their code with -Werror, and
> > thus a recommended warning that they cannot possibly avoid (because
> e.g.
> > it is inevitably caused by a third-party library) is
> indistinguishable
> > from "ill-formed" for them.
> >
> >
> > true. but it's still a security issue, not just a style issue. security
> concerns should be handled upfront, else they leak in.
> > esp. potential insecure third-party libraries.
>
> I think it's still up for debate whether that's an issue that fits
> the scope of the C++ standard.
>
> > Back to the paper at hand: Its unit of consideration is the
> > "translation unit", which might be formed from header files
> > from various third-party sources. In a world permeated by
> > Unicode, it seems very reasonable that each third party would
> > choose their own script for e.g. identifiers of local
> > variables in inline functions, yet that inevitably would
> > cause conflicts under Reini's suggestion.
> >
> >
> > nope, I made 3 suggestions for the "context" in which to check, not only
> the translation unit.
> >
> > 1. before-cpp (really in-cpp)
> > 2. private (lexical contexts)
> > 3. after-cpp (all in one)
> >
> > before-cpp would do the checks in cpp, each header file in its own
> context, with its own scripts,
> > with only the resulting ABI causing potential conflicts, but easiest to
> check and understand for the user.
> >
> > after-cpp (i.e. in the compiler lexer, not in the cpp lexer) with the
> strictest and most secure implementation.
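To make the difference between the per-header and the all-in-one contexts concrete, here is a toy sketch. Everything in it is an assumption for illustration: the `Script` enum, the block-based `script_of` classification (real tools use the Unicode Script property), and the rule that more than one non-Latin script in a context counts as a conflict. It is not libu8ident's API and not necessarily the rule P2528R0 proposes.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified script classification.
enum class Script { Latin, Greek, Cyrillic, Other };

// Toy mapping by code point block; a real checker would consult
// the Unicode Script property instead of block ranges.
Script script_of(char32_t c) {
    if (c < 0x0250) return Script::Latin;                    // ASCII and Latin blocks
    if (c >= 0x0370 && c <= 0x03FF) return Script::Greek;    // Greek and Coptic block
    if (c >= 0x0400 && c <= 0x04FF) return Script::Cyrillic; // Cyrillic block
    return Script::Other;
}

using Identifiers = std::vector<std::u32string>;

// Assumed rule: a context conflicts when its identifiers use
// more than one non-Latin script.
bool has_conflict(const Identifiers& context) {
    std::set<Script> seen;
    for (const auto& id : context)
        for (char32_t c : id)
            if (script_of(c) != Script::Latin) seen.insert(script_of(c));
    return seen.size() > 1;
}

// Models the "all in one" context: identifiers of both headers together.
Identifiers merged(const Identifiers& a, const Identifiers& b) {
    Identifiers all = a;
    all.insert(all.end(), b.begin(), b.end());
    return all;
}

// Two hypothetical third-party headers, one Greek, one Cyrillic.
const Identifiers header_a = {U"\u03B1\u03B2"};
const Identifiers header_b = {U"\u0434\u0430", U"x1"};
```

Under these assumptions each header is clean when checked in its own context, while the merged context reports a conflict, which is the situation Jens describes for a translation unit assembled from several third-party headers.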
>
> I'm sorry, but I have trouble mapping the "context" descriptions
> in https://rurban.github.io/libu8ident/doc/P2528R0.html to concepts
> described in the C++ standard.
>
> Maybe you could rephrase this in terms of grammar non-terminals used
> in the C++ standard?
>

I think the paper offers sufficient material, design considerations, and
direction for us to discuss it (and decide on a general direction) before
asking the author (who is new to the committee, afaik) to improve the
wording.


>
> For example, "before-cpp" might mean that the check is on pp-tokens
> with script conflicts checked per-source file, i.e. after translation
> phase 3 [lex.phases].
>
> If that's meant, some pp-tokens contribute to conflicts, even though they
> might end up as strings due to preprocessor stringizing [cpp.stringize],
> where no TR39 restrictions should apply (I thought).
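A minimal sketch of the stringizing case, for reference. The macro name `STR` and the name `alpha_beta` are hypothetical, and ASCII is used only to keep the example portable; the same mechanism applies to a non-ASCII identifier.

```cpp
#include <cstring>

// The # operator turns the macro argument, which is lexed as an
// identifier pp-token, into a string literal.
#define STR(x) #x

// `alpha_beta` is never declared; after preprocessing it exists only
// as the contents of a string literal, not as an identifier.
const char* stringized() { return STR(alpha_beta); }
```

A check that runs on pp-tokens sees `alpha_beta` as an identifier, whereas a check on phase 7 tokens only ever sees the string literal `"alpha_beta"`.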
>
> I have no clue what "private" means.
>
> It seems that "after-cpp" means that phase 7 identifiers are subject to
> checking, but I don't know how that interacts with reachable identifiers
> in modules (which are, by definition, in another translation unit).
>
> > also encouraging headers to go with their own unicode identifiers is the
> totally wrong way to me. nobody is using them, thankfully.
> > it was a false start from the beginning, and we should not encourage
> them even more.
>
> The C++ standard has allowed Unicode characters in identifiers ever
> since C++98. The restrictions at the edges have shifted a bit, but
> if you used non-fringe characters in your native language to write
> a C++98 program, that program has not become ill-formed with C++20
> just because of your use of native characters in identifiers.
> That's part of the backward compatibility story.
>
> > it only leads to balkanization and diversion, as taken literally from
> the CIA sabotage field manual. only very few developers
>
> I fail to comprehend what sabotage (i.e. intentionally causing disruption)
> has to do with a technical discussion on permissible C++ identifier
> spellings.
>
> > can identify all the scripts. did you see a mixed-language wikipedia
> encouraging all languages at once? there's none,
> > only the respective islands.
>
> >
> > I don't think it's a good use of SG16's time to discuss this
> > paper until these concerns are addressed.
> >
> >
> > these were addressed, since you added these concerns.
>
> Ah, thanks for letting us know.
>
> Neither the "Changes" section in
> https://rurban.github.io/libu8ident/doc/P2528R0.html
> nor the announcement of that link contained any hint.
>
> I can't read stuff afresh over and over again; not enough time.
> Please make sure to produce meaningful change logs.
>
> Jens

Received on 2022-01-26 20:37:27