ISOCPP sg16 List: Re: libu8ident 0.1 released

From: Reini Urban <reini.urban_at_[hidden]>
Date: Wed, 26 Jan 2022 07:51:26 +0100

On Tue, Jan 25, 2022 at 7:38 PM Jens Maurer via SG16 <sg16_at_[hidden]>
wrote:

> On 25/01/2022 17.13, Tom Honermann via SG16 wrote:
> > On 1/25/22 3:13 AM, Corentin Jabot via SG16 wrote:
> > The standard could (I think) also provide normative encouragement to
> > implementors to emit a diagnostic for identifiers that are not in line
> > with TR39 guidance. I'm not sure if we already have examples of
> > encouragement for additional diagnostics elsewhere.
>
> I'm not sure SG16 is the right place to discuss such fundamental matters.
>
> For example, some people like to compile their code with -Werror, and
> thus a recommended warning that they cannot possibly avoid (because e.g.
> it is inevitably caused by a third-party library) is indistinguishable
> from "ill-formed" for them.
>

true. but it's still a security issue, not just a style issue. security
concerns should be handled up front, else they leak in,
esp. via potentially insecure third-party libraries.

> Back to the paper at hand: Its unit of consideration is the
> "translation unit", which might be formed from header files
> from various third-party sources. In a world permeated by
> Unicode, it seems very reasonable that each third party would
> choose their own script for e.g. identifiers of local
> variables in inline functions, yet that inevitably would
> cause conflicts under Reini's suggestion.
>

nope, I made 3 suggestions for the "context" in which to check, not only
the translation unit.

1. before-cpp (really in-cpp)
2. private (lexical contexts)
3. after-cpp (all in one)

before-cpp would do the checks in cpp, with each header file in its own
context and with its own scripts.
only the resulting ABI could cause conflicts, and this variant is the
easiest to check and understand for the user.

after-cpp (i.e. in the compiler lexer, not in the cpp lexer) would be the
strictest and most secure implementation.
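
to illustrate the difference between the before-cpp and after-cpp contexts,
here is a minimal sketch. it is not the libu8ident API. the rule ("Latin plus
at most one other script per context"), the header names and the identifiers
are hypothetical simplifications, and the script of a character is only
approximated by the first word of its Unicode name, because Python's standard
library does not expose the Script property.

```python
import unicodedata


def script_of(ch):
    """Rough script guess: the first word of the Unicode character name.

    Approximation only; a real checker would use the Script property.
    """
    if not ch.isalpha():
        return "COMMON"  # digits, underscore, combining marks, ...
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def scripts_in(identifiers):
    """Set of (approximate) scripts used by a list of identifiers."""
    found = {script_of(ch) for ident in identifiers for ch in ident}
    return found - {"COMMON"}


def context_ok(identifiers, max_extra=1):
    """Hypothetical rule: Latin plus at most `max_extra` other scripts."""
    return len(scripts_in(identifiers) - {"LATIN"}) <= max_extra


# Hypothetical headers, each using Latin plus one other script.
headers = {
    "a.h": ["count", "αβγ"],   # Latin + Greek
    "b.h": ["index", "счет"],  # Latin + Cyrillic
}

# before-cpp: every header is checked in its own context.
per_header = {name: context_ok(ids) for name, ids in headers.items()}

# after-cpp: one context for the whole translation unit.
merged = [ident for ids in headers.values() for ident in ids]
whole_tu = context_ok(merged)

print(per_header, whole_tu)
```

under this assumed rule each header passes on its own, while the merged
translation unit is rejected because it mixes Greek and Cyrillic.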

also, encouraging headers to go with their own unicode identifiers is
totally the wrong way to me. nobody is using them, thankfully.
it was a false start from the beginning, and we should not encourage them
even more.
it only leads to balkanization and diversion, as taken literally from the
CIA sabotage field manual. only very few developers
can identify all the scripts. have you ever seen a mixed-language wikipedia
encouraging all languages at once? there is none,
only the respective islands.

> I don't think it's a good use of SG16's time to discuss this
> paper until these concerns are addressed.
>

these have been addressed since you raised these concerns.

Reini

Received on 2022-01-26 06:51:38