Re: libu8ident 0.1 released

From: Tom Honermann <tom_at_[hidden]>
Date: Tue, 25 Jan 2022 11:06:19 -0500
On 1/25/22 1:38 AM, Reini Urban via SG16 wrote:
> Tom Honermann <tom_at_[hidden]> wrote on Tue, 25 Jan 2022, 07:03:
> On 1/24/22 11:43 AM, Reini Urban via SG16 wrote:
>> I've just released libu8ident, which implements the TR39 checks
>> (and some TR31 profiles) for Unicode identifiers. UTF-8 only.
>> wchar support would be trivial, but I don't think anybody (but
>> MSVC) would need that.
>> That's the only such tool which is not in Rust or Java. I've only
>> implemented it previously in my perl5 fork in 2016.
>> This accompanies my recent WG21 and WG14 papers P2528R0 and n2916
>> https://rurban.github.io/libu8ident/doc/P2528R0.html
>> https://rurban.github.io/libu8ident/doc/n2916.html
> Thank you, Reini. I will get these scheduled for review in SG16.
> Please note that we are now beyond the deadline for new papers for
> C23 and C++23, so review will be directed towards later standards.
> Our immediate priority is to finalize features that have been
> accepted for C++23. As a result, it may be a few months before
> these papers get scheduled in SG16. Though an argument could be
> made that your proposal constitutes modification of a feature
> accepted for C++23 (P1949) and therefore in scope for that
> standard, I see your proposal as more of a competing one rather
> than a modification. P1949 effectively brought the standard up to
> date with more recent Unicode versions without changing the design
> intent; the changes you propose are a change in direction and more
> disruptive.
> My point of view is that C11 made identifiers insecure by making
> them non-identifiable, and adopting TR39 will fix that spec bug. So a
> bugfix, not a feature.
That is ok. If the committee accepts the proposal and agrees to
categorize it as a bug fix, then it can be adopted as a Defect Report
(DR) with the intent that implementors apply it to previous standards.
> That is ok, but we'll need more time than we currently have
> available to understand the impact. I think implementation
> experience in a C++ compiler may be needed to really understand
> the effect on existing code bases.
> In that regard I'd already asked Fedora admins to do a scan of the
> Linux packages. Will ask GitHub also, but this would then need a CVE,
> and this might draw too much unneeded attention. The trojansource guy
> did it this way.

Scanning packages for identifiers that would be restricted under your
proposal seems useful, but may not be sufficient. If I understand the
proposal correctly (and there is a reasonable chance that I do not; I've
only quickly scanned your papers so far), the script restrictions apply
to translation units, not to files. This could cause issues for project
composition due to the inclusion of header files or import of modules
that have identifiers written in competing scripts. Thus, actually
building such software as opposed to just scanning the files may be
required to identify such cases. Please correct me if I have
misunderstood the proposal.
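
To illustrate the concern above, here is a minimal C++ sketch of why a per-file scan can miss conflicts that only appear at the translation-unit level. Everything in it is an assumption made for illustration: the `Script` ranges are a toy lookup covering three blocks only, `File` is a hypothetical model of a source file as a list of identifiers, and the rule in `violates` ("at most one non-Latin script per scanned unit") is a simplified stand-in, not the actual rule in P2528R0 and not the libu8ident API.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy script lookup by code point range. ASSUMPTION: only three scripts are
// modelled, using whole blocks; a real checker would use the Unicode
// Scripts.txt data instead.
enum class Script { Latin, Greek, Cyrillic, Other };

Script script_of(char32_t c) {
    if ((c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z')) return Script::Latin;
    if (c >= 0x0370 && c <= 0x03FF) return Script::Greek;     // Greek and Coptic block
    if (c >= 0x0400 && c <= 0x04FF) return Script::Cyrillic;  // Cyrillic block
    return Script::Other;
}

// Hypothetical model: a file is just the list of identifiers it declares.
using File = std::vector<std::u32string>;

// Collect the non-Latin scripts used by identifiers across a set of files.
std::set<Script> non_latin_scripts(const std::vector<File>& files) {
    std::set<Script> seen;
    for (const auto& file : files)
        for (const auto& identifier : file)
            for (char32_t c : identifier) {
                const Script s = script_of(c);
                if (s != Script::Latin && s != Script::Other) seen.insert(s);
            }
    return seen;
}

// ASSUMED rule, for illustration only: a scanned unit may use at most one
// non-Latin script.
bool violates(const std::vector<File>& unit) {
    return non_latin_scripts(unit).size() > 1;
}

// Two headers that each pass when scanned alone, but conflict once both are
// included into the same translation unit.
void demo() {
    const File greek_header{U"\u03B1\u03B2\u03B3", U"size"};  // Greek identifier plus a Latin one
    const File cyrillic_header{U"\u0441\u0447\u0435\u0442"};  // Cyrillic identifier
    const std::vector<File> only_greek{greek_header};
    const std::vector<File> only_cyrillic{cyrillic_header};
    const std::vector<File> translation_unit{greek_header, cyrillic_header};
    assert(!violates(only_greek));        // per-file scan: no finding
    assert(!violates(only_cyrillic));     // per-file scan: no finding
    assert(violates(translation_unit));   // combined scan: conflict
}
```

Under these assumptions, scanning each header separately reports nothing, while scanning the combined set does, which is why building the software (so that the real include and import graph is known) may be needed rather than scanning files in isolation.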

> Note that the paper linked as P2528R0 above identifies itself as
> P2538R0 in the document header.
> I'm not able to find a P2528R0 or a P2538R0 in the WG21 paper
> archive. Has the paper been submitted to WG21?
> Yes, but Hal had not posted it yet. I froze it though.
> The WG14 variant is also accepted and frozen, but not posted yet.
Thank you. I did manage to find P2528R0 in the WG21 repository
<https://wg21.link/p2528r0> and N2916 in the WG14 one
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2916.htm> after all.
> A bit of process information in case you aren't aware: draft
> revisions of papers should be identified as, for example,
> D2528R<N> and only marked as, for example, P2528R<N> once
> submitted. Once a paper has been distributed with a "P"
> designation, it should not be modified again (this is to ensure
> that everyone viewing a "P" paper sees the same content).
> Yes, that has a good description on the webpages.

Excellent :)


>> I've also come up with 2 TR31 bug reports, sent to Unicode via
>> their contact form.
>> They might fix their tables for version 15 then. See doc/tr31-bugs/md
> The link for that document appears to be
> <https://rurban.github.io/libu8ident/doc/tr31-bugs.md>.
> Tom.
>> --
>> Reini Urban

Received on 2022-01-25 16:06:20