Re: libu8ident 0.1 released

From: Tom Honermann <tom_at_[hidden]>
Date: Tue, 25 Jan 2022 11:13:01 -0500
On 1/25/22 3:13 AM, Corentin Jabot via SG16 wrote:
> On Tue, Jan 25, 2022, 08:20 Ville Voutilainen via SG16
> <sg16_at_[hidden]> wrote:
> On Tue, 25 Jan 2022 at 08:38, Reini Urban via SG16
> <sg16_at_[hidden]> wrote:
> >> Thank you, Reini. I will get these scheduled for review in
> SG16. Please note that we are now beyond the deadline for new
> papers for C23 and C++23, so review will be directed towards later
> standards. Our immediate priority is to finalize features that
> have been accepted for C++23. As a result, it may be a few months
> before these papers get scheduled in SG16. Though an argument
> could be made that your proposal constitutes modification of a
> feature accepted for C++23 (P1949) and therefore in scope for that
> standard, I see your proposal as more of a competing one rather
> than a modification. P1949 effectively brought the standard up to
> date with more recent Unicode versions without changing the design
> intent; the changes you propose are a change in direction and more
> disruptive.
> >
> >
> > My point of view is that C11 made identifiers insecure by
> making them non-identifiable, and adopting TR39 will fix that spec
> bug. So a bugfix, not a feature.
> At any rate, going into C++23 with P1949 and then looking at this
> proposal later would mean that we have a breaking change
> in our hands. If we go with this proposal, we can relax the
> identifiers if we decide (again, tho, I might add) that we're not
> concerned (*) about
> bidi/homoglyph attacks.
> (*) Or, rather, that we leave that to Quality of Implementation and
> external tools, I guess.
> But still, the high-order bit, to me, seems to be that if we go with
> P1949 for C++23, the horses are out of the barn and the cats are out
> of the bag. Taking a step back and going with what's proposed here
> after C++23 has been published seems.. ..awkward. I would
> find the hypothetical reverse order much less awkward, or even just
> sticking with what's being proposed here.
> I must wonder, tho, whether the cats are already out of the bag,
> considering that gcc and clang already allow all sorts of identifiers
> and gcc has that bidi warning. Seems like we're damned if we do,
> damned if we don't.
> There is no way that we can accommodate both stability and safety
> simultaneously for any Unicode related things, nor can we react fast
> enough to security concerns. In fact there have been a lot of
> developments in the last few weeks alone.
> Now, these security concerns are both important and need to be handled
> with care, as it's easy to preclude legitimate use cases.
> We need as an industry to:
> * Understand where the diagnostic needs to be produced (IDE,
> compilers, code review tools...)
> * Find the right way to handle security without excluding legitimate
> use cases, and a lot of that will come from the Unicode consortium
> * Have some flexibility to adapt to new best practices
> I don't think the C++ standard is the place to try to solve these
> matters, but we could (should?) offer some recommendations.

I think this is a point worth further discussion. For example,
addressing these matters via MISRA and other such compliance standards
may be preferred, perhaps as a starting point for eventual standardization.

The standard could (I think) also provide normative encouragement to
implementors to emit a diagnostic for identifiers that are not in line
with TR39 guidance. I'm not sure if we already have examples of
encouragement for additional diagnostics elsewhere.


> But long cycles and a desire to provide more stability than Unicode
> guarantees, as well as limited bandwidth, make me think this is
> ultimately a QoI concern.
> Anyway, P1949 is a breaking change, this is another breaking change.
> P1949 was designed to guarantee stability over time but security
> considerations on top won't. So maybe which cycle this gets in doesn't
> matter terribly, except that we should either consider security
> matters swiftly, or recognize that we are not the right body to deal
> with that.
> I mentioned somewhere else that the Rust compiler does confusable
> detection yet the spec only mentions UAX.
> For me the question is: does the sudden interest in Unicode-related
> security matters invalidate P1949's initial analysis that confusable
> detection is better left in the hands of implementations?
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16

Received on 2022-01-25 16:13:03