Re: Agenda for the 2022-05-25 SG16 telecon

From: Steve Downey <sdowney_at_[hidden]>
Date: Sat, 21 May 2022 17:40:31 -0400
I think the impedance mismatch is that although lex and parse are pattern
languages, for C++ they are not *Unicode* pattern languages. Except for
identifiers we restrict the allowed characters to a subset of the modern
portable characters.

Given this direction from the Unicode Consortium, we might want to update
our conformance note to reflect that our lexing rules are not based on UAX #31,
without changing any normative wording.

I haven't thought or read deeply about the whitespace changes. The
interplay between this and 'trojan source' is interesting. However, I'm
also confident that if we don't do anything now, we can fix it later, even
if that means technically breaking conforming code. "Your deliberately
hostile and confusing security attack is now ill-formed" is, I think, a
reasonable sell.

On Sat, May 21, 2022, 13:11 Hubert Tong via SG16 <sg16_at_[hidden]> wrote:

> TL;DR: The motivation for the change is focused on guidance around
> whitespace; the original intent of the requirement, and indeed the
> whole document, is around stability of the interpretation of
> identifiers and pattern-based syntax. The specific guidance about
> whitespace that is being added does not fall within the scope
> described by the summary of the document. The applicability of the
> formulation with Pattern_Syntax to non-pattern-based languages is not
> established. The assumed nature of "whitespace" is also not clear.
>
> On Fri, May 20, 2022 at 11:52 PM Hubert Tong
> <hubert.reinterpretcast_at_[hidden]> wrote:
> >
> > Note: The characters not in the basic character set and not part of an
> > identifier won't need to be Pattern_Syntax under the profile. We error
> > on those outside of string/character literals.
>
> Hmm. I am not sure for such characters what having them as
> Pattern_Syntax or not implies.
>
> I'm pretty sure we don't need them to be Pattern_Syntax in the
> profile, but having the ones that are Pattern_Syntax in the UCD remain
> Pattern_Syntax is probably right too.
>
> We treat them all the same anyway, and we already retain the ability
> to use them for whatever purpose outside of literals in the future
> without breaking currently-conforming programs.
>
> It's not like we automatically accept the set of characters that are in
> neither Pattern_Syntax nor Pattern_White_Space (unlike the assumed
> behaviour for pattern languages).
>
> We already succeed in the original intent of the requirement insofar
> as character sets are concerned: Currently-conforming programs would
> not change meaning because of future choices to use new characters for
> syntactic purposes outside of literals.
>
> I do not believe the intent of the new change (to apply some
> whitespace-related guidance to general programming languages) falls
> within the intent of the original requirement (to keep valid patterns
> stable). It may be more appropriate to formulate a new, separate
> requirement to implement the intent of the new change (which is mostly
> restricted to Pattern_White_Space concerns and does not need to bring
> Pattern_Syntax into it).
>
> The document should also make its assumptions about "whitespace"
> clear: Is all whitespace expected to be treated equivalently, or is
> that specifically not assumed?
>
> Finally, the update to the document does not make appropriate updates
> to reflect expansions to its scope (at the document level in its title
> and summary, and in the heading given for the "Pattern Syntax"
> section).
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16

Received on 2022-05-21 21:40:45