Sorry, one more question: are you making a parallel WG14 proposal?
Peter
From: SG16 <sg16-bounces@lists.isocpp.org>
On Behalf Of Corentin via SG16
Sent: 28 May 2020 09:04
To: SG16 <sg16@lists.isocpp.org>
Cc: Corentin <corentin.jabot@gmail.com>
Subject: [SG16] Redefining Lexing in terms of Unicode
Hello.
Following some Twitter discussions with Tom, Alisdair and Steve, I would like to propose that lexing should be redefined in terms of Unicode.
This would be mostly a wording change with limited effect on implementations and existing code.
Current state of affairs:
Any character not in the basic source character set is converted to a universal-character-name \uxxxx, whose value maps one-to-one to a Unicode code point.
The execution character set is defined in terms of the basic source character set.
\u and \U sequences can appear in identifiers and strings.
\u and \U sequences are reverted in raw string literals.
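To make the string-literal part of the current rules concrete, here is a minimal sketch (the constant names are mine, purely illustrative). It only shows what a conforming implementation should already do today: a \u escape in an ordinary literal denotes a single code point, while the same spelling inside a raw string literal is reverted to the six characters that were typed.

```cpp
#include <cstddef>

// In a UTF-16 literal, \u00e9 denotes U+00E9, which is a single code unit.
constexpr char16_t kEAcute = u"\u00e9"[0];

// In a UTF-8 literal, U+00E9 takes two code units, plus the terminating null.
constexpr std::size_t kUtf8Size = sizeof(u8"\u00e9");

// In a raw string literal the escape is reverted: the literal holds the six
// characters '\', 'u', '0', '0', 'e', '9', plus the terminating null.
constexpr std::size_t kRawSize = sizeof(R"(\u00e9)");
constexpr char kRawFirst = R"(\u00e9)"[0];
```

The u8 and u prefixes are used here only so that the sizes do not depend on the implementation's ordinary execution character set.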
Proposed, broad strokes
The intent is to limit changes to the behavior and implementation of existing compilers. There is, however, one breaking behavior change:
Identifiers which contain \u or \U escape sequences would become ill-formed, since with these new rules \u and \U can only appear in string and character literals.
I suggest that either
- We make such identifiers ill-formed
- We make such identifiers deprecated.
The reason is that this feature is not well motivated: such identifiers exist for compatibility between files of different encodings which cannot represent the same characters, but in practice they
can only be used on extern identifiers and identifiers declared in modules or imported headers, as most implementations do not provide a per-header encoding selection mechanism.
The feature is also hard to use properly, since such identifiers are hardly readable.
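For reference, this is the kind of code that would be affected. It is a small sketch with made-up names, not taken from any real code base: the identifier is spelled with a universal-character-name, which under the current rules makes it the same identifier as "café" typed directly in a source file whose encoding can represent U+00E9.

```cpp
// Identifier spelled with a universal-character-name (hypothetical example).
// Under the current rules this names the same entity as an identifier
// written with the character U+00E9 itself.
constexpr int caf\u00e9 = 42;

// Every use has to repeat the escaped spelling, which is what makes such
// identifiers hard to read.
constexpr int read_cafe() { return caf\u00e9; }
```

Under either option above, both the declaration and the use would be rejected or diagnosed.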
The same result could be achieved with a reification operator such as the one proposed by P1240, for example: [: "foo\u0300" :] = 42;
The hope is that these changes would make it less confusing for anyone involved to understand how lexing is performed.
I do expect this effort to be rather involved (and I am terrible at wording).
What do you think?
Anyone willing to help over the next couple of years?
Cheers,
Corentin