Not at the moment

On 10/07/2022 13.25, Corentin Jabot wrote:
On Sun, Jul 10, 2022 at 8:37 AM Jens Maurer <Jens.Maurer@gmx.net> wrote:
On 10/07/2022 00.03, Corentin Jabot wrote:
>
> On Sat, Jul 9, 2022 at 9:03 PM Jens Maurer <Jens.Maurer@gmx.net> wrote:
> Are there places other than identifiers where we can have UCNs
> outside of char/string literals? If not, maybe we should massage
> the grammar definition of _identifier_ instead of persisting
> the handwaving in lex.phases p4.
>
> The idea of doing it there, as we form preprocessor tokens, is that we don't want
> int i\N{SEMICOLON} to do something (I don't think implementers would like that).

I don't understand. What does "do something" mean here?
If we specify _identifier_ to accept any well-formed UCN and we then say that an _identifier_ containing a UCN for a basic character is ill-formed, that would seem to work.

I think I got what you are saying. We could put universal-character-name in the grammar of identifiers <http://eel.is/c++draft/lex.name#nt:identifier>:

identifier:
    /identifier-start/
    /identifier/ /identifier-continue/

identifier-start:
    /nondigit/
    an element of the translation character set of class XID_Start
    /_universal-character-name_/

identifier-continue:
    /digit/
    /nondigit/
    an element of the translation character set of class XID_Continue
    /_universal-character-name_/

Because identifiers are maximally munched, this would work, and we could remove the wording from phase 4.
(We would need some additional wording in [lex.name], of course.) Was that your idea?

Yes, something like that. We'd need to say that the UCNs are replaced by translation characters and then must still satisfy the _identifier_ production (i.e. XID_Start, XID_Continue).

Can we express C++-meaningful whitespace using a UCN?
Wasn't there an idea somewhere that we maybe want a double-width space as regular C++ whitespace?
Yes. We have several related SG16 issues. Issue #69 specifically
discusses ideographic space.
We have previously discussed defining whitespace based on Unicode properties in which case there are two to choose from:
Tom.
In which case, I like the direction.

Jens
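The identifier grammar Corentin sketched in this thread works like this: a universal-character-name becomes an extra alternative of identifier-start and identifier-continue, identifiers are maximally munched, and a UCN naming a basic character makes the identifier ill-formed. A toy lexer can illustrate that behavior. This is a hypothetical sketch under stated assumptions. It is not proposed wording and not any implementation's code. The names sketch::lex_identifier, parse_ucn, Status and Result are invented here. Only the \uXXXX and \UXXXXXXXX forms are handled, not \u{...} or named \N{...}, so i\u003B stands in for i\N{SEMICOLON}. The XID_Start/XID_Continue check is replaced by a placeholder that accepts any non-ASCII code point. Every ASCII code point spelled as a UCN is rejected, as a stand-in for the "basic character" rule.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical sketch of identifier lexing with UCNs as identifier-start /
// identifier-continue alternatives. Assumes ASCII source text in which
// non-ASCII characters appear only as \uXXXX or \UXXXXXXXX.
namespace sketch {

enum class Status { ok, not_identifier, malformed_ucn, ascii_ucn };

struct Result {
  Status status;
  std::size_t length;  // characters consumed; meaningful only when status == ok
};

bool is_hex(char c) {
  return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

int hex_value(char c) {  // precondition: is_hex(c)
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return c - 'A' + 10;
}

bool is_nondigit(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_'; }
bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Parses \uXXXX or \UXXXXXXXX starting at src[pos]. On success, stores the
// code point and the number of characters consumed, and returns true.
bool parse_ucn(std::string_view src, std::size_t pos, std::uint32_t& cp, std::size_t& len) {
  if (pos + 1 >= src.size() || src[pos] != '\\') return false;
  std::size_t digits = src[pos + 1] == 'u' ? 4 : src[pos + 1] == 'U' ? 8 : 0;
  if (digits == 0 || pos + 2 + digits > src.size()) return false;
  cp = 0;
  for (std::size_t i = 0; i < digits; ++i) {
    char c = src[pos + 2 + i];
    if (!is_hex(c)) return false;
    cp = cp * 16 + static_cast<std::uint32_t>(hex_value(c));
  }
  len = 2 + digits;
  return true;
}

// Lexes one identifier starting at src[pos], munching as far as possible.
Result lex_identifier(std::string_view src, std::size_t pos = 0) {
  assert(pos <= src.size());
  std::size_t i = pos;
  bool first = true;
  while (i < src.size()) {
    char c = src[i];
    if (is_nondigit(c) || (!first && is_digit(c))) {
      ++i;
    } else if (c == '\\' && i + 1 < src.size() && (src[i + 1] == 'u' || src[i + 1] == 'U')) {
      std::uint32_t cp = 0;
      std::size_t len = 0;
      if (!parse_ucn(src, i, cp, len)) return {Status::malformed_ucn, 0};
      if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return {Status::malformed_ucn, 0};
      // The UCN is munched into the identifier first; one naming an ASCII
      // (basic or control) character then makes the whole identifier ill-formed.
      if (cp < 0x80) return {Status::ascii_ucn, 0};
      // Placeholder assumption: a real lexer would check XID_Start (first
      // position) or XID_Continue here; this sketch accepts any non-ASCII code point.
      i += len;
    } else {
      break;
    }
    first = false;
  }
  if (i == pos) return {Status::not_identifier, 0};
  return {Status::ok, i - pos};
}

}  // namespace sketch
```

For example, lex_identifier("caf\\u00E9 = 1;") (a C++ string literal, so the lexed text is caf\u00E9 = 1;) returns Status::ok with length 9, while lex_identifier("int\\u0020i") and lex_identifier("i\\u003B") return Status::ascii_ucn. The UCN is absorbed into the identifier before it is classified, so \u0020 cannot act as whitespace and \u003B cannot end a declaration. The token is diagnosed instead of being silently split.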