On Mon, Jul 11, 2022, 00:33 Tom Honermann <tom@honermann.net> wrote:

On 7/10/22 11:30 AM, Jens Maurer wrote:
On 10/07/2022 13.25, Corentin Jabot wrote:
On Sun, Jul 10, 2022 at 8:37 AM Jens Maurer <Jens.Maurer@gmx.net> wrote:

    On 10/07/2022 00.03, Corentin Jabot wrote:
    > On Sat, Jul 9, 2022 at 9:03 PM Jens Maurer <Jens.Maurer@gmx.net> wrote:

    >     Are there places other than identifiers where we can have UCNs
    >     outside of char/string literals?  If not, maybe we should massage
    >     the grammar definition of _identifier_ instead of persisting
    >     the handwaving in lex.phases p4.
    > The idea of doing it there, as we form preprocessing tokens, is that we don't want
    > int i\N{SEMICOLON} to do something (I don't think implementers would like that).

    I don't understand. What does "do something" mean here?
    If we specify _identifier_ to accept any well-formed UCN
    and we then say that an _identifier_ containing a UCN
    for a basic character is ill-formed, that would seem
    to work.

I think I got what you are saying. 
We could put universal-character-name in the grammar of identifiers:

identifier:
    identifier-start
    identifier identifier-continue
identifier-start:
    nondigit
    an element of the translation character set of class XID_Start
identifier-continue:
    digit
    nondigit
    an element of the translation character set of class XID_Continue

Because identifiers are maximally munched, this would work, and we could remove the wording from phase 4.
(We would need some additional wording in [lex.name] of course).
Was that your idea?
Yes, something like that.  We'd need to say that the UCNs
are replaced by translation characters and then must still
satisfy the _identifier_ production (i.e. XID_Start, XID_Continue).
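
As a rough illustration of the rule sketched above (accept any well-formed UCN while lexing, replace it with its character, then re-check the result against _identifier_), here is a minimal C++ sketch. It is not proposed wording and not any implementation's actual lexer. All function names are hypothetical. It handles only the \uXXXX and \UXXXXXXXX forms (named \N{...} escapes would need a name table), it approximates "UCN for a basic character" as any code point below U+0080, and the XID_Start/XID_Continue lookup is a deliberately tiny stand-in covering only Latin-1 letters.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// HYPOTHETICAL stand-in for a real XID_Start lookup: Latin-1 letters only.
constexpr bool is_latin1_letter(char32_t c) {
    return (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6) ||
           (c >= 0xF8 && c <= 0xFF);
}
constexpr bool is_nondigit(char32_t c) {
    return c == U'_' || (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z');
}
constexpr bool is_digit(char32_t c) { return c >= U'0' && c <= U'9'; }
constexpr bool is_identifier_start(char32_t c) {
    return is_nondigit(c) || is_latin1_letter(c);
}
constexpr bool is_identifier_continue(char32_t c) {
    return is_digit(c) || is_identifier_start(c);
}

// Decode one \uXXXX or \UXXXXXXXX whose backslash is at s[pos].
// On success, advances pos past the UCN and returns the code point.
std::optional<char32_t> decode_ucn(std::string_view s, std::size_t& pos) {
    if (pos + 1 >= s.size()) return std::nullopt;
    const std::size_t digits = s[pos + 1] == 'u' ? 4 : s[pos + 1] == 'U' ? 8 : 0;
    if (digits == 0 || pos + 2 + digits > s.size()) return std::nullopt;
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < digits; ++i) {
        const char ch = s[pos + 2 + i];
        std::uint32_t d = 0;
        if (ch >= '0' && ch <= '9') d = static_cast<std::uint32_t>(ch - '0');
        else if (ch >= 'a' && ch <= 'f') d = static_cast<std::uint32_t>(ch - 'a' + 10);
        else if (ch >= 'A' && ch <= 'F') d = static_cast<std::uint32_t>(ch - 'A' + 10);
        else return std::nullopt;
        value = value * 16 + d;
    }
    pos += 2 + digits;
    return static_cast<char32_t>(value);
}

// Input is assumed to be ASCII source text, possibly containing UCNs.
bool is_valid_identifier(std::string_view spelling) {
    std::vector<char32_t> chars;
    for (std::size_t pos = 0; pos < spelling.size();) {
        const unsigned char byte = static_cast<unsigned char>(spelling[pos]);
        if (byte == '\\') {
            const auto cp = decode_ucn(spelling, pos);
            if (!cp) return false;                           // malformed UCN
            if (*cp < 0x80) return false;                    // approximation: basic/control character
            if (*cp > 0x10FFFF) return false;                // not a Unicode code point
            if (*cp >= 0xD800 && *cp <= 0xDFFF) return false; // surrogate
            chars.push_back(*cp);
        } else {
            if (byte >= 0x80) return false;                  // outside this sketch's input model
            chars.push_back(byte);
            ++pos;
        }
    }
    // After replacement, the characters must still match the grammar.
    if (chars.empty() || !is_identifier_start(chars[0])) return false;
    for (std::size_t i = 1; i < chars.size(); ++i)
        if (!is_identifier_continue(chars[i])) return false;
    return true;
}
```

Under these assumptions, the spelling `caf\u00e9` is accepted, while `i\u003b` (a UCN for the semicolon) is rejected as a whole rather than being lexed as `i` followed by `;`.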

Can we express C++-meaningful whitespace using a UCN?
Not at the moment
Wasn't there an idea somewhere that we maybe want a
double-width space as regular C++ whitespace?

Yes. We have several related SG16 issues. Issue #69 specifically discusses ideographic space.

We have previously discussed defining whitespace based on Unicode properties in which case there are two to choose from:

  • The Pattern_White_Space property specifies a limited set of whitespace characters; doing what issue #74 suggests would align C++ with this set.
  • The White_Space property specifies more characters (including ideographic space) but does not include the LTR and RTL marks that Pattern_White_Space does.
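
To make the difference between the two properties concrete, here is a small C++ sketch of both sets as predicates. The code point lists are my reading of the Unicode Character Database (PropList.txt) and should be checked against the Unicode version in question. The function names are hypothetical, and nothing here is proposed wording.

```cpp
// Pattern_White_Space: a small, stability-guaranteed set.
constexpr bool is_pattern_white_space(char32_t c) {
    return (c >= 0x0009 && c <= 0x000D)  // TAB, LF, VT, FF, CR
        || c == 0x0020                   // SPACE
        || c == 0x0085                   // NEXT LINE
        || c == 0x200E || c == 0x200F    // LEFT-TO-RIGHT / RIGHT-TO-LEFT MARK
        || c == 0x2028 || c == 0x2029;   // LINE / PARAGRAPH SEPARATOR
}

// White_Space: a larger set with no directional marks.
constexpr bool is_white_space(char32_t c) {
    return (c >= 0x0009 && c <= 0x000D)
        || c == 0x0020
        || c == 0x0085
        || c == 0x00A0                   // NO-BREAK SPACE
        || c == 0x1680                   // OGHAM SPACE MARK
        || (c >= 0x2000 && c <= 0x200A)  // EN QUAD .. HAIR SPACE
        || c == 0x2028 || c == 0x2029
        || c == 0x202F                   // NARROW NO-BREAK SPACE
        || c == 0x205F                   // MEDIUM MATHEMATICAL SPACE
        || c == 0x3000;                  // IDEOGRAPHIC SPACE
}

// Ideographic space is White_Space but not Pattern_White_Space.
static_assert(is_white_space(0x3000) && !is_pattern_white_space(0x3000));
// The LTR and RTL marks are Pattern_White_Space but not White_Space.
static_assert(is_pattern_white_space(0x200E) && !is_white_space(0x200E));
static_assert(is_pattern_white_space(0x200F) && !is_white_space(0x200F));
```

Neither set contains the other, so choosing between them is a real design decision and not just a question of how many characters to accept.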
A couple of things.
1/ If we want to accept additional whitespace characters in source files, we need a strong rationale and a good understanding of the implementation cost. I don't think the rationale exists.

2/ If there is a use case for expressing whitespace as UCNs, I don't see it.


In which case, I like the direction.