ISOCPP sg16 List: Re: Agenda for the 2022-07-13 SG16 telecon; no official meeting

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Mon, 11 Jul 2022 00:38:48 +0200

On Mon, Jul 11, 2022, 00:33 Tom Honermann <tom_at_[hidden]> wrote:

>
> On 7/10/22 11:30 AM, Jens Maurer wrote:
>
> On 10/07/2022 13.25, Corentin Jabot wrote:
>
> On Sun, Jul 10, 2022 at 8:37 AM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
> On 10/07/2022 00.03, Corentin Jabot wrote:
> >
> >
> > On Sat, Jul 9, 2022 at 9:03 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
> > Are there places other than identifiers where we can have UCNs
> > outside of char/string literals? If not, maybe we should massage
> > the grammar definition of _identifier_ instead of persisting
> > the handwaving in lex.phases p4.
> >
> >
> > The idea of doing it there, as we form preprocessor tokens, is that we don't want
> > int i\N{SEMICOLON} to do something (I don't think implementers would like that).
>
> I don't understand. What is "do something" supposed to mean?
> If we specify _identifier_ to accept any well-formed UCN
> and we then say that an _identifier_ containing a UCN
> for a basic character is ill-formed, that would seem
> to work.
>
>
> I think I got what you are saying.
> We could put universal-character-name in the grammar of identifiers:
>
> identifier: <http://eel.is/c++draft/lex.name#nt:identifier>
>     /identifier-start/ <http://eel.is/c++draft/lex.name#nt:identifier-start>
>     /identifier/ <http://eel.is/c++draft/lex.name#nt:identifier> /identifier-continue/ <http://eel.is/c++draft/lex.name#nt:identifier-continue>
> identifier-start: <http://eel.is/c++draft/lex.name#nt:identifier-start>
>     /nondigit/ <http://eel.is/c++draft/lex.name#nt:nondigit>
>     an element of the translation character set of class XID_Start
>     /_universal-character-name_/
> identifier-continue: <http://eel.is/c++draft/lex.name#nt:identifier-continue>
>     /digit/ <http://eel.is/c++draft/lex.name#nt:digit>
>     /nondigit/ <http://eel.is/c++draft/lex.name#nt:nondigit>
>     an element of the translation character set of class XID_Continue
>     /_universal-character-name_/
>
>
> Because identifiers are maximally munched, this would work, and we could remove the wording from phase 4.
> (We would need some additional wording in [lex.name] of course).
> Was that your idea?
>
> Yes, something like that. We'd need to say that the UCNs
> are replaced by translation characters and then must still
> satisfy the _identifier_ production (i.e. XID_Start, XID_Continue).
>
> Can we express C++-meaningful whitespace using a UCN?
>
> Not at the moment
>
> Wasn't there an idea somewhere that we maybe want a
> double-width space as regular C++ whitespace?
>
> Yes. We have several related SG16 issues. Issue #69 specifically discusses
> ideographic space.
>
> - Issue #69: Specify what constitutes white-space characters
> <https://github.com/sg16-unicode/sg16/issues/69>
> - Issue #70: Specify what constitutes a new-line
> <https://github.com/sg16-unicode/sg16/issues/70>
> - Issue #74: Extend whitespace to include NEL, LS, PS, LRM, RLM, and
> maybe ALM <https://github.com/sg16-unicode/sg16/issues/74>
>
> We have previously discussed defining whitespace based on Unicode
> properties in which case there are two to choose from:
>
> - The Pattern_White_Space
> <https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=[:Pattern_White_Space=Yes:]>
> property specifies a limited set of whitespace characters; doing what issue
> #74 suggests would align C++ with this set.
> - The White_Space
> <https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=[:White_Space=Yes:]>
> property specifies more characters (including ideographic space) but does
> not include the LTR and RTL marks that Pattern_White_Space does.
>
A couple of things.

1/ If we want to compile files with more whitespace characters, we need a strong
rationale and a good understanding of the implementation cost. I don't
think that rationale exists.

2/ If there is a use case for expressing whitespace as UCNs, I don't see it.
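
For reference, the difference between the two Unicode properties quoted
above can be sketched in C++ as follows. This is only an illustration: the
function names are made up for this example, and the code point lists are
transcribed from the Unicode property data (PropList.txt), not from any
C++ wording.

```cpp
// Sketch only: membership tests for the two Unicode whitespace properties.
// Function names are hypothetical; code points are taken from PropList.txt.

// Pattern_White_Space=Yes (11 code points).
constexpr bool is_pattern_white_space(char32_t c) {
    return (c >= 0x0009 && c <= 0x000D)  // TAB, LF, VT, FF, CR
        || c == 0x0020                   // SPACE
        || c == 0x0085                   // NEL
        || c == 0x200E || c == 0x200F    // LRM, RLM
        || c == 0x2028 || c == 0x2029;   // LS, PS
}

// White_Space=Yes (25 code points).
constexpr bool is_white_space(char32_t c) {
    return (c >= 0x0009 && c <= 0x000D)  // TAB, LF, VT, FF, CR
        || c == 0x0020                   // SPACE
        || c == 0x0085                   // NEL
        || c == 0x00A0                   // NO-BREAK SPACE
        || c == 0x1680                   // OGHAM SPACE MARK
        || (c >= 0x2000 && c <= 0x200A)  // EN QUAD .. HAIR SPACE
        || c == 0x2028 || c == 0x2029    // LS, PS
        || c == 0x202F                   // NARROW NO-BREAK SPACE
        || c == 0x205F                   // MEDIUM MATHEMATICAL SPACE
        || c == 0x3000;                  // IDEOGRAPHIC SPACE
}

// Ideographic space is White_Space but not Pattern_White_Space.
static_assert(is_white_space(0x3000) && !is_pattern_white_space(0x3000));
// The LTR and RTL marks are Pattern_White_Space but not White_Space.
static_assert(is_pattern_white_space(0x200E) && !is_white_space(0x200E));
static_assert(is_pattern_white_space(0x200F) && !is_white_space(0x200F));
```

Neither set contains the other, so picking one of the two properties is a
real design choice and not just a question of how many characters to admit.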

>
> Tom.
>
> In which case, I like the direction
>
> Jens
>
>
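
As an illustration of the direction discussed in the quoted thread (accept
any well-formed UCN in the identifier grammar, replace it by the character
it names, reject UCNs that name basic characters, then require XID_Start /
XID_Continue), here is a minimal C++ sketch. It is not proposed wording and
not how any compiler does it. All names are hypothetical, only \uXXXX and
\UXXXXXXXX are handled (no \N{...}), text outside UCNs is assumed to be
ASCII, "basic or control character" is approximated as "below U+00A0", and
the XID predicates are tiny stand-ins covering only ASCII and Latin-1
letters, where a real implementation would consult the Unicode Character
Database.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Stand-in for nondigit plus XID_Start (assumption: ASCII and Latin-1
// letters only; U+00D7 and U+00F7 are math signs, not letters).
bool xid_start_stub(char32_t c) {
    return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z') || c == U'_'
        || (c >= 0xC0 && c <= 0xFF && c != 0xD7 && c != 0xF7);
}

// Stand-in for digit, nondigit and XID_Continue (same limited coverage).
bool xid_continue_stub(char32_t c) {
    return xid_start_stub(c) || (c >= U'0' && c <= U'9');
}

int hex_value(char ch) {
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

// Replace each \uXXXX or \UXXXXXXXX by the code point it names.
// Returns nullopt if a UCN is malformed, names a surrogate or a value above
// U+10FFFF, or names a code point below U+00A0 (simplified stand-in for
// "control character or basic character").
std::optional<std::vector<char32_t>> replace_ucns(std::string_view s) {
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size();) {
        if (s[i] != '\\') {
            out.push_back(static_cast<unsigned char>(s[i++]));
            continue;
        }
        if (i + 1 >= s.size() || (s[i + 1] != 'u' && s[i + 1] != 'U'))
            return std::nullopt;
        const std::size_t digits = (s[i + 1] == 'u') ? 4 : 8;
        if (i + 2 + digits > s.size()) return std::nullopt;
        char32_t cp = 0;
        for (std::size_t k = 0; k < digits; ++k) {
            const int v = hex_value(s[i + 2 + k]);
            if (v < 0) return std::nullopt;
            cp = cp * 16 + static_cast<char32_t>(v);
        }
        if (cp < 0xA0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return std::nullopt;
        out.push_back(cp);
        i += 2 + digits;
    }
    return out;
}

// After replacement, the result must still match the identifier production.
bool is_identifier(std::string_view spelling) {
    const auto cps = replace_ucns(spelling);
    if (!cps || cps->empty()) return false;
    if (!xid_start_stub(cps->front())) return false;
    for (std::size_t i = 1; i < cps->size(); ++i)
        if (!xid_continue_stub((*cps)[i])) return false;
    return true;
}
```

With this sketch, is_identifier("caf\\u00E9") is accepted, while
is_identifier("i\\u003B") (the int i\N{SEMICOLON} case, spelled with a
\u escape) is rejected because the UCN names a basic character.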

Received on 2022-07-10 22:38:59