ISOCPP sg16 List: Re: Agenda for the 2022-07-13 SG16 telecon; no official meeting

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Sun, 10 Jul 2022 13:25:56 +0200

On Sun, Jul 10, 2022 at 8:37 AM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 10/07/2022 00.03, Corentin Jabot wrote:
> >
> >
> > On Sat, Jul 9, 2022 at 9:03 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
> > Are there places other than identifiers where we can have UCNs
> > outside of char/string literals? If not, maybe we should massage
> > the grammar definition of _identifier_ instead of persisting
> > the handwaving in lex.phases p4.
> >
> >
> > The idea of doing it there, as we form preprocessor tokens, is that we don't want
> > int i\N{SEMICOLON} to do something (I don't think implementers would like that).
>
> I don't understand. What does "do something" want to say?
> If we specify _identifier_ to accept any well-formed UCN
> and we then say that an _identifier_ containing a UCN
> for a basic character is ill-formed, that would seem
> to work.
>

I think I got what you are saying.
We could put universal-character-name in the grammar of identifiers:

identifier: <http://eel.is/c++draft/lex.name#nt:identifier>
    *identifier-start*
    *identifier* *identifier-continue*
identifier-start: <http://eel.is/c++draft/lex.name#nt:identifier-start>
    *nondigit*
    an element of the translation character set of class XID_Start
    *universal-character-name*
identifier-continue: <http://eel.is/c++draft/lex.name#nt:identifier-continue>
    *digit*
    *nondigit*
    an element of the translation character set of class XID_Continue
    *universal-character-name*

Because identifiers are maximally munched, this would work, and we could
remove the wording from phase 4.
(We would need some additional wording in [lex.name], of course.)
Was that your idea? If so, I like the direction.
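
To make the direction concrete, here is a rough sketch of how a lexer might
scan an identifier under that grammar. It is only an illustration, not
proposed wording or any implementation's actual code. The names Scan, Result,
parse_ucn and scan_identifier are made up for this example. It only handles
the \uXXXX and \UXXXXXXXX spellings (not \N{...}), it does not check
XID_Start/XID_Continue for the named code point, and it approximates
"basic character" as "code point below 0x80".

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical result type for this sketch.
enum class Scan { NotIdentifier, Ok, IllFormed };
struct Result {
    Scan status;
    std::size_t length;  // number of source characters consumed
};

// Parses \uXXXX or \UXXXXXXXX starting at s[pos].
// On success returns the code point and stores the spelling length in len.
std::optional<char32_t> parse_ucn(std::string_view s, std::size_t pos,
                                  std::size_t& len) {
    if (pos + 1 >= s.size() || s[pos] != '\\') return std::nullopt;
    const std::size_t digits = s[pos + 1] == 'u' ? 4 : s[pos + 1] == 'U' ? 8 : 0;
    if (digits == 0 || pos + 2 + digits > s.size()) return std::nullopt;
    char32_t cp = 0;
    for (std::size_t i = 0; i < digits; ++i) {
        const unsigned char c = static_cast<unsigned char>(s[pos + 2 + i]);
        if (!std::isxdigit(c)) return std::nullopt;
        const int value = std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
        cp = static_cast<char32_t>(cp * 16 + static_cast<char32_t>(value));
    }
    len = 2 + digits;
    return cp;
}

// Maximal munch: consume nondigits, digits (not in first position) and
// well-formed UCNs for as long as possible, then diagnose afterwards.
Result scan_identifier(std::string_view s) {
    std::size_t pos = 0;
    bool ill_formed = false;
    while (pos < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[pos]);
        const bool nondigit = std::isalpha(c) || c == '_';
        const bool digit = std::isdigit(c) != 0;
        if (nondigit || (digit && pos > 0)) {
            ++pos;
            continue;
        }
        std::size_t len = 0;
        const auto cp = parse_ucn(s, pos, len);
        if (!cp) break;  // not part of the identifier; stop munching
        // Assumption for this sketch: anything below 0x80 is a basic character.
        if (*cp < 0x80) ill_formed = true;
        pos += len;
    }
    if (pos == 0) return {Scan::NotIdentifier, 0};
    return {ill_formed ? Scan::IllFormed : Scan::Ok, pos};
}
```

With this, the source text i\u003B is consumed as a single identifier token
and then rejected, rather than being turned into i followed by a semicolon,
while something like caf\u00E9 is accepted.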

>
> > and i\N{SEMICOLON} doesn't match the grammar of an identifier, so I think an eager replacement makes sense.
> > We should avoid having to carry these things around through phase 3.
> >
> >
> >
> >
> > > https://isocpp.org/files/papers/P2621R0.pdf
> >
> > Should this go to SG12, because it discusses undefined behavior?
> >
> >
> > I'm hoping that removing UB from a context in which no UB can meaningfully exist doesn't require
> > visiting all the groups.
>
> Nobody was suggesting "all the groups"; the suggestion was to visit the
> specific group that deals with UB. I expect a rubber stamp, given the
> nature of UB and the implementation consensus.
>
> > > * Some Unicode algorithms are unbounded, and may require allocation. A small_vector would help specification,
> >
> > Why would it help with the specification?
> >
> >
> > I guess we could say "does not allocate if some variable is smaller than X", but implementers will have to have a small vector as an implementation detail, so if
> > people have the idea that such a thing should be standardized anyway, it's something that could be useful.
>
> If something needs allocation for extreme cases, we already have to
> deal with the consequences (e.g. throwing std::bad_alloc) in the
> specification. It feels entirely an implementation-internal
> optimization to avoid the allocation for some cases.
> "Recommended practice" is the tool to guide implementations the
> right way here. I'm still not seeing a need for small_vector
> for the specification.
>
> Jens
>
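
For reference, the kind of implementation-internal optimization being
discussed could look roughly like the sketch below: a fixed inline buffer
that only falls back to the heap (and so can only throw std::bad_alloc) once
more than N code points are stored. The class name SmallCodePointBuffer and
its interface are hypothetical, made up for this example; this is not a
proposed small_vector design.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical helper: stores up to N code points inline and switches to
// heap storage only when that capacity is exceeded.
template <std::size_t N>
class SmallCodePointBuffer {
public:
    void push_back(char32_t cp) {
        if (!on_heap_ && size_ < N) {
            inline_[size_++] = cp;  // common case: no allocation
            return;
        }
        if (!on_heap_) {
            // First overflow: copy the inline contents to the heap.
            // This is the point where std::bad_alloc may be thrown.
            heap_.assign(inline_.data(), inline_.data() + size_);
            on_heap_ = true;
        }
        heap_.push_back(cp);
        ++size_;
    }

    std::size_t size() const { return size_; }
    bool allocated() const { return on_heap_; }
    char32_t operator[](std::size_t i) const {
        return on_heap_ ? heap_[i] : inline_[i];
    }

private:
    std::array<char32_t, N> inline_{};
    std::vector<char32_t> heap_;
    std::size_t size_ = 0;
    bool on_heap_ = false;
};
```

An algorithm that buffers, say, a run of combining characters could use
something like SmallCodePointBuffer<32> internally; the specification would
then only need to describe the allocation behaviour, not the container.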

Received on 2022-07-10 11:26:08