On Sun, Jul 10, 2022 at 8:37 AM Jens Maurer <Jens.Maurer@gmx.net> wrote:

On 10/07/2022 00.03, Corentin Jabot wrote:
>
>
> On Sat, Jul 9, 2022 at 9:03 PM Jens Maurer <Jens.Maurer@gmx.net> wrote:

> Are there places other than identifiers where we can have UCNs
> outside of char/string literals? If not, maybe we should massage
> the grammar definition of _identifier_ instead of persisting
> the handwaving in lex.phases p4.
>
>
> The idea of doing it there, as we form preprocessor tokens, is that we don't want
> int i\N{SEMICOLON} to do something (I don't think implementers would like that).

I don't understand. What is "do something" supposed to mean?
If we specify _identifier_ to accept any well-formed UCN
and we then say that an _identifier_ containing a UCN
for a basic character is ill-formed, that would seem
to work.

I think I got what you are saying.

We could put universal-character-name in the grammar of identifiers:

identifier:
    identifier-start
    identifier identifier-continue

identifier-start:
    nondigit
    an element of the translation character set of class XID_Start
    universal-character-name

identifier-continue:
    digit
    nondigit
    an element of the translation character set of class XID_Continue
    universal-character-name

Because identifiers are maximally munched, this would work, and we could remove the wording from phase 4.

(We would need some additional wording in [lex.name] of course).
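
To make the maximal-munch argument concrete, here is a rough sketch of a lexer step that accepts any well-formed UCN inside an identifier and only afterwards flags a UCN that names a basic character. This is not proposed wording or any implementation's code. The names parse_ucn, lex_identifier and IdentifierResult are made up for illustration, only the \uXXXX and \UXXXXXXXX spellings are handled (no \N{...}, which needs a name table), the XID_Start/XID_Continue checks are left out, and "basic character" is approximated as "code point below 0x80".

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper type: a decoded UCN and the number of source
// characters it occupied.
struct Ucn {
    char32_t value;
    std::size_t length;
};

// Parses \uXXXX or \UXXXXXXXX at the start of s (assumption: \N{...} is not handled).
std::optional<Ucn> parse_ucn(std::string_view s) {
    if (s.size() < 2 || s[0] != '\\') return std::nullopt;
    const std::size_t digits = s[1] == 'u' ? 4 : s[1] == 'U' ? 8 : 0;
    if (digits == 0 || s.size() < 2 + digits) return std::nullopt;
    char32_t value = 0;
    for (std::size_t i = 0; i < digits; ++i) {
        const char c = s[2 + i];
        unsigned d = 0;
        if (c >= '0' && c <= '9') d = static_cast<unsigned>(c - '0');
        else if (c >= 'a' && c <= 'f') d = static_cast<unsigned>(c - 'a') + 10;
        else if (c >= 'A' && c <= 'F') d = static_cast<unsigned>(c - 'A') + 10;
        else return std::nullopt;  // not a hex digit: not a well-formed UCN
        value = value * 16 + d;
    }
    return Ucn{value, 2 + digits};
}

bool is_nondigit(char c) {
    return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

bool is_digit(char c) { return c >= '0' && c <= '9'; }

struct IdentifierResult {
    std::size_t length = 0;   // source characters consumed; 0 means no identifier here
    bool ill_formed = false;  // set when a UCN names a basic character
};

// Consumes the longest identifier at the start of src (maximal munch).
IdentifierResult lex_identifier(std::string_view src) {
    IdentifierResult result;
    std::size_t pos = 0;
    while (pos < src.size()) {
        const char c = src[pos];
        // nondigit is allowed anywhere; digit only after the first element.
        if (is_nondigit(c) || (pos > 0 && is_digit(c))) {
            ++pos;
            continue;
        }
        if (const auto ucn = parse_ucn(src.substr(pos))) {
            // Simplification: treat every code point below 0x80 as "basic".
            if (ucn->value < 0x80) result.ill_formed = true;
            pos += ucn->length;
            continue;
        }
        break;
    }
    result.length = pos;
    return result;
}
```

With this shape, the spelling i\u003B is consumed as a single identifier token and then rejected, rather than being turned into "i" followed by ";".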

Was that your idea? If so, I like the direction.

> and i\N{SEMICOLON} doesn't match the grammar of an identifier, so I think an eager replacement makes sense.
> We should avoid having to carry these things around through phase 3.
>
>
>
>
> > https://isocpp.org/files/papers/P2621R0.pdf
>
> Should this go to SG12, because it discusses undefined behavior?
>
>
> I'm hoping that removing UB from a context in which no UB can meaningfully exist doesn't require
> visiting all the groups.

Nobody was suggesting "all the groups"; the suggestion was to visit the
specific group that deals with UB. I expect a rubber stamp, given the
nature of UB and the implementation consensus.

> > * Some Unicode algorithms are unbounded, and may require allocation. A small_vector would help specification,
>
> Why would it help with the specification?
>
>
> I guess we could say "does not allocate if some variable is smaller than X", but implementers will need a small vector as an implementation detail, so if
> people think such a thing should be standardized anyway, it could be useful.

If something needs allocation for extreme cases, we already have to
deal with the consequences (e.g. throwing std::bad_alloc) in the
specification. It feels like an entirely implementation-internal
optimization to avoid the allocation for some cases.
"Recommended practice" is the tool to guide implementations the
right way here. I'm still not seeing a need for small_vector
for the specification.
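
As an illustration of that point, an implementation can keep the no-allocation fast path entirely internal with facilities that already exist. The sketch below uses std::pmr::monotonic_buffer_resource over a stack buffer as a stand-in for a small_vector. The function count_non_ascii is a made-up placeholder for a real Unicode algorithm, and the buffer size and reserve count are arbitrary assumptions.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <string_view>
#include <vector>

// Hypothetical placeholder for an algorithm that needs scratch storage.
// Small inputs are served from a stack buffer; larger ones fall back to
// `upstream`, which may throw std::bad_alloc as usual.
std::size_t count_non_ascii(
    std::u32string_view text,
    std::pmr::memory_resource* upstream = std::pmr::get_default_resource()) {
    std::array<std::byte, 256> inline_storage;  // arbitrary size, for illustration
    std::pmr::monotonic_buffer_resource pool(
        inline_storage.data(), inline_storage.size(), upstream);

    std::pmr::vector<char32_t> scratch(&pool);
    scratch.reserve(32);  // 32 * sizeof(char32_t) bytes fit in inline_storage

    for (const char32_t c : text) {
        if (c >= 0x80) scratch.push_back(c);  // growth past 32 goes to upstream
    }
    return scratch.size();
}
```

Nothing in the function's interface reveals the buffer, which is why a "Recommended practice" note seems sufficient on the specification side.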

Jens