On Sun, Jul 10, 2022 at 8:37 AM Jens Maurer <Jens.Maurer@gmx.net> wrote:
On 10/07/2022 00.03, Corentin Jabot wrote:
>
>
> On Sat, Jul 9, 2022 at 9:03 PM Jens Maurer <Jens.Maurer@gmx.net> wrote:

>     Are there places other than identifiers where we can have UCNs
>     outside of char/string literals?  If not, maybe we should massage
>     the grammar definition of _identifier_ instead of persisting
>     the handwaving in lex.phases p4.
>
>
> The idea of doing it there, as we form preprocessor tokens, is that we don't want
> int i\N{SEMICOLON} to do something (I don't think implementers would like that).

I don't understand. What does "do something" mean?
If we specify _identifier_ to accept any well-formed UCN
and we then say that an _identifier_ containing a UCN
for a basic character is ill-formed, that would seem
to work.

I think I got what you are saying.
We could put universal-character-name in the grammar of identifiers:

identifier-start:
    nondigit
    an element of the translation character set of class XID_Start
    universal-character-name
identifier-continue:
    digit
    nondigit
    an element of the translation character set of class XID_Continue
    universal-character-name


Because identifiers are maximally munched, this would work, and we could remove the wording from phase 4.
(We would need some additional wording in [lex.name], of course.)
Was that your idea? If so, I like the direction.
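
To make the maximal-munch argument concrete, here is a minimal sketch of how a lexer might treat UCNs as part of an identifier and diagnose the bad ones afterwards. Everything in it is an assumption for illustration: the names IdentifierLex, Ucn, parse_ucn and lex_identifier are hypothetical, only the \uXXXX and \UXXXXXXXX spellings are handled (no \u{...} or \N{...}), the XID_Start/XID_Continue checks are omitted, and "basic character" is approximated as any code point below 0x80.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical result type for this sketch.
struct IdentifierLex {
    std::size_t length = 0;      // characters consumed by maximal munch
    bool has_basic_ucn = false;  // a UCN designated a basic character: ill-formed
};

// Hypothetical helper type: a decoded UCN and the number of characters it spans.
struct Ucn {
    std::uint32_t code_point;
    std::size_t length;
};

// Parse \uXXXX or \UXXXXXXXX at the start of `s` (other UCN spellings are omitted).
inline std::optional<Ucn> parse_ucn(std::string_view s) {
    if (s.size() < 2 || s[0] != '\\') return std::nullopt;
    const std::size_t digits = s[1] == 'u' ? 4 : s[1] == 'U' ? 8 : 0;
    if (digits == 0 || s.size() < 2 + digits) return std::nullopt;
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < digits; ++i) {
        const unsigned char c = static_cast<unsigned char>(s[2 + i]);
        if (!std::isxdigit(c)) return std::nullopt;
        const int nibble = std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
        value = value * 16 + static_cast<std::uint32_t>(nibble);
    }
    return Ucn{value, 2 + digits};
}

// Consume the longest identifier at the start of `s`, accepting any UCN as an
// identifier character, and record whether one of them names a basic character.
inline IdentifierLex lex_identifier(std::string_view s) {
    IdentifierLex result;
    std::size_t pos = 0;
    while (pos < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[pos]);
        const bool nondigit = std::isalpha(c) != 0 || c == '_';
        const bool digit = std::isdigit(c) != 0;
        if (nondigit || (digit && pos > 0)) {  // a digit may not start an identifier
            ++pos;
            continue;
        }
        if (const auto ucn = parse_ucn(s.substr(pos))) {
            // Assumption: code points below 0x80 stand in for the basic character set.
            if (ucn->code_point < 0x80) result.has_basic_ucn = true;
            pos += ucn->length;
            continue;
        }
        break;  // anything else ends the identifier
    }
    result.length = pos;
    return result;
}
```

With this sketch, lexing the text i\u003B = 0 consumes i\u003B as a single identifier and flags it as ill-formed, so the UCN never gets a chance to act as a semicolon.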


 

> And i\N{SEMICOLON} doesn't match the grammar of an identifier, so I think an eager replacement makes sense.
> We should avoid having to carry these things around through phase 3.
>
>  
>
>
>     > https://isocpp.org/files/papers/P2621R0.pdf
>
>     Should this go to SG12, because it discusses undefined behavior?
>
>
> I'm hoping that removing UB from a context in which no UB can meaningfully exist doesn't require
> visiting all the groups.

Nobody was suggesting "all the groups"; the suggestion was to visit the
specific group that deals with UB.  I expect a rubber stamp, given the
nature of UB and the implementation consensus.

>     > * Some Unicode algorithms are unbounded, and may require allocation. A small_vector would help specification,
>
>     Why would it help with the specification?
>
>  
> I guess we could say "does not allocate if some variable is smaller than X", but implementers will need
> a small vector as an implementation detail. So if people think such a thing should be standardized
> anyway, it's something that could be useful.

If something needs allocation for extreme cases, we already have to
deal with the consequences (e.g. throwing std::bad_alloc) in the
specification.  It feels like an entirely implementation-internal
optimization to avoid the allocation for some cases.
"Recommended practice" is the tool to guide implementations the
right way here.  I'm still not seeing a need for small_vector
for the specification.
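
As an illustration of the "implementation-internal optimization" described above, a library could keep short results in inline storage and fall back to the heap only in extreme cases, without any of this appearing in the specification. The sketch below is an assumption for illustration only: InlineBuffer is a hypothetical name, not a proposed or existing facility, and it requires T to be default-constructible and copyable.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical internal helper: the first N elements live inline, and the
// contents spill to a std::vector only once that capacity is exceeded.
template <typename T, std::size_t N>
class InlineBuffer {
public:
    void push_back(const T& value) {
        if (size_ < N) {
            inline_[size_] = value;  // common case: no allocation
        } else {
            if (heap_.empty()) {
                // First spill: copy the inline elements. May throw std::bad_alloc.
                heap_.assign(inline_.begin(), inline_.end());
            }
            heap_.push_back(value);  // may throw std::bad_alloc
        }
        ++size_;  // only reached if nothing above threw
    }

    std::size_t size() const { return size_; }
    bool on_heap() const { return size_ > N; }

    const T& operator[](std::size_t i) const {
        return on_heap() ? heap_[i] : inline_[i];
    }

private:
    std::array<T, N> inline_{};
    std::vector<T> heap_;
    std::size_t size_ = 0;
};
```

A caller only observes that appending may throw std::bad_alloc; whether the first N elements avoid the heap is a quality-of-implementation matter, which is the kind of thing "Recommended practice" can cover.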

Jens