Re: [isocpp-sg16] Confirming my understanding of Unicode source files

From: Steve Downey <sdowney_at_[hidden]>
Date: Sun, 30 Jun 2024 11:52:57 -0400
Unlike in C, the translation from a source file is implementation-defined
and happens in "phase 0". While we did manage to say you are supposed to
accept a UTF-8 file, I don't think we said you must reject an ill-formed
one.
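
As a rough illustration of what "ill-formed" means at the byte level, here is a minimal sketch of a UTF-8 well-formedness check. The name is_well_formed_utf8 is hypothetical (it is not from the standard or from any implementation), and the sketch assumes C++17. It only shows that a backslash-newline dropped between the two bytes that encode U+0308 leaves a byte sequence that a strict decoder would reject.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, for illustration only: returns true if `s` is
// well-formed UTF-8 (no overlong forms, no surrogates, nothing above U+10FFFF).
constexpr bool is_well_formed_utf8(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        if (b0 <= 0x7F) {  // ASCII: one byte
            ++i;
            continue;
        }
        std::size_t len = 0;
        unsigned char lo = 0x80;  // allowed range for the second byte
        unsigned char hi = 0xBF;
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;  // excludes overlong forms
            if (b0 == 0xED) hi = 0x9F;  // excludes surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;  // excludes overlong forms
            if (b0 == 0xF4) hi = 0x8F;  // excludes values above U+10FFFF
        } else {
            return false;  // stray continuation byte or invalid lead byte
        }
        if (s.size() - i < len) return false;  // truncated sequence
        const unsigned char b1 = static_cast<unsigned char>(s[i + 1]);
        if (b1 < lo || b1 > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if (b < 0x80 || b > 0xBF) return false;
        }
        i += len;
    }
    return true;
}

// 'o' followed by U+0308 (bytes CC 88): well-formed.
static_assert(is_well_formed_utf8("o\xCC\x88"));
// The same bytes with a backslash-newline between CC and 88: ill-formed.
static_assert(!is_well_formed_utf8("o\xCC" "\\\n" "\x88"));
```

Whether an implementation actually runs a check like this before line splicing is the implementation-defined part.
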
That said, splicing, like other preprocessor operations, can't form new tokens
out of the tokens it's working with, so I'm pretty sure that's ill-formed
outside a string literal. Inside a string literal, an initial combining
sequence would combine with the trailing bit of whatever it's pasted
together with, but there would also need to be intervening quotes.
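
For the string literal case, a small sketch of what I mean, assuming C++17 and using UTF-32 literals so the literal encoding doesn't get in the way. The variable names are just for illustration. The checks are static_asserts, so compiling it is the test.

```cpp
#include <string_view>

// 'f', 'o', 'o', then U+0308 COMBINING DIAERESIS: four code points.
constexpr std::u32string_view base = U"foo\u0308";
static_assert(base.size() == 4);

// Adjacent literals (the "intervening quotes") are concatenated, so a
// literal that starts with a combining character lands right after the
// last 'o' of the previous literal.
constexpr std::u32string_view concatenated = U"foo" U"\u0308";
static_assert(concatenated == base);

// A line splice inside a literal just joins the two physical lines.
constexpr std::u32string_view spliced = U"fo\
o\u0308";
static_assert(spliced == base);

// Nothing is normalized: the result is not the precomposed U+00F6.
constexpr std::u32string_view precomposed = U"fo\u00F6";
static_assert(base != precomposed);
```

Any combining happens when the text is rendered. The code points stored in the literal stay separate.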

foo\
\u{0308} =

Is not foö =

For the same reasons that
fo\
o =

Is not foo =

And hopefully my phone has not "helped" me by breaking that all somehow.


On Sun, Jun 30, 2024, 11:37 Alisdair Meredith via SG16 <
sg16_at_[hidden]> wrote:

> If I have an implementation that accepts only valid UTF-8 encoded
> source files, is the following correct:
>
> If I have a line-splice in the middle of the encoding of a UTF-8 code
> point, then I have a badly encoded source file that should be rejected.
>
> If I have a line-splice in the middle of an identifier separating a
> combining character (such as an accent) from the character it combines
> with, that should be valid as the combining character is expected to
> modify the next element in the source file, not the line-splice
> character, as the line-splice token is a direction to the translator on
> how to proceed to that next element.
>
> AlisdairM
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>

Received on 2024-06-30 15:53:09