Re: [isocpp-sg16] Confirming my understanding of Unicode source files

From: Alisdair Meredith <alisdairm_at_[hidden]>
Date: Sun, 30 Jun 2024 11:56:45 -0400

Thanks.

From my perspective, my “implementation defined mapping” is “I interpret the
exact set of code units you supply to me as UTF-8-encoded, and will reject
files that are not valid UTF-8.”
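
To make the "reject files that are not valid UTF-8" rule concrete, here is a minimal sketch of such a check, assuming the source file has already been read into memory as raw bytes. The function name `is_valid_utf8` is hypothetical and is not taken from any real compiler; the byte ranges follow the well-formed UTF-8 table in the Unicode Standard.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8
// (no overlong forms, no surrogates, nothing above U+10FFFF).
bool is_valid_utf8(std::string_view bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 <= 0x7F) {  // ASCII, including '\\' and '\n'
            ++i;
            continue;
        }
        std::size_t len = 0;      // total length of this sequence
        unsigned char lo = 0x80;  // allowed range for the second byte
        unsigned char hi = 0xBF;
        if (b0 >= 0xC2 && b0 <= 0xDF) { len = 2; }
        else if (b0 == 0xE0) { len = 3; lo = 0xA0; }  // excludes overlong forms
        else if (b0 >= 0xE1 && b0 <= 0xEC) { len = 3; }
        else if (b0 == 0xED) { len = 3; hi = 0x9F; }  // excludes surrogates
        else if (b0 == 0xEE || b0 == 0xEF) { len = 3; }
        else if (b0 == 0xF0) { len = 4; lo = 0x90; }  // excludes overlong forms
        else if (b0 >= 0xF1 && b0 <= 0xF3) { len = 4; }
        else if (b0 == 0xF4) { len = 4; hi = 0x8F; }  // caps at U+10FFFF
        else { return false; }  // stray continuation byte or invalid lead byte

        if (n - i < len) return false;  // sequence truncated at end of file
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if (b < lo || b > hi) return false;
            lo = 0x80;  // only the second byte has a restricted range
            hi = 0xBF;
        }
        i += len;
    }
    return true;
}
```

Under this sketch, a backslash-newline placed between the two code units of `ö` (0xC3 0xB6) makes the file invalid, because 0x5C is not a continuation byte. A backslash-newline placed before a complete U+0308 (0xCC 0x88) still passes. The check covers encoding only; it says nothing about whether the resulting identifier is acceptable.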

AlisdairM


> On 30 Jun 2024, at 11:52, Steve Downey <sdowney_at_[hidden]> wrote:
>
> The translation from a source file, unlike in C, is implementation defined and happening in phase 0. While we did manage to say you are supposed to accept a UTF-8 file, I don't think we said you must reject an ill-formed one.
> That said, splicing, like other preprocessor things, can't form new tokens out of the tokens it's working with, so I'm pretty sure that's ill-formed outside a string literal. Inside a string literal, an initial combining sequence would combine with the trailing bit of whatever it's pasted together with, but there would also need to be intervening quotes.
>
> foo\
> \u{0308} =
>
> Is not foö =
>
> For the same reasons that
> fo\
> o =
>
> Is not foo =
>
> And hopefully my phone has not "helped" me by breaking that all somehow.
>
>
> On Sun, Jun 30, 2024, 11:37 Alisdair Meredith via SG16 <sg16_at_[hidden]> wrote:
> If I have an implementation that accepts only valid UTF-8 encoded
> source files, is the following correct:
>
> If I have a line-splice in the middle of the encoding of a UTF-8 code
> point, then I have a badly encoded source file that should be rejected.
>
> If I have a line-splice in the middle of an identifier separating a combining
> character (such as an accent) from the character it combines with, that
> should be valid as the combining character is expected to modify the next
> element in the source file, not the line-splice character, as the line-splice
> token is a direction to the translator on how to proceed to that next element.
>
> AlisdairM
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16

Received on 2024-06-30 15:57:04