Dear Jens and Hubert,

On Thu, Feb 10, 2022 at 7:35 PM Hubert Tong <hubert.reinterpretcast@gmail.com> wrote:

On Thu, Feb 10, 2022 at 7:28 PM Jens Maurer <Jens.Maurer@gmx.net> wrote:
On 11/02/2022 01.13, Hubert Tong via SG16 wrote:
> Link to paper: https://thephd.dev/_vendor/future_cxx/papers/d1967.html
> …
> The wording does not address the case of UTF-8 code units in the value of the u8 string literal with numerical values that exceed the range of values representable by signed char.

This might be a pre-existing defect, but we don't address that case for a regular
string literal (on a platform where char happens to be unsigned), either.

I think the time-honoured tradition of having people address at least some defects while "in the area" is reasonable to apply here.

I think what matches existing practice is to use the rules from [conv.integral] (http://eel.is/c++draft/conv.integral#3) and do the normal modulo 2^N conversion. The wording might read like this (note that I have very little idea whether I'm doing this correctly):

Additionally, an array of ordinary character type may be initialized by a UTF-8 string literal, or by such a string literal enclosed in braces. Successive characters of the value of the string-literal initialize the elements of the array <INS-NEW>and perform integral conversion [conv.integral] if necessary</INS-NEW>.
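To make the intent concrete, here is a rough sketch of what I believe that conversion does to individual code units. It assumes CHAR_BIT == 8 and the C++20 rule in [conv.integral], where the result is the unique value of the destination type congruent to the source value modulo 2^N. The helper name to_signed_char is made up purely for illustration and does not come from the paper or the standard.

```cpp
#include <climits>

// Assumption for this sketch: 8-bit bytes, so N == 8 for signed char.
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit char");

// Hypothetical helper (not from the paper): converts one UTF-8 code unit
// to signed char via the ordinary integral conversion.
constexpr signed char to_signed_char(unsigned char code_unit) {
    return static_cast<signed char>(code_unit);
}

// U+00E9 is encoded in UTF-8 as the two code units 0xC3 0xA9.
// Both exceed the maximum value of signed char (127), so each one
// wraps to the congruent value modulo 2^8.
static_assert(to_signed_char(0xC3) == -61, "195 - 256 == -61");
static_assert(to_signed_char(0xA9) == -87, "169 - 256 == -87");

// Code units in the ASCII range are unchanged by the conversion.
static_assert(to_signed_char(0x41) == 65, "'A' stays 65");
```

Under the proposed wording, I would expect each element of a signed-char-based array initialized from such a literal to hold the values shown above, which I believe is what existing implementations already produce.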

Is that alright?

Sincerely,

JeanHeyd