On 11/02/2022 01.13, Hubert Tong via SG16 wrote:
> Link to paper: https://thephd.dev/_vendor/future_cxx/papers/d1967.html
> …
> The wording does not address the case of UTF-8 code units in the value of the u8 string literal with numerical values that exceed the range of values representable by signed char.
This might be a pre-existing defect, but we don't address that case for a regular
string literal (on a platform where char happens to be unsigned), either.
I think the time-honoured tradition of having people address at least some defects while "in the area" is reasonable to apply here.
I think what matches existing practice is to use the rules from
[conv.integral] (http://eel.is/c++draft/conv.integral#3) and do the normal
modulo 2^N conversion. The wording might read like this (note that I have
very little idea if I'm doing this correctly):
Additionally, an array of ordinary character type may be initialized by a
UTF-8 string literal, or by such a string literal enclosed in braces.
Successive characters of the value of the string-literal initialize the
elements of the array <INS-NEW>and perform integral conversion [conv.integral]
if necessary</INS-NEW>.
Is that alright?
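For concreteness, here is a minimal sketch of what I'd expect that conversion to produce for a code unit above 0x7F. It assumes an 8-bit char and C++20's [conv.integral] semantics (before C++20 the out-of-range result was implementation-defined). The helper name to_signed_char is made up for illustration and is not part of the proposed wording; it also applies the conversion explicitly rather than through array initialization, so it does not depend on compiler support for the paper.

```cpp
#include <climits>

// Hypothetical helper: converts one UTF-8 code unit value to signed char
// using the ordinary integral conversion. Under C++20 [conv.integral] the
// result is the value congruent to the source modulo 2^N, where N is the
// width of signed char.
constexpr signed char to_signed_char(unsigned char code_unit) {
    return static_cast<signed char>(code_unit);
}

// Assumption: 8-bit bytes, so N == 8 and the modulus is 256.
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit char");

// U+00E9 is encoded in UTF-8 as the code units 0xC3 0xA9.
// 0xC3 == 195, and 195 - 256 == -61.
static_assert(to_signed_char(0xC3) == -61, "lead byte wraps modulo 2^8");
// 0xA9 == 169, and 169 - 256 == -87.
static_assert(to_signed_char(0xA9) == -87, "continuation byte wraps modulo 2^8");
// Code units in the ASCII range are representable and stay unchanged.
static_assert(to_signed_char(0x41) == 65, "ASCII values are preserved");
```

If the wording above does what I intend, each element of a signed char array initialized from a u8 string literal would end up with the value this helper returns for the corresponding code unit.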
Sincerely,
JeanHeyd