sg16: Re: [SG16] Agenda for the 2021-10-06 SG16 telecon

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Fri, 1 Oct 2021 22:17:03 +0200

On 01/10/2021 19.40, Tom Honermann via SG16 wrote:
> * How is the IANA registry intended to be applied? Which IANA encoding would be considered a match for each of the following cases?

My guess is we're specifically discussing the return value of the wide_literal()
function in the proposal.

None of the three cases below is describing a conforming implementation of (core language) C++
to start with, so these questions leave me confused as to their applicability to standardizing
something like P1885.

Assuming the core language restrictions are lifted (and the specification
interactions with C and the wide-character functions from C analyzed):

> o Wide literal encoding is UTF-16, sizeof(wchar_t) is 2, CHAR_BIT is >= 8, little endian architecture.

UTF16

> o Wide literal encoding is UTF-16, sizeof(wchar_t) is 1, CHAR_BIT is >= 16, architecture endianness is irrelevant since code units are a single byte.

UTF16

> o Wide literal encoding is UTF-16LE, sizeof(wchar_t) is 1, CHAR_BIT is >= 8, architecture endianness is irrelevant since code units are a single byte.

That was a bit terse. OK, you mean an implementation that makes wchar_t the same size as char
and stores wide literals as a sequence of byte-sized wchar_t items with UTF-16LE encoding.

Note that code units are NOT a single byte here (it's UTF-16, so code units are 16 bits,
whereas a byte can be as small as 8 bits in this scenario).

It feels like this is a particularly non-conforming implementation, because wchar_t can't
even hold a UTF-16 code unit (which needs 16 bits of storage). I think the given
scenario is simply out of scope for C++.
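
To make the size argument concrete, here is a minimal sketch that checks whether
wchar_t is wide enough for one UTF-16 code unit in each of the three scenarios.
WideConfig and can_hold_utf16_code_unit are hypothetical names used only for
illustration (they are not part of P1885), and the scenarios are evaluated at the
smallest CHAR_BIT each one permits.

```cpp
#include <cstddef>

// Hypothetical description of an implementation's wide-character setup.
struct WideConfig {
    std::size_t sizeof_wchar;  // sizeof(wchar_t), in bytes
    unsigned    char_bit;      // CHAR_BIT, in bits per byte
};

// A UTF-16 code unit needs 16 bits of storage, so a single wchar_t object
// can hold one only if its width in bits is at least 16.
constexpr bool can_hold_utf16_code_unit(WideConfig c) {
    return c.sizeof_wchar * c.char_bit >= 16;
}

// Scenario 1: sizeof(wchar_t) == 2, CHAR_BIT == 8  -> 16 bits, fits.
static_assert(can_hold_utf16_code_unit({2, 8}));
// Scenario 2: sizeof(wchar_t) == 1, CHAR_BIT == 16 -> 16 bits, fits.
static_assert(can_hold_utf16_code_unit({1, 16}));
// Scenario 3: sizeof(wchar_t) == 1, CHAR_BIT == 8  -> 8 bits, does not fit.
static_assert(!can_hold_utf16_code_unit({1, 8}));
```

Only the third scenario fails the check, which is the sense in which a byte-sized
wchar_t cannot represent a UTF-16 code unit.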

Jens

Received on 2021-10-01 15:17:09