On Fri, Dec 3, 2021 at 5:48 PM Jens Maurer <Jens.Maurer@gmx.net> wrote:

On 03/12/2021 22.58, Hubert Tong wrote:
> On Fri, Dec 3, 2021 at 4:55 PM Jens Maurer via SG16 <sg16@lists.isocpp.org> wrote:
>
> On 03/12/2021 22.03, Tom Honermann wrote:
> > On 12/1/21 2:28 PM, Corentin Jabot wrote:
>
> >> I think Jens is right. MSVC does handle Shift-JIS specifically, but I'm not sure we can or should mandate something that works universally (the burden on implementations could be high).
> >
> > Are you suggesting that we should revisit the consensus for the proposed resolution for LWG3576 <https://cplusplus.github.io/LWG/issue3576> from our 2021-08-25 telecon <https://github.com/sg16-unicode/sg16-meetings#august-25th-2021>?
>
> Reading https://cplusplus.github.io/LWG/issue3576
> right now (I wasn't present in August, it seems),
> this says
>
> "any codepoint of the literal encoding other than { or }"
>
> This seems to be a category error: A literal encoding produces
> code units (see [lex.string]), not code points.
>
> One could certainly endeavor to reconstruct code points from
> code units, but it appears some encodings don't really have
> a code point space to start with. For example, wide-EBCDIC
> paired with some narrow EBCDIC shifts between the two, but
> it seems there is no single "code point" space that would
> contain values for characters from both sets.
>
>
> I believe the numeric value of a wchar_t would serve as the "code point" space in such a case.

Is there actually a wchar_t encoding corresponding to each (char-based)
shift-state encoding?

Various 2-byte wchar_t encodings have an issue with representing all of the multibyte characters, yes. However, I thought your question was about an abstract code point space. The EBCDIC multibyte character sets work fine with 2-byte wchar_t types.

A Unicode counterexample:
If we use UTF-8 for char and UTF-16 for wchar_t, neither encoding directly
provides code point values.

Yes. I think the wording should say either code unit or multibyte character.
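
To make the Unicode counterexample concrete, here is a minimal sketch. It assumes U+1F600 as an arbitrary character outside the BMP; the helper names code_unit_count and decode_surrogate_pair are hypothetical and only for illustration. The string literals yield code units, and recovering the code point from the UTF-16 form takes an explicit decoding step.

```cpp
#include <cstddef>

// Number of code units in a string literal, excluding the terminating null.
// Works for char, char8_t, char16_t and char32_t arrays alike.
template <class CharT, std::size_t N>
constexpr std::size_t code_unit_count(const CharT (&)[N]) {
    return N - 1;
}

// Combine a UTF-16 surrogate pair into the code point it encodes.
// Assumes hi is in [0xD800, 0xDBFF] and lo is in [0xDC00, 0xDFFF].
constexpr char32_t decode_surrogate_pair(char16_t hi, char16_t lo) {
    return 0x10000
         + ((static_cast<char32_t>(hi) - 0xD800) << 10)
         + (static_cast<char32_t>(lo) - 0xDC00);
}

// One code point, but a different number of code units per encoding.
static_assert(code_unit_count(u8"\U0001F600") == 4, "UTF-8: four code units");
static_assert(code_unit_count(u"\U0001F600") == 2, "UTF-16: two code units");
static_assert(code_unit_count(U"\U0001F600") == 1, "UTF-32: one code unit");

// No single UTF-16 code unit equals the code point; it has to be decoded.
static_assert(decode_surrogate_pair(0xD83D, 0xDE00) == 0x1F600,
              "surrogate pair decodes to U+1F600");
```

The u8, u and U prefixes are used so that the result does not depend on the implementation's ordinary or wide literal encoding.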

Jens