On Fri, Dec 3, 2021 at 5:48 PM Jens Maurer <Jens.Maurer@gmx.net> wrote:
On 03/12/2021 22.58, Hubert Tong wrote:
> On Fri, Dec 3, 2021 at 4:55 PM Jens Maurer via SG16 <sg16@lists.isocpp.org> wrote:
>
>     On 03/12/2021 22.03, Tom Honermann wrote:
>     > On 12/1/21 2:28 PM, Corentin Jabot wrote:
>
>     >> I think Jens is right. MSVC does handle Shift-JIS specifically, but I'm not sure we can or should mandate something that works universally; the burden on implementations could be high.
>     >
>     > Are you suggesting that we should revisit the consensus for the proposed resolution for LWG3576 <https://cplusplus.github.io/LWG/issue3576> from our 2021-08-25 telecon <https://github.com/sg16-unicode/sg16-meetings#august-25th-2021>?
>
>     Reading https://cplusplus.github.io/LWG/issue3576
>     right now (I wasn't present in August, it seems),
>     this says
>
>     "any codepoint of the literal encoding other than { or }"
>
>     This seems to be a category error: A literal encoding produces
>     code units (see [lex.string]), not code points.
>
>     One could certainly endeavor to reconstruct code points from
>     code units, but it appears some encodings don't really have
>     a code point space to start with.  For example, wide-EBCDIC
>     paired with some narrow EBCDIC shifts between the two, but
>     it seems there is no single "code point" space that would
>     contain values for characters from both sets.
>
>
> I believe the numeric value of a wchar_t would serve as the "code point" space in such a case.

Is there actually a wchar_t encoding corresponding to each (char-based)
shift-state encoding?

Various 2-byte wchar_t encodings have an issue with representing all of the multibyte characters, yes. However, I thought your question was about an abstract code point space. The EBCDIC multibyte character sets work fine with 2-byte wchar_t types.
 

A Unicode counterexample:
If we use UTF-8 for char and UTF-16 for wchar_t, neither encoding directly
provides code point values.

Yes. I think either the wording should say code unit or multibyte character.
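
A minimal sketch of the distinction, under some assumptions: the input is well-formed (there is no validation), char16_t stands in for a 16-bit wchar_t, and the helper names decode_utf8 and decode_utf16 are hypothetical, made up for illustration. It shows that a single character occupies several code units in either encoding, and that the code point value only appears after a decoding step that neither encoding performs by itself.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: decode well-formed UTF-8 code units into code points.
// Sketch only; malformed or truncated input is not handled.
std::vector<char32_t> decode_utf8(std::string_view s) {
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        // Number of continuation code units implied by the lead code unit.
        int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        // Payload bits of the lead code unit (masks 0x1F, 0x0F, 0x07).
        char32_t cp = extra == 0 ? static_cast<char32_t>(lead)
                                 : static_cast<char32_t>(lead & (0x3F >> extra));
        for (int k = 1; k <= extra; ++k) {
            unsigned char cont = static_cast<unsigned char>(s[i + k]);
            cp = (cp << 6) | (cont & 0x3F);
        }
        out.push_back(cp);
        i += static_cast<std::size_t>(extra) + 1;
    }
    return out;
}

// Hypothetical helper: decode UTF-16 code units into code points.
// char16_t is assumed here as a stand-in for a 16-bit wchar_t.
std::vector<char32_t> decode_utf16(std::u16string_view s) {
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t u = s[i];
        // A high surrogate followed by another unit forms one code point.
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < s.size()) {
            char32_t low = s[++i];
            u = 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00);
        }
        out.push_back(u);
    }
    return out;
}
```

With these helpers, the three code units "\xE2\x82\xAC" decode to the single code point U+20AC, and the two code units of u"\U0001F600" decode to U+1F600. A fill "character" specified in terms of code points therefore presupposes this kind of decoding, whereas wording in terms of code units or multibyte characters does not.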
 

Jens