Re: [SG16] Agenda for the 2021-12-01 SG16 telecon

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Fri, 3 Dec 2021 23:48:21 +0100
On 03/12/2021 22.58, Hubert Tong wrote:
> On Fri, Dec 3, 2021 at 4:55 PM Jens Maurer via SG16 <sg16_at_[hidden]> wrote:
>
> On 03/12/2021 22.03, Tom Honermann wrote:
> > On 12/1/21 2:28 PM, Corentin Jabot wrote:
>
> >> I think Jens is right. MSVC does handle Shift-JIS specifically, but I'm not sure we can or should mandate something that works universally (the burden on implementations could be high).
> >
> > Are you suggesting that we should revisit the consensus for the proposed resolution for LWG3576 <https://cplusplus.github.io/LWG/issue3576> from our 2021-08-25 telecon <https://github.com/sg16-unicode/sg16-meetings#august-25th-2021>?
>
> Reading https://cplusplus.github.io/LWG/issue3576
> right now (I wasn't present in August, it seems),
> this says
>
> "any codepoint of the literal encoding other than { or }"
>
> This seems to be a category error: A literal encoding produces
> code units (see [lex.string]), not code points.
>
> One could certainly endeavor to reconstruct code points from
> code units, but it appears some encodings don't really have
> a code point space to start with. For example, wide-EBCDIC
> paired with some narrow EBCDIC shifts between the two, but
> it seems there is no single "code point" space that would
> contain values for characters from both sets.
>
>
> I believe the numeric value of a wchar_t would serve as the "code point" space in such a case.

Is there actually a wchar_t encoding corresponding to each (char-based)
shift-state encoding?

A Unicode counterexample:
If we use UTF-8 for char and UTF-16 for wchar_t, neither encoding directly
provides code point values.
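
To make that concrete, here is a minimal sketch (not taken from any implementation) showing that a code point has to be reconstructed from several code units in both encodings. The helper names decode_utf8 and decode_utf16 and the sample arrays are hypothetical, the helpers assume well-formed input and do no validation, and char16_t stands in for a 16-bit wchar_t so that the example is portable.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: decode the first code point of a well-formed
// UTF-8 sequence. No validation is performed.
constexpr char32_t decode_utf8(const unsigned char* s) {
    if (s[0] < 0x80)                        // 1 code unit
        return s[0];
    if ((s[0] & 0xE0) == 0xC0)              // 2 code units
        return static_cast<char32_t>(((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
    if ((s[0] & 0xF0) == 0xE0)              // 3 code units
        return static_cast<char32_t>(((s[0] & 0x0F) << 12) |
                                     ((s[1] & 0x3F) << 6) | (s[2] & 0x3F));
    return static_cast<char32_t>(((s[0] & 0x07) << 18) |   // 4 code units
                                 ((s[1] & 0x3F) << 12) |
                                 ((s[2] & 0x3F) << 6) | (s[3] & 0x3F));
}

// Hypothetical helper: decode the first code point of a well-formed
// UTF-16 sequence. No validation is performed.
constexpr char32_t decode_utf16(const char16_t* s) {
    if (s[0] >= 0xD800 && s[0] <= 0xDBFF)   // high surrogate: 2 code units
        return 0x10000 + ((static_cast<char32_t>(s[0]) - 0xD800) << 10) +
               (static_cast<char32_t>(s[1]) - 0xDC00);
    return s[0];                            // BMP: 1 code unit
}

// U+20AC EURO SIGN as UTF-8 code units.
constexpr unsigned char euro_utf8[] = {0xE2, 0x82, 0xAC};
// U+1F600 GRINNING FACE as UTF-16 code units (a surrogate pair).
constexpr char16_t face_utf16[] = {0xD83D, 0xDE00};

// Number of code units in a fixed-size array.
template <typename T, std::size_t N>
constexpr std::size_t unit_count(const T (&)[N]) { return N; }

// Neither array stores the code point value itself; it only appears
// after decoding.
inline void check_examples() {
    assert(unit_count(euro_utf8) == 3);
    assert(unit_count(face_utf16) == 2);
    assert(decode_utf8(euro_utf8) == 0x20AC);
    assert(decode_utf16(face_utf16) == 0x1F600);
}
```

In neither case does any single char or wchar_t value equal the code point, so the numeric value of a wchar_t cannot serve as the code point in general.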

Jens

Received on 2021-12-03 16:48:28