Re: Follow up on SG16 review of P2996R2 (Reflection for C++26)

From: Peter Dimov <pdimov_at_[hidden]>
Date: Sat, 4 May 2024 03:42:03 +0300
Tom Honermann wrote:
> We can deduce the following:
>
> 1. When the imbued locale is the "C" locale, the streambuf receives a
> character sequence in the ordinary literal encoding.
> 2. When the imbued locale is a different encoding, the streambuf receives
> a character sequence in the locale dependent encoding.
>
> The second case requires that literals written to the stream use only characters
> that have consistent representation in the locale dependent encoding in order
> to avoid mojibake.

I see what you are saying, but I don't think this is what we want to support
going forward.

You are saying that (assuming a narrow literal encoding of UTF-8) this doesn't work:

std::cout << std::chrono::August << "に" << std::endl;

when LC_TIME=ja_JP.sjis, but that we could hypothetically make this work:

std::cout << std::chrono::August << u8"に" << std::endl;

by having the ostream transcode the UTF-8 literal into Shift-JIS.

I don't think we should do that. I think that these two statements, when the
narrow literal encoding is UTF-8, must do the exact same thing.

And so should these two:

std::wcout << std::chrono::August << "に" << std::endl;
std::wcout << std::chrono::August << u8"に" << std::endl;
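The reason the ordinary and u8 forms should behave identically can be checked directly: when the narrow literal encoding is UTF-8, the two literals hold the same bytes. The sketch below is only an illustration. The helper name same_bytes is hypothetical, and the check assumes the source file is saved as UTF-8 and compiled with a UTF-8 execution character set (the default for GCC and Clang, /utf-8 for MSVC).

```cpp
#include <cstddef>

// Hypothetical helper: compares two string literals byte by byte,
// including the terminating null. Works whether u8"" yields char
// (C++17) or char8_t (C++20), since each element is compared as
// unsigned char.
template <typename A, typename B, std::size_t N, std::size_t M>
constexpr bool same_bytes(const A (&a)[N], const B (&b)[M]) {
    if (N != M) return false;
    for (std::size_t i = 0; i < N; ++i) {
        if (static_cast<unsigned char>(a[i]) !=
            static_cast<unsigned char>(b[i])) {
            return false;
        }
    }
    return true;
}

// Assumption: narrow literal encoding is UTF-8. Under any other
// narrow literal encoding this assertion is expected to fail.
static_assert(same_bytes("に", u8"に"),
              "ordinary and u8 literals differ: narrow encoding is not UTF-8");
```

If the literals are byte-identical, an inserter that transcodes one but not the other would produce different output from the same input bytes.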

I don't believe using the locale encoding for the intermediate representation
of the character sequences passed to the streambuf is sound, and I don't think
trying to support this case will lead us anywhere useful.

The future we want is narrow literal encoding of UTF-8, with the streambuf
receiving character sequences in UTF-8, with the final encoding produced by
the codecvt facet in the streambuf.

The locale categories in that future determine the month names, but not
their encoding.
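A rough sketch of that layering follows, under several loudly stated assumptions. It uses a filtering streambuf (the hypothetical utf8_to_latin1_buf) as a stand-in for the codecvt facet inside a filebuf, and ISO-8859-1 as a stand-in for Shift-JIS, because a Shift-JIS mapping table would not fit in a short example. The point is only the shape: everything above the buffer deals in UTF-8, and the target encoding is produced in one place at the bottom.

```cpp
#include <ostream>
#include <sstream>
#include <streambuf>

// Hypothetical filter: accepts UTF-8 bytes from the stream layer and
// forwards ISO-8859-1 bytes to the wrapped buffer. Code points above
// U+00FF and malformed input are replaced by '?'. Overlong forms are
// not rejected; this is a simplification, not a validating decoder.
class utf8_to_latin1_buf : public std::streambuf {
public:
    explicit utf8_to_latin1_buf(std::streambuf* sink) : sink_(sink) {}

protected:
    // No put area is set, so every character written arrives here.
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        const unsigned char byte =
            static_cast<unsigned char>(traits_type::to_char_type(ch));

        if (pending_ == 0) {
            if (byte < 0x80) return emit(byte);  // ASCII passes through
            if ((byte & 0xE0) == 0xC0)      { cp_ = byte & 0x1F; pending_ = 1; }
            else if ((byte & 0xF0) == 0xE0) { cp_ = byte & 0x0F; pending_ = 2; }
            else if ((byte & 0xF8) == 0xF0) { cp_ = byte & 0x07; pending_ = 3; }
            else return emit('?');               // invalid lead byte
            return ch;                           // wait for continuation bytes
        }

        if ((byte & 0xC0) != 0x80) {             // expected a continuation byte
            pending_ = 0;
            return emit('?');
        }
        cp_ = (cp_ << 6) | (byte & 0x3F);
        if (--pending_ > 0) return ch;
        if (cp_ <= 0xFF) return emit(static_cast<unsigned char>(cp_));
        return emit('?');                        // not representable in Latin-1
    }

private:
    int_type emit(unsigned char c) {
        return sink_->sputc(static_cast<char>(c));
    }

    std::streambuf* sink_;
    unsigned long cp_ = 0;   // code point being assembled
    int pending_ = 0;        // continuation bytes still expected
};
```

Wrapping a std::ostringstream's buffer in this filter and writing the UTF-8 bytes for "août" through a std::ostream should leave the single Latin-1 byte 0xFB for the û in the sink. The inserters themselves never see the target encoding, which is the property described above.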

I don't quite know how we get there, but I'm pretty sure transcoding UTF-8
to Shift-JIS in the inserters isn't how.

Received on 2024-05-04 00:42:06