On Sat, May 4, 2024 at 2:42 AM Peter Dimov <pdimov@gmail.com> wrote:
Tom Honermann wrote:
> We can deduce the following:
>
> 1.    When the imbued locale is the "C" locale, the streambuf receives a
> character sequence in the ordinary literal encoding.
> 2.    When the imbued locale uses a different encoding, the streambuf receives
> a character sequence in the locale dependent encoding.
>
> The second case requires that literals written to the stream use only characters
> that have consistent representation in the locale dependent encoding in order
> to avoid mojibake.

I see what you are saying, but I don't think this is what we want to support
going forward.

You are saying that (assuming narrow literal encoding UTF-8) this doesn't work

std::cout << std::chrono::August << "に" << std::endl;

when LC_TIME=ja_JP.sjis, but we can hypothetically make this work

std::cout << std::chrono::August << u8"に" << std::endl;

by having the ostream transcode the UTF-8 literal into Shift-JIS.

I don't think we should do that. I think that these two statements, when the
narrow literal encoding is UTF-8, must do the exact same thing.

And so should these two:

std::wcout << std::chrono::August << "に" << std::endl;
std::wcout << std::chrono::August << u8"に" << std::endl;

Right, a u8 string should behave exactly like the equivalent UTF-8 encoded narrow string literal.
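
To make "exactly like" concrete, here is a minimal sketch that compares the raw bytes of an ordinary literal and a u8 literal for U+306B (に). The helper names literal_bytes and ordinary_literal_matches_u8 are invented for illustration and are not part of any library. The second helper's result depends on how the translation unit is compiled; it is only expected to return true when the ordinary literal encoding is UTF-8.

```cpp
#include <cstddef>
#include <string>

// Raw bytes of a string literal, without the terminating null.
// Accepts both ordinary (char) and u8 (char8_t in C++20) literals.
// Hypothetical helper, for illustration only.
template <class CharT, std::size_t N>
std::string literal_bytes(const CharT (&lit)[N]) {
    static_assert(sizeof(CharT) == 1, "byte-sized code units only");
    return std::string(reinterpret_cast<const char*>(lit), N - 1);
}

// True when the ordinary literal encoding of this translation unit
// yields the same bytes as the u8 literal for U+306B.
// Assumption: the ordinary literal encoding can represent U+306B at all.
inline bool ordinary_literal_matches_u8() {
    return literal_bytes("\u306B") == literal_bytes(u8"\u306B");
}
```

If ordinary_literal_matches_u8() returns true, the two literals hand identical bytes to the inserter, so there is nothing left on which different behavior could be based.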

I don't believe using the locale encoding for the intermediate representation
of the character sequences passed to the streambuf is sound, and I don't think
trying to support this case will lead us anywhere useful.

The future we want is narrow literal encoding of UTF-8, with the streambuf
receiving character sequences in UTF-8, with the final encoding produced by
the codecvt facet in the streambuf.

The locale categories in that future determine the month names, but not
their encoding.

+1
I don't quite know how we get there, but I'm pretty sure transcoding UTF-8
to Shift-JIS in the inserters isn't how.
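
As a rough sketch of the model described above (inserters always hand UTF-8 to the streambuf, and the final encoding is produced at the streambuf boundary), the following uses a small filtering streambuf as a stand-in for the codecvt step. This is not how basic_filebuf and codecvt actually interact; the class utf8_to_sjis_buf, the helper device_bytes, and the one-entry mapping table (ASCII plus U+306B to Shift-JIS 0x82 0xC9) are all assumptions made for illustration.

```cpp
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical stand-in for the codecvt step: receives UTF-8 from the
// formatting layer and writes the final encoding to `sink`.
// Only ASCII and U+306B are mapped; anything else becomes '?'.
class utf8_to_sjis_buf : public std::streambuf {
public:
    explicit utf8_to_sjis_buf(std::streambuf* sink) : sink_(sink) {}

protected:
    // No put area is set, so every character written arrives here.
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        const unsigned char b =
            static_cast<unsigned char>(traits_type::to_char_type(ch));
        if (pending_ == 0) {
            if (b < 0x80) emit(b);
            else if ((b & 0xE0) == 0xC0) { cp_ = b & 0x1F; pending_ = 1; }
            else if ((b & 0xF0) == 0xE0) { cp_ = b & 0x0F; pending_ = 2; }
            else if ((b & 0xF8) == 0xF0) { cp_ = b & 0x07; pending_ = 3; }
            else emit(0xFFFD);  // invalid lead byte
        } else if ((b & 0xC0) == 0x80) {
            cp_ = (cp_ << 6) | (b & 0x3F);
            if (--pending_ == 0) emit(cp_);
        } else {
            pending_ = 0;       // truncated sequence; the byte is dropped
            emit(0xFFFD);
        }
        return ch;
    }

private:
    // Writes one code point in the (illustrative) target encoding.
    void emit(char32_t cp) {
        if (cp < 0x80) {
            sink_->sputc(static_cast<char>(cp));
        } else if (cp == 0x306B) {
            sink_->sputc(static_cast<char>(0x82));
            sink_->sputc(static_cast<char>(0xC9));
        } else {
            sink_->sputc('?');  // not in the toy table
        }
    }

    std::streambuf* sink_;
    char32_t cp_ = 0;
    int pending_ = 0;
};

// Formats through an ostream backed by the filter and returns the
// bytes that would reach the device.
inline std::string device_bytes(const std::string& utf8) {
    std::stringbuf device;
    utf8_to_sjis_buf filter(&device);
    std::ostream out(&filter);
    out << utf8;  // the inserter itself does no transcoding
    return device.str();
}
```

The point of the sketch is only the layering: operator<< never looks at the target encoding, and swapping the target means swapping the buffer-level conversion, not the inserters.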