Date: Sat, 4 May 2024 03:42:03 +0300
Tom Honermann wrote:
> We can deduce the following:
>
> 1. When the imbued locale is the "C" locale, the streambuf receives a
> character sequence in the ordinary literal encoding.
> 2. When the imbued locale is a different encoding, the streambuf receives
> a character sequence in the locale dependent encoding.
>
> The second case requires that literals written to the stream use only characters
> that have consistent representation in the locale dependent encoding in order
> to avoid mojibake.
I see what you are saying, but I don't think this is what we want to support
going forward.
You are saying that, assuming the narrow literal encoding is UTF-8 and
LC_TIME=ja_JP.sjis, this doesn't work:
std::cout << std::chrono::August << "に" << std::endl;
but that we can hypothetically make this work:
std::cout << std::chrono::August << u8"に" << std::endl;
by having the ostream transcode the UTF-8 literal into Shift-JIS.
I don't think we should do that. I think that these two statements, when the
narrow literal encoding is UTF-8, must do the exact same thing.
And so should these two:
std::wcout << std::chrono::August << "に" << std::endl;
std::wcout << std::chrono::August << u8"に" << std::endl;
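To make the "exact same thing" expectation concrete, here is a minimal sketch that compares the bytes of the two kinds of literal. It assumes the compiler's ordinary literal encoding is UTF-8 (the default for GCC and Clang; MSVC needs /utf-8). The function name ordinary_matches_utf8 is hypothetical. It compares object representations instead of streaming the literals, because C++20 deletes the narrow-stream inserters for char8_t strings.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: returns true when the ordinary literal and the UTF-8
// literal for U+306B (に) have identical object representations.
bool ordinary_matches_utf8() {
    // Encoded in the ordinary literal encoding, whatever the compiler uses.
    const char ordinary[] = "\u306B";
    // Always UTF-8: E3 81 AB 00. The element type is char before C++20 and
    // char8_t from C++20 on; binding by reference works with both.
    const auto& utf8 = u8"\u306B";
    return sizeof ordinary == sizeof utf8 &&
           std::memcmp(ordinary, utf8, sizeof utf8) == 0;
}
```

When the ordinary literal encoding is UTF-8, assert(ordinary_matches_utf8()) should hold, so an inserter has no byte-level basis for treating the two literals differently.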
I don't believe using the locale encoding for the intermediate representation
of the character sequences passed to the streambuf is sound, and I don't think
trying to support this case will lead us anywhere useful.
The future we want is a narrow literal encoding of UTF-8, with the streambuf
receiving character sequences in UTF-8 and the final encoding produced by
the codecvt facet in the streambuf.
The locale categories in that future determine the month names, but not
their encoding.
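As a rough illustration of that split, the sketch below has the inserters write UTF-8 unchanged and produces the final encoding only at the streambuf boundary. It is not the codecvt-based design itself: it swaps in a hand-written filtering streambuf, and it targets ISO-8859-1 instead of Shift-JIS only because that mapping fits in a few lines. The names utf8_to_latin1_buf and through_filter are hypothetical, and the decoder does only minimal validation.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical filter: accepts UTF-8 from the inserters and forwards
// ISO-8859-1 to the wrapped buffer. Unmappable or malformed input becomes '?'.
class utf8_to_latin1_buf : public std::streambuf {
public:
    explicit utf8_to_latin1_buf(std::streambuf* sink) : sink_(sink) {}

protected:
    // No put area is set, so every character written arrives here.
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        const unsigned byte =
            static_cast<unsigned char>(traits_type::to_char_type(ch));
        if (need_ == 0) {
            if (byte < 0x80) return emit(byte);            // ASCII
            if ((byte & 0xE0) == 0xC0) { code_ = byte & 0x1F; need_ = 1; return ch; }
            if ((byte & 0xF0) == 0xE0) { code_ = byte & 0x0F; need_ = 2; return ch; }
            if ((byte & 0xF8) == 0xF0) { code_ = byte & 0x07; need_ = 3; return ch; }
            return emit(0xFFFF);                           // stray byte -> '?'
        }
        if ((byte & 0xC0) != 0x80) {                       // broken sequence
            need_ = 0;
            return emit(0xFFFF);
        }
        code_ = (code_ << 6) | (byte & 0x3F);              // continuation byte
        if (--need_ > 0) return ch;
        return emit(code_);                                // code point complete
    }

private:
    // Write one code point to the sink in the target encoding.
    int_type emit(unsigned long code) {
        const char out =
            code <= 0xFF ? static_cast<char>(static_cast<unsigned char>(code)) : '?';
        return sink_->sputc(out);
    }

    std::streambuf* sink_;
    unsigned long code_ = 0;
    int need_ = 0;
};

// Hypothetical helper: push UTF-8 text through the filter and return the
// bytes that reached the underlying buffer.
std::string through_filter(const std::string& utf8) {
    std::stringbuf sink;
    utf8_to_latin1_buf filter(&sink);
    std::ostream out(&filter);
    out << utf8;   // the inserter sees UTF-8 and does no transcoding
    return sink.str();
}
```

For example, through_filter("caf\xC3\xA9") yields the four bytes "caf\xE9". The inserters never need to know the target encoding; only the buffer at the end of the pipeline does.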
I don't quite know how we get there, but I'm pretty sure transcoding UTF-8
to Shift-JIS in the inserters isn't how.
Received on 2024-05-04 00:42:06