Date: Sat, 4 May 2024 10:44:08 +0200
On Sat, May 4, 2024 at 2:42 AM Peter Dimov <pdimov_at_[hidden]> wrote:
> Tom Honermann wrote:
> > We can deduce the following:
> >
> > 1. When the imbued locale is the "C" locale, the streambuf receives a
> > character sequence in the ordinary literal encoding.
> > 2. When the imbued locale is a different encoding, the streambuf receives
> > a character sequence in the locale dependent encoding.
> >
> > The second case requires that literals written to the stream use only characters
> > that have consistent representation in the locale dependent encoding in order
> > to avoid mojibake.
>
> I see what you are saying, but I don't think this is what we want to support
> going forward.
>
> You are saying that (assuming narrow literal encoding UTF-8) this doesn't work
>
> std::cout << std::chrono::August << "に" << std::endl;
>
> when LC_TIME=ja_JP.sjis, but we can hypothetically make this work
>
> std::cout << std::chrono::August << u8"に" << std::endl;
>
> by having the ostream transcode the UTF-8 literal into Shift-JIS.
>
> I don't think we should do that. I think that these two statements, when the
> narrow literal encoding is UTF-8, must do the exact same thing.
>
> And so should these two:
>
> std::wcout << std::chrono::August << "に" << std::endl;
> std::wcout << std::chrono::August << u8"に" << std::endl;
>
Right, a u8 string should behave exactly like the equivalent UTF-8 encoded
narrow string literal.
>
> I don't believe using the locale encoding for the intermediate representation
> of the character sequences passed to the streambuf is sound, and I don't think
> trying to support this case will lead us anywhere useful.
>
> The future we want is narrow literal encoding of UTF-8, with the streambuf
> receiving character sequences in UTF-8, with the final encoding produced by
> the codecvt facet in the streambuf.
>
> The locale categories in that future determine the month names, but not
> their encoding.
>
+1
>
> I don't quite know how we get there, but I'm pretty sure transcoding UTF-8
> to Shift-JIS in the inserters isn't how.
>
Received on 2024-05-04 08:44:28