Date: Tue, 30 Apr 2024 13:17:13 +0300
Jens Maurer wrote:
> Hm... We currently specify that std::fstream considers the imbued locale's
> encoding, but we seem to say nothing about std::cout. Even though one might
> reasonably expect that it also considers the imbued locale to perform
> transcoding to the output.
It's a bit vague, but std::cout is specified in https://eel.is/c++draft/narrow.stream.objects#3 as
"The object cout controls output to a stream buffer associated with the object
stdout, declared in <cstdio>."
which strongly implies a `filebuf` that writes to `stdout`, even though it's not
required to be literally that and can be e.g. of type __stdout_streambuf.
std::cout is actually the easy case. std::cout << x, for any x, must serialize x
into a sequence of `char`, which it then passes to its streambuf; the streambuf
uses codecvt::out to transcode, but codecvt<char, char, mbstate_t> is a no-op.
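To make the "no-op" claim concrete, here is a minimal sketch that asks the classic locale's codecvt<char, char, mbstate_t> facet directly, roughly the way a filebuf would. The helper names `narrow_codecvt_is_noop` and `through_codecvt` are made up for illustration; only the facet calls are standard.

```cpp
#include <cassert>
#include <cwchar>   // std::mbstate_t
#include <locale>
#include <string>

// Hypothetical helper: reports whether the narrow codecvt facet of the
// classic ("C") locale declares that it never converts anything.
bool narrow_codecvt_is_noop() {
    using facet_t = std::codecvt<char, char, std::mbstate_t>;
    return std::use_facet<facet_t>(std::locale::classic()).always_noconv();
}

// Hypothetical helper: pushes `in` through codecvt::out and returns the
// bytes a streambuf would end up writing.
std::string through_codecvt(const std::string& in) {
    using facet_t = std::codecvt<char, char, std::mbstate_t>;
    const facet_t& cvt = std::use_facet<facet_t>(std::locale::classic());

    std::mbstate_t state{};
    std::string out(in.size(), '\0');
    const char* from_next = nullptr;
    char* to_next = nullptr;
    std::codecvt_base::result r =
        cvt.out(state, in.data(), in.data() + in.size(), from_next,
                &out[0], &out[0] + out.size(), to_next);

    // noconv tells the caller to use the input sequence unchanged.
    if (r == std::codecvt_base::noconv) return in;
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

Calling `through_codecvt("Hello, world!")` should hand back the same bytes, which is all the streambuf has to work with when nothing is imbued.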
So in the "normal" case of nothing imbued, and from the fact that
std::cout << "Hello, world!" << std::endl;
is expected to work, we can deduce that characters in the literal encoding
end up in the streambuf and then are written to stdout, with no translation.
And since in
std::cout << "Hello, " << u8"world!" << std::endl;
the characters "Hello, " and the result of serialization of u8"world!" to char[]
end up in the same char[] buffer, with no associated metadata to tell the
streambuf which specific `char` is in what encoding, we can further deduce
that the serialized u8"world!" has to consist of characters in the literal
encoding (or a superset of it).
There's simply no other option.
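A rough sketch of that argument, using an ostringstream in place of std::cout so the buffer can be inspected. Note that C++20 declares operator<< for char8_t strings on narrow streams as deleted, so the sketch goes through a made-up helper, `serialize_u8`, which stands in for whatever the serialization ends up being. It assumes the narrow literal encoding agrees with UTF-8 on the characters used (plain ASCII here), so the "transcoding" degenerates to a byte copy; a real implementation would have to convert properly.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: serializes a UTF-8 string into char.
// ASSUMPTION: every code unit is ASCII, so the narrow literal encoding
// and UTF-8 agree and a byte-for-byte copy is sufficient.
// Templated so it accepts u8"" as const char* (C++17) or const char8_t* (C++20).
template <class Char8>
std::string serialize_u8(const Char8* s) {
    std::string out;
    for (; *s != 0; ++s) out.push_back(static_cast<char>(*s));
    return out;
}

// Both pieces land in one char buffer; nothing records which bytes
// came from the narrow literal and which from the u8 literal.
std::string hello_buffer() {
    std::ostringstream os;
    os << "Hello, " << serialize_u8(u8"world!") << '\n';
    return os.str();
}
```

The string returned by `hello_buffer()` is a flat sequence of `char`, so whatever sits downstream of it can only treat the whole thing as being in one encoding.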
std::wcout << L"Привет!" << std::endl (where the wide literal encoding is
UTF-16, but the narrow literal encoding is ISO-8859-1) is the hard case. But
I think we've given up on that.
The hypothetical u8cout (whose streambuf is basic_streambuf<char8_t>)
will of course do the opposite, pass through u8"..." and transcode "...", but
we can worry about that when we get it, which will be never.
TL;DR
std::cout << "prefix " << u8"..." << " suffix\n";
and
std::cout << std::format("prefix {} suffix\n", u8"...");
are equivalent, and the same reasoning applies to both. In both cases,
the narrow literals and the u8 literal are serialized to a single char[]
buffer, in the narrow literal encoding.
And once that is done, the imbued locale, if any, is applied to both in
the exact same manner.
Received on 2024-04-30 10:17:19