sg16

Re: Follow up on SG16 review of P2996R2 (Reflection for C++26)

From: Peter Dimov <pdimov_at_[hidden]>
Date: Tue, 30 Apr 2024 13:17:13 +0300
Jens Maurer wrote:
> Hm... We currently specify that std::fstream considers the imbued locale's
> encoding, but we seem to say nothing about std::cout. Even though one might
> reasonably expect that it also considers the imbued locale to perform
> transcoding to the output.

It's a bit vague, but std::cout is specified in https://eel.is/c++draft/narrow.stream.objects#3:

"The object cout controls output to a stream buffer associated with the object
stdout, declared in <cstdio>."

which strongly implies a `filebuf` that writes to `stdout`, even though it's not
required to be literally that and can be e.g. of type __stdout_streambuf.

std::cout is actually the easy case. std::cout << x, for any x, must serialize x
into a sequence of `char`, which it then passes to its streambuf; the streambuf
uses codecvt::out to transcode, but codecvt<char, char, mbstate_t> is a no-op.

So in the "normal" case of nothing imbued, and from the fact that

    std::cout << "Hello, world!" << std::endl;

is expected to work, we can deduce that characters in the literal encoding
end up in the streambuf and then are written to stdout, with no translation.

And since in

    std::cout << "Hello, " << u8"world!" << std::endl;

the characters "Hello, " and the result of serialization of u8"world!" to char[]
end up in the same char[] buffer, with no associated metadata to tell the
streambuf which specific `char` is in what encoding, we can further deduce
that the serialized u8"world!" has to consist of characters in the literal
encoding (or a superset of it).

There's simply no other option.

std::wcout << L"Привет!" << std::endl (where the wide literal encoding is
UTF-16, but the narrow literal encoding is ISO-8859-1) is the hard case. But
I think we've given up on that.

The hypothetical u8cout (whose streambuf is basic_streambuf<char8_t>)
will, of course, do the opposite: pass through u8"..." and transcode "...".
But we can worry about that when we get it, which will be never.

TL;DR

    std::cout << "prefix " << u8"..." << " suffix\n";

and

    std::cout << std::format("prefix {} suffix\n", u8"...");

are equivalent, and the same reasoning applies to both. In both cases,
the narrow literals and the u8 literal are serialized to a single char[]
buffer, in the narrow literal encoding.

And once that is done, the imbued locale, if any, is applied to both in
the exact same manner.

Received on 2024-04-30 10:17:19