Re: Follow up on SG16 review of P2996R2 (Reflection for C++26)

From: Jens Maurer <jens.maurer_at_[hidden]>
Date: Tue, 30 Apr 2024 14:45:35 +0200
On 30/04/2024 12.17, Peter Dimov wrote:
> Jens Maurer wrote:
>> Hm... We currently specify that std::fstream considers the imbued locale's
>> encoding, but we seem to say nothing about std::cout. Even though one might
>> reasonably expect that it also considers the imbued locale to perform
>> transcoding to the output.
>
> It's a bit vague, but std::cout is specified in https://eel.is/c++draft/narrow.stream.objects#3
>
> "The object cout controls output to a stream buffer associated with the object
> stdout, declared in <cstdio>."
>
> which strongly implies a `filebuf` that writes to `stdout`, even though it's not
> required to be literally that and can be e.g. of type __stdout_streambuf.

A "filebuf" is not a std::fstream, and a "stream buffer" certainly is not
necessarily a streambuf for a file.

> std::cout is actually the easy case. std::cout << x, for any x, must serialize x
> into a sequence of `char`, which it then passes to its streambuf; the streambuf
> uses codecvt::out to transcode, but codecvt<char, char, mbstate_t> is a no-op.

Where do we say this part in the standard, outside of the std::fstream
specification?

"the streambuf uses codecvt::out to transcode"

> So in the "normal" case of nothing imbued, and from the fact that
>
> std::cout << "Hello, world!" << std::endl;
>
> is expected to work, we can deduce that characters in the literal encoding
> end up in the streambuf and then are written to stdout, with no translation.
>
> And since in
>
> std::cout << "Hello, " << u8"world!" << std::endl;
>
> the characters "Hello, " and the result of serialization of u8"world!" to char[]
> end up in the same char[] buffer, with no associated metadata to tell the
> streambuf which specific `char` is in what encoding, we can further deduce
> that the serialized u8"world!" has to consist of characters in the literal
> encoding (or a superset of it.)

That's at least true for a std::fstream, I think.
I don't think std::cout does any (std::locale-dependent) transcoding at all.

> There's simply no other option.

> TL;DR
>
> std::cout << "prefix " << u8"..." << " suffix\n";
>
> and
>
> std::cout << std::format("prefix {} suffix\n", u8"...");
>
> are equivalent, and the same reasoning applies to both. In both cases,
> the narrow literals and the u8 literal are serialized to a single char[]
> buffer, in the narrow literal encoding.

I like that outcome.

Jens

Received on 2024-04-30 12:45:50