Re: Follow up on SG16 review of P2996R2 (Reflection for C++26)

From: Jens Maurer <jens.maurer_at_[hidden]>
Date: Tue, 30 Apr 2024 07:24:49 +0200

On 30/04/2024 01.08, Peter Dimov via SG16 wrote:
> Tom Honermann wrote:
>>> And we don't want to make std::cout << u8"..." do that, because it
>>> can, in principle, do better?
>
>> Not because it can do better, but because there is more uncertainty about
>> what the user might expect. If the user writes std::cout << std::format(...),
>> then that is an explicit opt in to the behavior that
>> std::format() exhibits. But they might also want to just write UTF-8 bytes
>> unmodified regardless of what the ordinary literal encoding is. Or they might
>> expect implicit transcoding to either the current locale or the environment
>> locale or even the terminal locale. By not providing a default behavior, we give
>> the programmer the opportunity to think about what they are actually trying
>> to do.
>
> I'm not sure I buy all that. Once format() returns, we are left with a string
> in the literal encoding. That string goes to std::cout. There's not much
> difference between sending a string in the literal encoding to std::cout,
> and sending a string in UTF-8 to std::cout, especially when the literal encoding
> is UTF-8, but also in principle.
>
> Namely,
>
>> iostreams implicitly consults either an imbued locale facet or the global locale
>> for formatting operations.
>
> this remains true for either of our string encodings. There's absolutely no
> guarantee that the imbued locale facet is more suitable for outputting the
> literal encoding than it is for outputting UTF-8. In fact it may very well be less
> suitable.

std::cout is just a special case of general iostreams; what we discuss here
should work for any ostream.

>> In the latter case, we have to assume that some_std_string holds text in the
>> encoding expected on the other end of the stream.
>
> Again, I don't see why that would be true. If you are going to invoke CP437
> in the UTF-8 case, I don't see why we suddenly need to ignore its existence
> in the literal encoding case.

Hm... We currently specify that std::fstream considers the imbued locale's
encoding, but we seem to say nothing about std::cout, even though one might
reasonably expect that it also considers the imbued locale when transcoding
to the output.
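
As a rough illustration of what "considers the imbued locale" means for a
narrow stream, the sketch below asks the stream's locale for its
std::codecvt<char, char, std::mbstate_t> facet, which is the facet a
std::filebuf uses. The helper name char_codecvt_is_noop is made up for this
example. With the default facet the conversion is the identity, so no
transcoding takes place unless a different facet has been installed.

```cpp
#include <cwchar>
#include <locale>
#include <ostream>
#include <sstream>

// Hypothetical helper (not a standard facility): reports whether the
// char-to-char codecvt facet of the stream's imbued locale is a
// non-converting facet.
bool char_codecvt_is_noop(const std::ostream& os) {
    using facet = std::codecvt<char, char, std::mbstate_t>;
    // getloc() returns the locale currently imbued in the stream.
    return std::use_facet<facet>(os.getloc()).always_noconv();
}
```

Calling char_codecvt_is_noop on a default-constructed std::ostringstream is
expected to return true, since the standard char-to-char facet performs no
conversion.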

> There's nothing stopping us from making std::cout << u8"..." _at least as
> good as_ std::cout << std::format( "{}", u8"..." ) - we just make it transcode
> to the literal encoding.

Fully agreed, and probably a small wording amendment.
(Put differently, std::cout already comes with an expectation of
the encoding it assumes for "char *" output. Converting char8_t
strings to that encoding is all we need.)

Jens

> Yes, it's potentially possible to do better than that,
> but it needn't be any worse, and in the common case of the literal
> encoding being UTF-8, both will be as good as can be achieved.
>
> Now, had the proposal on the table been std::print( u8"{}", u8"..." )... that's
> another story altogether. But we aren't talking about that.
>
>

Received on 2024-04-30 05:25:07