Re: Follow up on SG16 review of P2996R2 (Reflection for C++26)

From: Tom Honermann <tom_at_[hidden]>
Date: Mon, 29 Apr 2024 18:45:56 -0400
On 4/29/24 4:11 PM, Peter Dimov via SG16 wrote:
> Tom Honermann wrote:
>> I'm not entirely sure that cout << std::format("{}", u8"...") is that much
>> easier to specify and support.
>>
>> But I'll be glad to be proven wrong, of course. :-)
>>
>> There is a relevant SO comment:
>> <https://stackoverflow.com/questions/58878651/what-is-the-printf-formatting-character-for-char8-t/58895428#58895428>
>>
>> std::format() and std::print(), to some extent, improve the likelihood that an
>> implementation selected encoding will be a good match for the programmer's
>> intent. This is because:
>>
>> 1. std::format() and std::print() are not implicitly locale dependent; that
>> rules out selection of a locale dependent execution encoding.
>> 2. std::format() returns a std::string; that rules out selection of an I/O
>> dependent encoding.
>> 3. std::print() writes to an I/O stream, but has special behavior for writes
>> to a terminal; that rules out selection of a terminal encoding (as unnecessary,
>> at least in important cases).
>> 4. std::format() and std::print() are both strongly associated with the
>> ordinary/wide literal encoding.
>> 5. std::format() and std::print() should have the same behavior (other than
>> that std::print(...) may produce a better result than std::cout <<
>> std::format(...) when the output is directed to a terminal).
>> 6. std::format() and std::print() have additional guarantees when the
>> ordinary/wide literal encoding is a UTF encoding.
>>
>>
>> Due to those characteristics, we have good motivation for implicit use of the
>> ordinary/wide literal encoding as the target for transcoding for std::format()
>> and std::print().
> I'm afraid that I don't quite understand.
>
> What does std::format( "{}", u8"..." ) actually do? I suppose it transcodes
> the UTF-8 input into the narrow literal encoding (replacing irrepresentable
> characters with '?' instead of throwing, I presume, or it would be not very
> usable)?

We'll have to see what Corentin proposes :)

But yes, something very much like that.

Note that we could also support std::format("{:L}", u8"...") to enable a
programmer to explicitly request transcoding to a locale dependent
encoding (either now or at some future point).

(Corentin, at a minimum, we should reserve the L option in your paper).

>
> And then we just fall back to std::cout << "...", where the "..." is in the
> narrow literal encoding and hence we assume works, more or less.
Correct.
>
> And we don't want to make std::cout << u8"..." do that, because it can,
> in principle, do better?
Not because it can do better, but because there is more uncertainty
about what the user might expect. If the user writes std::cout <<
std::format(...), then that is an explicit opt-in to the behavior that
std::format() exhibits. But they might also want to just write UTF-8
bytes unmodified regardless of what the ordinary literal encoding is. Or
they might expect implicit transcoding to either the current locale or
the environment locale or even the terminal locale. By not providing a
default behavior, we give the programmer the opportunity to think about
what they are actually trying to do.
>
> But let me get back to your list.
>
>> 1. std::format() and std::print() are not implicitly locale dependent; that
>> rules out selection of a locale dependent execution encoding.
> What is in a locale-dependent execution encoding in std::cout << u8"..."?
iostreams implicitly consult either an imbued locale facet or the
global locale for formatting operations. Think about std::cout <<
std::chrono::Sunday. Depending on the current locale, this might print
"Sun" or a localized weekday name in a locale dependent encoding.
>
>> 2. std::format() returns a std::string; that rules out selection of an I/O
>> dependent encoding.
> Same question. Where is the I/O dependent encoding in std::cout << u8"..."
> (that is not also present in std::cout << some_std_string)?
In the latter case, we have to assume that some_std_string holds text in
the encoding expected on the other end of the stream. We can't do that
for u8"...", so we have to transcode to something (or have some other
assurance that UTF-8 is intended and expected).
>
>> 3. std::print() writes to an I/O stream, but has special behavior for writes
>> to a terminal; that rules out selection of a terminal encoding (as unnecessary,
>> at least in important cases).
> This doesn't apply here, because we're using std::format.
>
>> 5. std::format() and std::print() should have the same behavior (other than
>> that std::print(...) may produce a better result than std::cout <<
>> std::format(...) when the output is directed to a terminal).
> OK... but this isn't relevant.
The above two are relevant because we wouldn't want to differentiate
behavior for formatting a u8"..." argument for std::format() vs
std::print(). The latter helps to constrain the reasonable options for
the former.
>
>> 6. std::format() and std::print() have additional guarantees when the
>> ordinary/wide literal encoding is a UTF encoding.
> What additional guarantees, and how do they help here?

We specify additional constraints for fill characters, display width
(well, normative encouragement), and formatting of escaped strings. None
of these is directly relevant for reflection purposes, but they help to
reinforce a choice to depend on the ordinary/wide literal encoding for the
behavior of these functions. We don't have such precedent for iostreams.

Tom.

Received on 2024-04-29 22:45:58