sg16

Re: Follow up on SG16 review of P2996R2 (Reflection for C++26)

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Tue, 30 Apr 2024 08:31:21 +0200
On Tue, Apr 30, 2024 at 12:45 AM Tom Honermann <tom_at_[hidden]> wrote:

> On 4/29/24 4:11 PM, Peter Dimov via SG16 wrote:
> > Tom Honermann wrote:
> >> I'm not entirely sure that cout << std::format("{}", u8"...") is that much
> >> easier to specify and support.
> >>
> >> But I'll be glad to be proven wrong, of course. :-)
> >>
> >> There is a relevant SO comment
> >> <https://stackoverflow.com/questions/58878651/what-is-the-printf-formatting-character-for-char8-t/58895428#58895428>.
> >>
> >> std::format() and std::print(), to some extent, improve the likelihood that an
> >> implementation selected encoding will be a good match for the programmer's
> >> intent. This is because:
> >>
> >> 1. std::format() and std::print() are not implicitly locale dependent; that
> >> rules out selection of a locale dependent execution encoding.
> >> 2. std::format() returns a std::string; that rules out selection of an I/O
> >> dependent encoding.
> >> 3. std::print() writes to an I/O stream, but has special behavior for writes
> >> to a terminal; that rules out selection of a terminal encoding (as unnecessary,
> >> at least in important cases).
> >> 4. std::format() and std::print() are both strongly associated with the
> >> ordinary/wide literal encoding.
> >> 5. std::format() and std::print() should have the same behavior (other than
> >> that std::print(...) may produce a better result than std::cout <<
> >> std::format(...) when the output is directed to a terminal).
> >> 6. std::format() and std::print() have additional guarantees when the
> >> ordinary/wide literal encoding is a UTF encoding.
> >>
> >> Due to those characteristics, we have good motivation for implicit use of the
> >> ordinary/wide literal encoding as the target for transcoding for std::format()
> >> and std::print().
> > I'm afraid that I don't quite understand.
> >
> > What does std::format( "{}", u8"..." ) actually do? I suppose it transcodes
> > the UTF-8 input into the narrow literal encoding (replacing irrepresentable
> > characters with '?' instead of throwing, I presume, or it would be not very
> > usable)?
>
> We'll have to see what Corentin proposes :)
>
> But yes, something very much like that.
>
> Note that we could also support std::format("{:L}", u8"...") to enable a
> programmer to explicitly request transcoding to a locale dependent
> encoding (either now or at some future point).
>
> (Corentin, at a minimum, we should reserve the L option in your paper).
>

We have an opportunity to not conflate locale and encodings here.
u8"" is a known quantity: it's UTF-8.
But the target is also a known quantity. We very clearly decided it to be
the literal encoding, because we need to parse the format string, and
we wisely decided to assume a literal encoding for that. So the target
encoding is also a known quantity.
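
To make the "both ends are known" point concrete, here is a minimal sketch of the kind of transcoding being discussed. It is only an illustration under stated assumptions: `transcode_utf8_to_ascii` is a hypothetical helper, not a standard or proposed API; ASCII stands in for the ordinary literal encoding; and the '?' replacement policy is the one Peter guessed at above, not something any paper has specified.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not a standard or proposed API).
// Source encoding: UTF-8 (known). Target encoding: ASCII, used here as a
// stand-in for "the ordinary literal encoding" (also known).
// Every non-ASCII or malformed sequence is replaced by a single '?'.
std::string transcode_utf8_to_ascii(std::string_view utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        if (lead < 0x80) {
            // ASCII code units are representable as-is.
            out.push_back(static_cast<char>(lead));
            ++i;
            continue;
        }
        // Expected sequence length, derived from the lead byte.
        std::size_t len = 1;  // stray continuation byte or invalid lead
        if ((lead & 0xE0) == 0xC0) len = 2;
        else if ((lead & 0xF0) == 0xE0) len = 3;
        else if ((lead & 0xF8) == 0xF0) len = 4;
        // Skip the lead byte plus any continuation bytes actually present.
        std::size_t j = i + 1;
        while (j < utf8.size() && j < i + len &&
               (static_cast<unsigned char>(utf8[j]) & 0xC0) == 0x80) {
            ++j;
        }
        out.push_back('?');  // not representable in the target encoding
        i = j;
    }
    return out;
}
```

Neither a locale nor the I/O destination appears anywhere in this function, which is the point: the conversion depends only on the two encodings.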




>
> >
> > And then we just fall back to std::cout << "...", where the "..." is in the
> > narrow literal encoding and hence we assume works, more or less.
> Correct.
> >
> > And we don't want to make std::cout << u8"..." do that, because it can,
> > in principle, do better?
> Not because it can do better, but because there is more uncertainty
> about what the user might expect. If the user writes std::cout <<
> std::format(...), then that is an explicit opt in to the behavior that
> std::format() exhibits. But they might also want to just write UTF-8
> bytes unmodified regardless of what the ordinary literal encoding is. Or
> they might expect implicit transcoding to either the current locale or
> the environment locale or even the terminal locale. By not providing a
> default behavior, we give the programmer the opportunity to think about
> what they are actually trying to do.
>

I don't quite buy this argument.
When cout << 42.0; outputs "42,0", the decisions about the nature of the
text, the locale, and the encoding were made for us.
If the programmer wants to be creative, they can use I/O manipulators.
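
A minimal sketch of that behavior, under assumptions: the `comma_decimal` facet and `format_with_comma` are hypothetical names used only for illustration, and the facet stands in for a real locale that uses ',' as the decimal point, so that the example does not depend on which locales are installed. The std::fixed and std::setprecision manipulators are needed to get a fractional digit at all; with default formatting, 42.0 is written without one.

```cpp
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet standing in for a locale whose decimal point is ','.
struct comma_decimal : std::numpunct<char> {
    char do_decimal_point() const override { return ','; }
};

// Formats a double through an iostream whose imbued locale decides the
// decimal separator; the caller never states a locale or an encoding.
std::string format_with_comma(double value) {
    std::ostringstream os;
    // The locale takes ownership of the facet.
    os.imbue(std::locale(std::locale::classic(), new comma_decimal));
    os << std::fixed << std::setprecision(1) << value;
    return os.str();
}
```

With this setup, format_with_comma(42.0) yields "42,0": the stream, not the call site, chose how the number becomes text.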


> >
> > But let me get back to your list.
> >
> >> 1. std::format() and std::print() are not implicitly locale dependent; that
> >> rules out selection of a locale dependent execution encoding.
> > What is in a locale-dependent execution encoding in std::cout << u8"..."?
> iostreams implicitly consults either an imbued locale facet or the
> global locale for formatting operations. Think about std::cout <<
> std::chrono::Sunday. Depending on the current locale, this might print
> "Sun" or a localized weekday name in a locale dependent encoding.
>

But again, the only thing we care about for u8 is the encoding.
And I am not aware of std::locale ever impacting that.
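
As a small illustration of that distinction (a sketch only; `comma_point` and `stream_bytes` are hypothetical names, and a std::ostringstream stands in for std::cout): an imbued locale changes how a number is formatted, while the bytes of a narrow string inserted into the same stream pass through unchanged.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical facet: a locale whose decimal point is ','.
struct comma_point : std::numpunct<char> {
    char do_decimal_point() const override { return ','; }
};

// Writes the given bytes, a space, and the number 1.5 to a stream with the
// custom locale imbued, and returns everything the stream received.
std::string stream_bytes(std::string_view bytes) {
    std::ostringstream os;
    os.imbue(std::locale(std::locale::classic(), new comma_point));
    os << bytes << ' ' << 1.5;  // only the number consults the locale
    return os.str();
}
```

Passing the UTF-8 bytes for "café" returns those same bytes followed by " 1,5": the locale altered the number's punctuation and left the string's encoding alone.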


> >
> >> 2. std::format() returns a std::string; that rules out selection of an I/O
> >> dependent encoding.
> > Same question. Where is the I/O dependent encoding in std::cout << u8"..."
> > (that is not also present in std::cout << some_std_string)?
> In the latter case, we have to assume that some_std_string holds text in
> the encoding expected on the other end of the stream. We can't do that
> for u8"...", so we have to transcode to something (or have some other
> assurance that UTF-8 is intended and expected).
> >
> >> 3. std::print() writes to an I/O stream, but has special behavior for writes
> >> to a terminal; that rules out selection of a terminal encoding (as unnecessary,
> >> at least in important cases).

> This doesn't apply here, because we're using std::format.
>

Right, this is one of the reasons I feel less compelled to pursue iostream
surgery.
Output behavior is suboptimal on Windows, and unlikely to be fixed.


> >> 5. std::format() and std::print() should have the same behavior (other than
> >> that std::print(...) may produce a better result than std::cout <<
> >> std::format(...) when the output is directed to a terminal).
> > OK... but this isn't relevant.
> The above two are relevant because we wouldn't want to differentiate
> behavior for formatting a u8"..." argument for std::format() vs
> std::print(). The latter helps to constrain the reasonable options for
> the former.


Right, std::print() just formats and then outputs the result.


> >
> >> 6. std::format() and std::print() have additional guarantees when the
> >> ordinary/wide literal encoding is a UTF encoding.
> > What additional guarantees, and how do they help here?
>
> We specify additional constraints for fill characters, display width
> (well, normative encouragement), and formatting of escaped strings. None
> of these are relevant for reflection purposes; they help to reinforce a
> choice to depend on the ordinary/wide literal encoding for behavior of
> these functions. We don't have such precedent for iostreams.
>

And you know, the format string is parsed in the ordinary encoding and
copied as-is.


>
> Tom.
>
>

Received on 2024-04-30 06:31:41