Date: Tue, 30 Apr 2024 02:08:33 +0300
Tom Honermann wrote:
> > And we don't want to make std::cout << u8"..." do that, because it
> > can, in principle, do better?
> Not because it can do better, but because there is more uncertainty about
> what the user might expect. If the user writes std::cout << std::format(...),
> then that is an explicit opt in to the behavior that
> std::format() exhibits. But they might also want to just write UTF-8 bytes
> unmodified regardless of what the ordinary literal encoding is. Or they might
> expect implicit transcoding to either the current locale or the environment
> locale or even the terminal locale. By not providing a default behavior, we give
> the programmer the opportunity to think about what they are actually trying
> to do.
I'm not sure I buy all that. Once format() returns, we are left with a string
in the literal encoding. That string goes to std::cout. There's not much
difference between sending a string in the literal encoding to std::cout,
and sending a string in UTF-8 to std::cout, especially when the literal encoding
is UTF-8, but also in principle.
Namely,
> iostreams implicitly consults either an imbued locale facet or the global locale
> for formatting operations.
this remains true for either of our string encodings. There's absolutely no
guarantee that the imbued locale facet is more suitable for outputting the
literal encoding than it is for outputting UTF-8. In fact it may very well be less
suitable.
> In the latter case, we have to assume that some_std_string holds text in the
> encoding expected on the other end of the stream.
Again, I don't see why that would be true. If you are going to invoke CP437
in the UTF-8 case, I don't see why we suddenly need to ignore its existence
in the literal encoding case.
There's nothing stopping us from making std::cout << u8"..." _at least as
good as_ std::cout << std::format( "{}", u8"..." ) - we just make it transcode
to the literal encoding. Yes, it's potentially possible to do better than that,
but it needn't be any worse, and in the common case of the literal
encoding being UTF-8, both will be as good as can be achieved.
Now, had the proposal on the table been std::print( u8"{}", u8"..." )... that's
another story altogether. But we aren't talking about that.
Received on 2024-04-29 23:08:39