SG16

Subject: Re: Agenda for the 2021-04-28 SG16 telecon
From: Tom Honermann (tom_at_[hidden])
Date: 2021-04-27 10:15:59


On 4/27/21 10:52 AM, Victor Zverovich via SG16 wrote:
> Dear Unicoders,
>
> Thanks, Tom, for putting together a detailed list of options. Just
> want to add that print is a completely wrong abstraction level to try
> to address this (we should address it but it has little to do with
> P2093). The root cause is a mismatch between literal and locale
> encoding and it should be addressed on the formatting level in cases
> where a locale is used.

Yes, I agree, 100%. I think the only reason the problem is seen as more
relevant for std::print() is that the proposal intends to process the
formatted output (e.g., transcode it from UTF-8 to the native console
encoding) when the literal encoding is UTF-8, whereas std::format() just
dumps the bytes and produces mojibake (thus making it someone else's
problem).
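
The transcoding step mentioned above can be sketched as follows. This is
only an illustration, under the assumption that the native console API
consumes UTF-16 (as the wide Windows console functions do); the helper
name utf8_to_utf16 is hypothetical and is not taken from P2093, and a
real implementation would more likely call a platform facility.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper: decode UTF-8 and re-encode it as UTF-16.
// Throws std::invalid_argument on malformed input.
std::u16string utf8_to_utf16(std::string_view in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        // The lead byte determines the sequence length and the first payload bits.
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong forms, surrogates, and out-of-range values.
        static constexpr char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

The relevant point for this thread is the error path: if the formatted
output contains bytes in some other encoding, a transcoder like this has
to fail or substitute, whereas a function that only copies bytes never
notices.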

> Here's an example from another thread that illustrates this:
>
>   std::cout << std::format("时间 {:%r}\n",
>       std::chrono::system_clock::now().time_since_epoch());
>
> I think this belongs to a separate small (but important) paper unless
> the resolution is so trivial that it can be a drive-by fix in P2093.
A separate paper works for me, though, per above, I think std::print() is
arguably more impacted by the issue than std::format() is.
>
> One more option is to give a runtime error when trying to use (via 'L'
> or other means) a locale whose encoding is incompatible with the
> literal encoding. I'd either go with that or do transcoding. Dropping
> UTF-8 handling is the least desirable option in my opinion and will
> basically render the feature useless for me as a user.

Agreed.
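
For illustration, the runtime-error option could look roughly like the
sketch below. Both function names are hypothetical, and the comparison by
normalized encoding name is an assumption made to keep the example small;
how the two names are obtained (for example from nl_langinfo(CODESET) on
POSIX systems) is left out, and a real check might also accept any
locale whose output happens to be pure ASCII.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper: upper-case the name and drop '-' and '_' so that
// "utf8", "UTF-8" and "utf_8" all compare equal.
std::string normalize_encoding_name(std::string_view name) {
    std::string out;
    for (char c : name) {
        if (c == '-' || c == '_') continue;
        out.push_back(static_cast<char>(
            std::toupper(static_cast<unsigned char>(c))));
    }
    return out;
}

// Hypothetical check: throws std::runtime_error when the locale encoding
// does not match the literal encoding.
void require_compatible_encodings(std::string_view literal_encoding,
                                  std::string_view locale_encoding) {
    if (normalize_encoding_name(literal_encoding) !=
        normalize_encoding_name(locale_encoding)) {
        throw std::runtime_error(
            "locale encoding " + std::string(locale_encoding) +
            " is incompatible with literal encoding " +
            std::string(literal_encoding));
    }
}
```

A formatter would call such a check at the point where a locale-dependent
specifier is first processed, so that locale-independent format strings
are never affected.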

>
> I mostly agree with Corentin except that '%r' can be considered as an
> explicit locale opt-in similar to 'L'.

Would it not be useful to be able to format dates and times in a locale
independent manner though (and have that be the default)?

Tom.

>
> Cheers,
> Victor
>
> On Tue, Apr 27, 2021 at 4:11 AM Corentin Jabot via SG16
> <sg16_at_[hidden] <mailto:sg16_at_[hidden]>> wrote:
>
>
>
> On Tue, Apr 27, 2021 at 12:57 PM Jean-Marc Bourguet via SG16
> <sg16_at_[hidden] <mailto:sg16_at_[hidden]>> wrote:
>
> I'm probably too much of a Unix guy, but having
>
> prog
>
> and
>
> prog | more
>
> or
>
> prog > file; cat file
>
> displaying different things is not something that meets my
> expectations. The difference in buffering behaviour is already
> hard enough to explain. Piping to more is far too common for
> it to behave differently from direct output.
>
> Yours,
>
> Either way you are printing out the same content.
> Except in one case it is _rendered_ correctly and in the other it
> might not.
> This will also not affect Linux; it addresses a very
> Windows-specific problem, where the encoding the console
> assumes by default is not the execution encoding.
> This is explained in more detail in Victor's paper:
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2093r5.html#unicode
>
>
> --
> SG16 mailing list
> SG16_at_[hidden] <mailto:SG16_at_[hidden]>
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
>
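
The difference between prog and prog | more discussed in the quoted
exchange comes down to whether the output stream is attached to a
console. A minimal sketch of that test follows; the name is_terminal is
hypothetical, and the use of isatty/_isatty is an assumption about how
the check could be made, not a statement of what P2093 specifies.

```cpp
#include <cstdio>
#if defined(_WIN32)
#  include <io.h>      // _isatty, _fileno
#else
#  include <unistd.h>  // isatty
#endif

// Hypothetical helper: true if `stream` refers to an interactive
// terminal or console, false for pipes and regular files.
bool is_terminal(std::FILE* stream) {
#if defined(_WIN32)
    return _isatty(_fileno(stream)) != 0;
#else
    return isatty(fileno(stream)) != 0;
#endif
}
```

A print function built on this would transcode only when
is_terminal(stdout) is true and write the bytes unchanged otherwise, so
the content written to a pipe or file is the same in every case and only
the console rendering differs.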


