On Fri, Jul 30, 2021 at 5:38 PM Tom Honermann <tom@honermann.net> wrote:
Avoiding multiple localization mechanisms is desirable.

I think the problem we're having boils down to this: Do we want std::format() (and the proposed std::print()) to manipulate strings (NTBSs with ambiguous or polyglot encoding; e.g., mojibake) or text (well formed code unit sequences for a particular encoding)?  The existing locale facilities do not support the latter because there are multiple possible encodings at play (the ordinary literal encoding or the locale encoding, neither of which necessarily matches the programmer's intent; the programmer may be using UTF-8 encoded strings with a literal encoding of Windows-1252 running in a Windows-1251 locale).  The PR for the issue tries to split the difference by choosing the former if the literal encoding is not UTF-8 and the latter otherwise.  This inconsistency is concerning to some.

Speaking solely for myself, I'm leaning towards these utilities manipulating strings (not text) in all existing cases.  This puts the burden of producing valid text on the programmer (e.g., if the format string is UTF-8 and the locale provides Windows-1251, then it is up to the programmer to accept the mojibake possibility or do something explicit to prevent it).  This is consistent with how the existing locale facilities work and allows these utilities to function as drop-in replacements for printf(), including support for formatting binary data.

A possible way forward would be to allow the programmer to express encoding intent by passing a P1885 encoding identifier so that formatting functions can produce text in the expected encoding.  This doesn't necessarily eliminate all encoding confusion, however; should the format string be interpreted using the literal encoding or the explicitly provided encoding?  When the literal encoding is Windows-1252, how should something like std::format(std::text_encoding::UTF8, "téxt") be handled (note that the encoding of "é" is different in Windows-1252 vs UTF-8)?  In this case, it seems rather obvious that the implementation should use Windows-1252 to interpret the format string and then transcode it to UTF-8.  Note that such transcoding would have to be performed a fragment at a time since not all fragments necessarily originate in the same encoding.  This would, of course, impose overhead, but only on an opt-in basis.

I also think having a single localization facility would be best, and whatever fix we provide for this specific issue will not change that.

That being said, by asking for "the Russian name for Monday" you are definitely and unambiguously asking for text,
and it stands to reason that the burden of ensuring that this text is delivered in an encoding compatible with the rest of your system should fall on the standard.

It would be incredibly hostile if our long-term solution were to force users to write code along the lines of

std::locale russian("ru-RU");
std::format("День недели: {}", transcode(utf8, russian.encoding(), format(russian, "{:L}", std::chrono::Monday)));

The current locale facilities conflate encoding and localization, which is one (but not the only) shortcoming they have;
wg21.link/P2020 goes into more detail.



On 7/30/21 9:59 AM, Howard Hinnant wrote:
The intent here is that the implementor uses the same machinery as for http://eel.is/c++draft/locale.time.put.  I do not think we want to burden the std::lib with two independent localization mechanisms.


On Jul 30, 2021, at 8:46 AM, Jonathan Wakely via Lib <lib@lists.isocpp.org> wrote:
On Fri, 30 Jul 2021 at 13:45, Corentin via Lib <lib@lists.isocpp.org> wrote:
We decided we want a paper to deal with the issue.
We definitely want to postpone!

OK, thanks.

On Fri, Jul 30, 2021 at 1:05 PM Jeff Garland <jeff@crystalclearsoftware.com> wrote:
Thanks Tom —

Are there wiki notes or anything?  We may want to defer discussion until you’ve had more time.


On Jul 29, 2021, at 11:41 PM, Tom Honermann <tom@honermann.net> wrote:

Hi, Jeff. SG16 did discuss LWG 3565 this week. We haven’t reached a conclusion yet but the consensus appears to be heading in a direction that will lead to a different resolution than what is proposed in the issue. I’ll follow up more once I have the meeting summary and polls posted.


On Jul 29, 2021, at 8:10 PM, Jeff Garland via Lib <lib@lists.isocpp.org> wrote:

Apologies for the late notice.  All new papers for this week:

P1072 basic_string::resize_and_overwrite

P2372R1 (LWG 3547) Fixing locale handling in chrono formatters ** c++20 bug fix **

related issues:
LWG 3547 Time formatters should not be locale sensitive by default

LWG 3565 Handling of encodings in localized formatting of chrono types is underspecified

P1636 Formatters for Library Types


The zoom details for this meeting (and all following LWG meetings) are:

Join from PC, Mac, Linux, iOS or Android: https://iso.zoom.us/j/99098440581?pwd=K01lM0VyVTB1NjRJN2lRbzFMTit3QT09
    Password: template

Or iPhone one-tap:
    US: +12532158782,,99098440581#  or +13017158592,,99098440581#
Or Telephone:
    Dial (for higher quality, dial a number based on your current location):
        US: +1 253 215 8782  or +1 301 715 8592  or +1 312 626 6799  or +1 346 248 7799  or +1 408 638 0968  or +1 646 876 9923  or +1 669 900 6833  or 877 853 5247 (Toll Free)
    Meeting ID: 990 9844 0581
    Password: 07955058
    International numbers available: https://iso.zoom.us/u/a4YcGUHwU

Lib mailing list
Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/lib
Link to this post: http://lists.isocpp.org/lib/2021/07/19950.php
Link to this post: http://lists.isocpp.org/lib/2021/07/19954.php
Link to this post: http://lists.isocpp.org/lib/2021/07/19955.php