Avoiding multiple localization mechanisms is desirable.

I think the problem we're having boils down to this: do we want std::format() (and the proposed std::print()) to manipulate strings (NTBSs with ambiguous or polyglot encoding; e.g., mojibake) or text (well-formed code unit sequences for a particular encoding)?  The existing locale facilities do not support the latter because there are multiple possible encodings at play: the ordinary literal encoding and the locale encoding, neither of which necessarily matches the programmer's intent.  For example, the programmer may be using UTF-8 encoded strings with a literal encoding of Windows-1252 while running in a Windows-1251 locale.  The PR for the issue tries to split the difference by choosing the former if the literal encoding is not UTF-8 and the latter otherwise.  This inconsistency is concerning to some.

Speaking solely for myself, I'm leaning towards these utilities manipulating strings (not text) in all existing cases.  This puts the burden of producing valid text on the programmer (e.g., if the format string is UTF-8 and the locale provides Windows-1251, then it is up to the programmer to accept the possibility of mojibake or do something explicit to prevent it).  This is consistent with how the existing locale facilities work and allows these utilities to function as drop-in replacements for printf(), including support for formatting binary data.
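
To make the strings-versus-text distinction concrete, here is a minimal sketch of what "manipulating strings" means at the byte level.  The function join_verbatim() is a hypothetical stand-in for a string-oriented formatter, not std::format() itself; it copies code units verbatim and never inspects their encoding.

```cpp
#include <string>

// Hypothetical stand-in for a string-oriented (not text-oriented) formatter:
// it copies code units verbatim and never inspects or converts encodings.
std::string join_verbatim(const std::string& prefix, const std::string& arg) {
    return prefix + arg;
}

// "é" (U+00E9) is two code units in UTF-8 but the single byte 0xE9 in
// Windows-1252.  In Windows-1251, the byte 0xE9 is a different character.
const std::string e_acute_utf8 = "\xC3\xA9";
const std::string e_acute_cp1252 = "\xE9";

// Mixing the two produces a byte sequence that is well formed as a string
// but is not well-formed UTF-8: the trailing 0xE9 is a lead byte with no
// continuation bytes after it.
const std::string mixed = join_verbatim(e_acute_utf8, e_acute_cp1252);
```

Nothing here fails or throws; whether the result is valid text is entirely up to the caller, which is the burden described above.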

A possible way forward would be to allow the programmer to express encoding intent by passing a P1885 encoding identifier so that formatting functions can produce text in the expected encoding.  This doesn't necessarily eliminate all encoding confusion, however: should the format string be interpreted using the literal encoding or the explicitly provided encoding?  When the literal encoding is Windows-1252, how should something like std::format(std::text_encoding::UTF8, "téxt") be handled (note that the encoding of "é" is different in Windows-1252 vs UTF-8)?  In this case, it seems rather obvious that the implementation should use Windows-1252 to interpret the format string and then transcode it to UTF-8.  Note that such transcoding would have to be performed a fragment at a time since not all fragments necessarily originate in the same encoding.  This would, of course, impose overhead, but only on an opt-in basis.
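
As a rough sketch of the fragment-at-a-time transcoding idea (not a proposed interface), the following assumes each fragment carries a tag for its source encoding.  The names Encoding, Fragment, to_utf8(), and format_as_utf8() are hypothetical; Encoding merely stands in for a P1885 identifier.  The Windows-1252 conversion covers only ASCII and 0xA0-0xFF, where code points match the byte values; the 0x80-0x9F range needs a lookup table that is omitted here.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical tag; stands in for a P1885 encoding identifier.
enum class Encoding { windows_1252, utf8 };

// One piece of formatted output together with the encoding it came from.
struct Fragment {
    Encoding encoding;
    std::string bytes;
};

// Append the UTF-8 encoding of a code point below U+0800.
void append_utf8(std::string& out, unsigned cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Transcode a single fragment to UTF-8 according to its own encoding.
std::string to_utf8(const Fragment& f) {
    if (f.encoding == Encoding::utf8) return f.bytes;  // already UTF-8
    std::string out;
    for (unsigned char c : f.bytes) {
        if (c >= 0x80 && c <= 0x9F)
            throw std::invalid_argument("0x80-0x9F mapping omitted in this sketch");
        append_utf8(out, c);  // ASCII and 0xA0-0xFF map to the same code point
    }
    return out;
}

// Transcode each fragment independently, then concatenate the results.
std::string format_as_utf8(const std::vector<Fragment>& fragments) {
    std::string out;
    for (const Fragment& f : fragments) out += to_utf8(f);
    return out;
}
```

With a format string fragment "t\xE9xt " tagged as Windows-1252 and an argument fragment "\xC3\xA9" tagged as UTF-8, only the first is converted; transcoding the concatenated bytes as a whole would instead corrupt the argument.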

Tom.

On 7/30/21 9:59 AM, Howard Hinnant wrote:
The intent here is that the implementor uses the same machinery as for http://eel.is/c++draft/locale.time.put.  I do not think we want to burden the std::lib with two independent localization mechanisms.

Howard

On Jul 30, 2021, at 8:46 AM, Jonathan Wakely via Lib <lib@lists.isocpp.org> wrote:


On Fri, 30 Jul 2021 at 13:45, Corentin via Lib <lib@lists.isocpp.org> wrote:
We decided we want a paper to deal with the issue.
We definitely want to postpone!

OK, thanks.



On Fri, Jul 30, 2021 at 1:05 PM Jeff Garland <jeff@crystalclearsoftware.com> wrote:
Thanks Tom —

Are there wiki notes or anything?  We may want to defer discussion until you’ve had more time.

Jeff

On Jul 29, 2021, at 11:41 PM, Tom Honermann <tom@honermann.net> wrote:

Hi, Jeff. SG16 did discuss LWG 3565 this week. We haven’t reached a conclusion yet but the consensus appears to be heading in a direction that will lead to a different resolution than what is proposed in the issue. I’ll follow up more once I have the meeting summary and polls posted.

Tom.

On Jul 29, 2021, at 8:10 PM, Jeff Garland via Lib <lib@lists.isocpp.org> wrote:


Apologies for the late notice.  All new papers for this week:


P1072 basic_string::resize_and_overwrite
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1072r8.html


P2372R1 (LWG 3547) Fixing locale handling in chrono formatters ** c++20 bug fix **
https://wg21.link/P2372R1

related issues:
LWG 3547 Time formatters should not be locale sensitive by default
https://cplusplus.github.io/LWG/issue3547

LWG 3565 Handling of encodings in localized formatting of chrono types is underspecified
https://cplusplus.github.io/LWG/issue3565

P1636 Formatters for Library Types
https://wg21.link/p1636r2

——

The zoom details for this meeting (and all following LWG meetings) are:

Join from PC, Mac, Linux, iOS or Android: https://iso.zoom.us/j/99098440581?pwd=K01lM0VyVTB1NjRJN2lRbzFMTit3QT09
    Password: template

Or iPhone one-tap :
    US: +12532158782,,99098440581#  or +13017158592,,99098440581#
Or Telephone:
    Dial(for higher quality, dial a number based on your current location):
        US: +1 253 215 8782  or +1 301 715 8592  or +1 312 626 6799  or +1 346 248 7799  or +1 408 638 0968  or +1 646 876 9923  or +1 669 900 6833  or 877 853 5247 (Toll Free)
    Meeting ID: 990 9844 0581
    Password: 07955058
    International numbers available: https://iso.zoom.us/u/a4YcGUHwU

Or Skype for Business (Lync):
    https://iso.zoom.us/skype/99098440581
_______________________________________________
Lib mailing list
Lib@lists.isocpp.org
Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/lib
Link to this post: http://lists.isocpp.org/lib/2021/07/19950.php
Link to this post: http://lists.isocpp.org/lib/2021/07/19954.php
Link to this post: http://lists.isocpp.org/lib/2021/07/19955.php