On Thu, 11 Jan 2024 at 10:08, Corentin Jabot <corentinjabot@gmail.com> wrote:


On Thu, Jan 11, 2024 at 12:25 AM Jonathan Wakely via SG16 <sg16@lists.isocpp.org> wrote:
What's the intended implementation strategy for https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2419r2.html on POSIX?

My best guess is something like this, where loc is the formatting locale ([time.format] p2):

if (/* narrow literal encoding is UTF-8 */)
  if (locale_t cloc = ::newlocale(LC_ALL_MASK, loc.name().c_str(), (locale_t)0)) {
    if (const char* enc = ::nl_langinfo_l(CODESET, cloc))
      if (/* enc is not UTF-8 */) {
        iconv_t ic = ::iconv_open("UTF-8", enc);
        if (ic != (iconv_t)-1) {
          // use ::iconv to convert from locale's encoding to UTF-8
          ::iconv_close(ic);
        }
      }
    ::freelocale(cloc);
  }

But that seems pretty involved ... and {fmt} doesn't do any of that.
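
To make the pseudocode above concrete, here is a minimal sketch of what the POSIX side might look like. It assumes a glibc-like system where newlocale, nl_langinfo_l and iconv are available and where iconv takes a char** input pointer. The helper names locale_codeset and to_utf8 are hypothetical, invented for illustration only; this is not what {fmt} or any standard library actually does.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <string>

// Hypothetical helper: name of the narrow encoding used by a POSIX locale,
// or an empty string if the locale cannot be created.
std::string locale_codeset(const char* locale_name) {
    locale_t cloc = ::newlocale(LC_ALL_MASK, locale_name, (locale_t)0);
    if (!cloc) return {};
    const char* enc = ::nl_langinfo_l(CODESET, cloc);
    std::string result = enc ? enc : "";
    ::freelocale(cloc);
    return result;
}

// Hypothetical helper: convert `in` from `from_enc` to UTF-8 with iconv.
// Returns false if the conversion is unavailable or the input is invalid.
bool to_utf8(const std::string& in, const char* from_enc, std::string& out) {
    assert(from_enc != nullptr);
    iconv_t ic = ::iconv_open("UTF-8", from_enc);
    if (ic == (iconv_t)-1) return false;      // no converter for this pair

    out.clear();
    char buf[256];
    char* src = const_cast<char*>(in.data()); // iconv does not modify the input
    std::size_t src_left = in.size();
    bool ok = true;
    while (src_left > 0) {
        char* dst = buf;
        std::size_t dst_left = sizeof buf;
        std::size_t r = ::iconv(ic, &src, &src_left, &dst, &dst_left);
        out.append(buf, sizeof buf - dst_left);
        // E2BIG only means the scratch buffer filled up; anything else is fatal.
        if (r == (std::size_t)-1 && errno != E2BIG) { ok = false; break; }
    }
    ::iconv_close(ic);
    return ok;
}
```

An implementation would presumably call something like to_utf8(text, locale_codeset(loc.name().c_str()).c_str(), out) on the locale-dependent output of the chrono formatter, and fall back to the unconverted text when either step fails.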

I tried testing the example from the paper on Linux, and fmt::format fails with an exception. Debugging it shows that it tries to use the formatting locale's std::codecvt<char32_t, char, mbstate_t> facet to convert the string. But that's not right, because that codecvt specialization is defined by the standard to convert between UTF-8 and UTF-32 only. So it can only work if the input is ASCII, or the locale uses UTF-8, in which case there's nothing that needs converting anyway.
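
A small sketch of the codecvt problem described above, for anyone who wants to reproduce it outside {fmt}. The wrapper type and the decode helper are hypothetical names used only for this illustration; the facet itself is the standard std::codecvt<char32_t, char, std::mbstate_t> specialization (deprecated since C++20, so expect warnings on some toolchains).

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <string>

// std::codecvt has a protected destructor; deriving from it gives a type
// that can be created as an ordinary local object.
struct utf8_utf32_codecvt : std::codecvt<char32_t, char, std::mbstate_t> {};

// Hypothetical helper: decode `bytes` with the facet and report its result.
std::codecvt_base::result decode(const std::string& bytes, std::u32string& out) {
    utf8_utf32_codecvt cvt;
    std::mbstate_t state{};
    out.assign(bytes.size(), U'\0');          // at most one code point per byte
    const char* from_next = nullptr;
    char32_t* to_next = nullptr;
    std::codecvt_base::result r =
        cvt.in(state, bytes.data(), bytes.data() + bytes.size(), from_next,
               &out[0], &out[0] + out.size(), to_next);
    assert(to_next != nullptr);               // in() always sets the output cursor
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return r;
}
```

With this, decode("abc", out) and decode("\xC3\xA9", out) (UTF-8 "é") both succeed, while the ISO-8859-1 bytes for "été", decode("\xE9t\xE9", out), do not come back as ok, because the facet treats the input as UTF-8 regardless of which locale it was obtained from.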

AFAIK the standard doesn't provide a way to convert from an arbitrary locale's encoding to the execution charset, or even to get the name of an arbitrary locale's encoding (C++23 provides a way to get the name of the execution environment's encoding, but not an arbitrary std::locale's encoding).

(the implementation of which is isomorphic to your pseudo code)

Oh! Somehow I missed that in P1885, thank you. That at least gives a portable API for the locale's encoding (and the nl_langinfo_l stuff will be hidden inside the implementation).

 
 

Is the pseudocode above the intention? Or am I misinterpreting something in P2419?

My recollection is that it was.
The minutes do seem to mention codecvt, though.

Oh yes. I read through the much longer minutes from the previous meeting on July 28th:
https://github.com/sg16-unicode/sg16-meetings/blob/965a93cc62fb79bc3744a8b50a9ae2776116d5c3/README-2021.md#july-28th-2021

I'm still pretty sure that using codecvt<char32_t, char, mbstate_t> doesn't work.
 


Note that an implementation that has an empty set of locales it can transcode is still conforming; we wanted a best effort there.

Yeah, but "do nothing" is a pretty bad best effort :-)
And if "do nothing" is the intent then it shouldn't be there at all.

 
But it's been a while, my recollection may be hazy.

 
I've read the SG16 minutes from when P2419 was discussed, and I don't see an answer.


--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16