Re: Clarify "Clarify handling of encodings in localized formatting of chrono types"

From: Jonathan Wakely <cxx_at_[hidden]>
Date: Thu, 11 Jan 2024 11:45:10 +0000
On Thu, 11 Jan 2024 at 10:08, Corentin Jabot <corentinjabot_at_[hidden]>
wrote:

>
>
> On Thu, Jan 11, 2024 at 12:25 AM Jonathan Wakely via SG16 <
> sg16_at_[hidden]> wrote:
>
>> What's the intended implementation strategy for
>> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2419r2.html on
>> POSIX?
>>
>> My best guess is something like this, where loc is the formatting locale
>> ([time.format] p2):
>>
>> if (narrow literal encoding is UTF-8)
>>   if (locale_t cloc = ::newlocale(LC_ALL_MASK, loc.name().c_str(), (locale_t)0))
>>     if (const char* enc = ::nl_langinfo_l(CODESET, cloc))
>>       if (/* enc is not UTF-8 */) {
>>         iconv_t ic = ::iconv_open("UTF-8", enc);
>>         if (ic != (iconv_t)-1) {
>>           // use ::iconv to convert from locale's encoding to UTF-8
>>
>> But that seems pretty involved ... and {fmt} doesn't do any of that.
>>
>> I tried testing the example from the paper on Linux, and fmt::format
>> fails with an exception. Debugging it shows that it tries to use the
>> formatting locale's std::codecvt<char32_t, char, mbstate_t> facet to
>> convert the string. But that's not right, because that codecvt
>> specialization is defined by the standard to convert between UTF-8 and
>> UTF-32 only. So it can only work if the input is ASCII, or the locale uses
>> UTF-8, in which case there's nothing that needs converting anyway.
>>
>> AFAIK the standard doesn't provide a way to convert from an arbitrary
>> locale's encoding to the execution charset, or even to get the name of an
>> arbitrary locale's encoding (C++23 provides a way to get the name of the
>> execution environment's encoding, but not an arbitrary std::locale's
>> encoding).
>>
>
> There is https://eel.is/c++draft/locale#lib:locale,encoding
> (the implementation of which is isomorphic to your pseudo code)
>

Oh! Somehow I missed that in P1885, thank you. That at least gives a
portable API for the locale's encoding (and the nl_langinfo_l stuff will be
hidden inside the implementation).
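
For illustration only, here is a minimal sketch of what the POSIX route from the pseudocode might look like once written out. It assumes a glibc-style environment where newlocale, nl_langinfo_l and iconv are available. The helper names locale_codeset and to_utf8 are hypothetical, not taken from P2419, {fmt} or any standard library.

```cpp
#include <cerrno>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

#include <iconv.h>
#include <langinfo.h>
#include <locale.h>

// Hypothetical helper: name of the encoding used by the named POSIX locale,
// or nullopt if that locale cannot be created (e.g. it is not installed).
std::optional<std::string> locale_codeset(const char* locale_name)
{
    locale_t cloc = ::newlocale(LC_ALL_MASK, locale_name, (locale_t)0);
    if (cloc == (locale_t)0)
        return std::nullopt;
    std::string enc = ::nl_langinfo_l(CODESET, cloc);
    ::freelocale(cloc);
    return enc;
}

// Hypothetical helper: transcode `in` from `from_enc` to UTF-8 using iconv.
// Returns nullopt if the conversion is unsupported or the input is invalid.
std::optional<std::string> to_utf8(std::string_view in, const char* from_enc)
{
    iconv_t ic = ::iconv_open("UTF-8", from_enc);
    if (ic == (iconv_t)-1)
        return std::nullopt;

    std::string out;
    char buf[256];
    char* src = const_cast<char*>(in.data());  // iconv does not modify the input
    std::size_t src_left = in.size();
    bool ok = true;
    while (src_left > 0) {
        char* dst = buf;
        std::size_t dst_left = sizeof buf;
        std::size_t r = ::iconv(ic, &src, &src_left, &dst, &dst_left);
        out.append(buf, sizeof buf - dst_left);
        // E2BIG only means the output buffer filled up; loop and continue.
        if (r == (std::size_t)-1 && errno != E2BIG) {
            ok = false;
            break;
        }
    }
    ::iconv_close(ic);
    if (!ok)
        return std::nullopt;
    return out;
}
```

A caller would pass the result of locale_codeset(loc.name().c_str()) as from_enc, and skip the conversion entirely when that name already denotes UTF-8.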



>
>
>>
>> Is the pseudocode above the intention? Or am I misinterpreting something
>> in P2419?
>>
>
> My recollection is that it was.
> The minutes do seem to mention codecvt, though
>
> https://github.com/sg16-unicode/sg16-meetings/blob/965a93cc62fb79bc3744a8b50a9ae2776116d5c3/README-2021.md#meeting-summary-8
>

Oh yes. I read through the much longer minutes from the previous meeting on
July 28th:
https://github.com/sg16-unicode/sg16-meetings/blob/965a93cc62fb79bc3744a8b50a9ae2776116d5c3/README-2021.md#july-28th-2021

I'm still pretty sure that using codecvt<char32_t, char, mbstate_t> doesn't
work.
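
A small sketch of why, assuming only the standard-mandated behaviour of that facet (it converts between UTF-8 and UTF-32, whatever the locale's own encoding is). The helper name decode_with_codecvt is hypothetical. Note that this codecvt specialization is deprecated since C++20, so a compiler may warn.

```cpp
#include <cwchar>
#include <locale>
#include <string>
#include <string_view>

// Hypothetical helper: decode `bytes` with the locale's
// codecvt<char32_t, char, mbstate_t> facet and return the facet's result code.
std::codecvt_base::result decode_with_codecvt(const std::locale& loc,
                                              std::string_view bytes,
                                              std::u32string& out)
{
    using facet_t = std::codecvt<char32_t, char, std::mbstate_t>;
    const facet_t& cvt = std::use_facet<facet_t>(loc);

    std::mbstate_t state{};
    out.assign(bytes.size(), U'\0');  // at most one code point per input byte
    const char* from_next = bytes.data();
    char32_t* to_next = out.data();
    std::codecvt_base::result res =
        cvt.in(state, bytes.data(), bytes.data() + bytes.size(), from_next,
               out.data(), out.data() + out.size(), to_next);
    out.resize(to_next - out.data());  // keep only the code points produced
    return res;
}
```

With this helper, ASCII input such as "Monday" and UTF-8 input such as "f\xC3\xA9vrier" decode successfully, while the ISO-8859-1 bytes "f\xE9vrier" are rejected, because the facet treats 0xE9 as the start of a UTF-8 sequence that the next byte does not continue.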


>
>
> Note that an implementation that has an empty set of locales it can
> transcode is still conforming; we wanted a best effort there.
>

Yeah, but "do nothing" is a pretty bad best effort :-)
And if "do nothing" is the intent then it shouldn't be there at all.



> But it's been a while, my recollection may be hazy.
>
>
>
>> I've read the SG16 minutes from when P2419 was discussed, and I don't see
>> an answer.
>>
>>
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>
>

Received on 2024-01-11 11:46:26