Date: Thu, 11 Jan 2024 14:56:28 -0800
One more thing: on Windows/MSVC, {fmt} uses codecvt<wchar_t, char>, which,
unlike its charN_t counterpart, should work with any narrow encoding. The
same could be used on those POSIX systems that have a Unicode wchar_t, but I
wouldn't recommend it. It would be better to use the underlying transcoding
facility directly, whatever that is (probably iconv).
- Victor
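
For illustration only, here is a minimal sketch of the codecvt<wchar_t, char>
approach mentioned above. It is not {fmt}'s actual code: the helper name
widen_with_locale is hypothetical, and it assumes the locale's facet can decode
the locale's own narrow encoding, which is implementation-specific.

```cpp
#include <cwchar>
#include <locale>
#include <optional>
#include <string>

// Hypothetical helper (not part of {fmt}): widens `in`, assumed to be encoded
// in `loc`'s narrow encoding, using the locale's codecvt<wchar_t, char> facet.
// Returns std::nullopt if the facet reports anything other than success.
std::optional<std::wstring> widen_with_locale(const std::string& in,
                                              const std::locale& loc) {
  using facet_t = std::codecvt<wchar_t, char, std::mbstate_t>;
  const facet_t& cvt = std::use_facet<facet_t>(loc);

  std::mbstate_t state{};
  // Assumption: no narrow byte expands to more than one wchar_t.
  std::wstring out(in.size(), L'\0');
  const char* from_next = nullptr;
  wchar_t* to_next = nullptr;

  const auto res = cvt.in(state, in.data(), in.data() + in.size(), from_next,
                          out.data(), out.data() + out.size(), to_next);
  if (res != std::codecvt_base::ok) return std::nullopt;

  // Trim the buffer to the number of wide characters actually written.
  out.resize(static_cast<std::size_t>(to_next - out.data()));
  return out;
}
```

Turning the resulting wide string into UTF-8 would be a separate step, and what
wchar_t holds is platform-dependent, which is one reason to prefer the
underlying transcoding facility on POSIX.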
On Thu, Jan 11, 2024 at 6:01 AM Victor Zverovich <victor.zverovich_at_[hidden]>
wrote:
> Hi Jonathan,
>
> > Is the pseudocode above the intention?
>
> Yes, it looks about right to me. {fmt} does something simpler because it
> relies only on (limited) standard facilities. For this reason it implements
> only a subset of the functionality (UTF conversions), which might also be
> conforming, considering "an implementation-defined set of locales", but is
> obviously less useful.
>
> HTH,
> Victor
>
> On Wed, Jan 10, 2024 at 3:25 PM Jonathan Wakely <cxx_at_[hidden]> wrote:
>
>> What's the intended implementation strategy for
>> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2419r2.html on
>> POSIX?
>>
>> My best guess is something like this, where loc is the formatting locale
>> ([time.format] p2):
>>
>>   if (narrow literal encoding is UTF-8)
>>     if (locale_t cloc = ::newlocale(LC_ALL_MASK, loc.name().c_str(), (locale_t)0))
>>       if (const char* enc = ::nl_langinfo_l(CODESET, cloc))
>>         if (/* enc is not UTF-8 */) {
>>           iconv_t ic = ::iconv_open("UTF-8", enc);
>>           if (ic != (iconv_t)-1) {
>>             // use ::iconv to convert from locale's encoding to UTF-8
>>           }
>>         }
>>
>> But that seems pretty involved ... and {fmt} doesn't do any of that.
>>
>> I tried testing the example from the paper on Linux, and fmt::format
>> fails with an exception. Debugging it shows that it tries to use the
>> formatting locale's std::codecvt<char32_t, char, mbstate_t> facet to
>> convert the string. But that's not right, because that codecvt
>> specialization is defined by the standard to convert between UTF-8 and
>> UTF-32 only. So it can only work if the input is ASCII, or the locale uses
>> UTF-8, in which case there's nothing that needs converting anyway.
>>
>> AFAIK the standard doesn't provide a way to convert from an arbitrary
>> locale's encoding to the execution charset, or even to get the name of an
>> arbitrary locale's encoding (C++26 provides a way to get the name of the
>> execution environment's encoding, but not an arbitrary std::locale's
>> encoding).
>>
>> Is the pseudocode above the intention? Or am I misinterpreting something
>> in P2419?
>>
>> I've read the SG16 minutes when P2419 was discussed, and I don't see an
>> answer.
>>
>>
>>
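
For reference, the strategy in the quoted pseudocode could be fleshed out
roughly as follows on POSIX. This is a hedged sketch, not the implementation of
{fmt} or of any standard library. The helper names (locale_codeset, to_utf8,
locale_text_to_utf8) are hypothetical, and it assumes the locale name is one
that newlocale accepts and that a UTF-8 codeset is reported exactly as "UTF-8".

```cpp
#include <iconv.h>     // iconv_open, iconv, iconv_close
#include <langinfo.h>  // nl_langinfo_l, CODESET
#include <locale.h>    // newlocale, freelocale, locale_t

#include <cerrno>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: codeset name of the named POSIX locale, or nullopt
// if the locale cannot be created.
std::optional<std::string> locale_codeset(const char* locale_name) {
  locale_t cloc = ::newlocale(LC_ALL_MASK, locale_name, (locale_t)0);
  if (cloc == (locale_t)0) return std::nullopt;
  std::string enc = ::nl_langinfo_l(CODESET, cloc);  // copy before freeing
  ::freelocale(cloc);
  return enc;
}

// Hypothetical helper: transcodes `in` from encoding `from` to UTF-8.
// Returns nullopt if the conversion is unsupported or the input is invalid.
std::optional<std::string> to_utf8(const std::string& in, const char* from) {
  iconv_t ic = ::iconv_open("UTF-8", from);
  if (ic == (iconv_t)-1) return std::nullopt;

  std::string out;
  char buf[256];
  char* src = const_cast<char*>(in.data());  // iconv does not modify the input
  std::size_t src_left = in.size();
  bool ok = true;
  while (ok && src_left > 0) {
    char* dst = buf;
    std::size_t dst_left = sizeof buf;
    const std::size_t rc = ::iconv(ic, &src, &src_left, &dst, &dst_left);
    const int err = (rc == (std::size_t)-1) ? errno : 0;
    out.append(buf, sizeof buf - dst_left);
    // E2BIG only means the output chunk is full; anything else is an error.
    if (err != 0 && err != E2BIG) ok = false;
  }
  if (ok) {  // flush any state the converter may still be holding
    char* dst = buf;
    std::size_t dst_left = sizeof buf;
    if (::iconv(ic, nullptr, nullptr, &dst, &dst_left) == (std::size_t)-1)
      ok = false;
    else
      out.append(buf, sizeof buf - dst_left);
  }
  ::iconv_close(ic);
  if (!ok) return std::nullopt;
  return out;
}

// Hypothetical entry point: converts locale-encoded text to UTF-8, skipping
// the conversion when the locale already reports UTF-8.
std::optional<std::string> locale_text_to_utf8(const std::string& text,
                                               const char* locale_name) {
  const auto enc = locale_codeset(locale_name);
  if (!enc) return std::nullopt;
  if (*enc == "UTF-8") return text;
  return to_utf8(text, enc->c_str());
}
```

A caller would pass the formatting locale's name, for example
locale_text_to_utf8(text, loc.name().c_str()). Which locale names and codeset
spellings are available depends on the system, so a real implementation would
likely need to normalize codeset names and cache the iconv descriptor.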
Received on 2024-01-11 22:56:41