Date: Fri, 12 Jan 2024 00:09:37 +0000
On Thu, 11 Jan 2024, 22:58 Victor Zverovich, <victor.zverovich_at_[hidden]>
wrote:
> One more thing: on Windows/MSVC {fmt} uses codecvt<wchar_t, char> which
> unlike its charN_t counterpart should work with any narrow encoding. The
> same could be used on those POSIX systems that have Unicode wchar_t but I
> wouldn't recommend it. It would be better to directly use the underlying
> transcoding facility whatever that is (probably iconv).
>
Ack, thanks!
> - Victor
>
> On Thu, Jan 11, 2024 at 6:01 AM Victor Zverovich <
> victor.zverovich_at_[hidden]> wrote:
>
>> Hi Jonathan,
>>
>> > Is the pseudocode above the intention?
>>
>> Yes, it looks about right to me. {fmt} does something simpler because it
>> only relies on (limited) standard facilities. For this reason it implements
>> a subset of functionality (UTF conversions) which might also be conforming
>> considering "an implementation-defined set of locales" but obviously less
>> useful.
>>
>> HTH,
>> Victor
>>
>> On Wed, Jan 10, 2024 at 3:25 PM Jonathan Wakely <cxx_at_[hidden]> wrote:
>>
>>> What's the intended implementation strategy for
>>> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2419r2.html
>>> on POSIX?
>>>
>>> My best guess is something like this, where loc is the formatting locale
>>> ([time.format] p2):
>>>
>>> if (narrow literal encoding is UTF-8)
>>>   if (locale_t cloc = ::newlocale(loc.name()))
>>>     if (const char* enc = ::nl_langinfo_l(CODESET, cloc))
>>>       if (/* enc is not UTF-8 */) {
>>>         iconv_t ic = ::iconv_open("UTF-8", enc);
>>>         if (ic != (iconv_t)-1) {
>>>           // use ::iconv to convert from locale's encoding to UTF-8
>>>
>>> But that seems pretty involved ... and {fmt} doesn't do any of that.
>>>
>>> I tried testing the example from the paper on Linux, and fmt::format
>>> fails with an exception. Debugging it shows that it tries to use the
>>> formatting locale's std::codecvt<char32_t, char, mbstate_t> facet to
>>> convert the string. But that's not right, because that codecvt
>>> specialization is defined by the standard to convert between UTF-8 and
>>> UTF-32 only. So it can only work if the input is ASCII, or the locale uses
>>> UTF-8, in which case there's nothing that needs converting anyway.
>>>
>>> AFAIK the standard doesn't provide a way to convert from an arbitrary
>>> locale's encoding to the execution charset, or even to get the name of an
>>> arbitrary locale's encoding (C++26 provides a way to get the name of the
>>> execution environment's encoding, but not an arbitrary std::locale's
>>> encoding).
>>>
>>> Is the pseudocode above the intention? Or am I misinterpreting something
>>> in P2419?
>>>
>>> I've read the SG16 minutes when P2419 was discussed, and I don't see an
>>> answer.
>>>
>>>
>>>
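
For reference, the pseudocode quoted above could be fleshed out roughly as follows on a POSIX system. This is only a sketch under some assumptions, not what {fmt} or any standard library actually does: the function names to_utf8 and locale_text_to_utf8 are made up for illustration, the locale is identified by its name (as loc.name() would give it), and the three-argument ::newlocale call is the real POSIX signature rather than the shorthand in the pseudocode.

```cpp
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>

#include <cerrno>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: transcode `in` from the encoding named `from` to UTF-8.
// Returns nullopt if iconv does not know the encoding or the input is invalid.
std::optional<std::string> to_utf8(const std::string& in, const char* from) {
  iconv_t ic = ::iconv_open("UTF-8", from);
  if (ic == (iconv_t)-1) return std::nullopt;

  std::string out;
  char buf[256];
  // POSIX iconv takes char**, although it does not modify the input bytes.
  char* src = const_cast<char*>(in.data());
  std::size_t src_left = in.size();
  bool ok = true;
  while (src_left > 0) {
    char* dst = buf;
    std::size_t dst_left = sizeof buf;
    std::size_t r = ::iconv(ic, &src, &src_left, &dst, &dst_left);
    out.append(buf, sizeof buf - dst_left);
    // E2BIG only means the output buffer filled up, so keep going.
    if (r == (std::size_t)-1 && errno != E2BIG) {
      ok = false;
      break;
    }
  }
  // The target is UTF-8, which is stateless, so no final shift sequence
  // has to be flushed here.
  ::iconv_close(ic);
  if (!ok) return std::nullopt;
  return out;
}

// Hypothetical helper: look up the codeset of the named locale and, if it is
// not already UTF-8, transcode `text` from that codeset to UTF-8.
std::optional<std::string> locale_text_to_utf8(const std::string& text,
                                               const char* locale_name) {
  locale_t cloc = ::newlocale(LC_ALL_MASK, locale_name, (locale_t)0);
  if (cloc == (locale_t)0) return std::nullopt;
  const char* cs = ::nl_langinfo_l(CODESET, cloc);
  // Copy the name before freeing the locale object that owns it.
  std::string enc = cs ? cs : "";
  ::freelocale(cloc);
  if (enc.empty()) return std::nullopt;
  if (enc == "UTF-8") return text;  // nothing to convert
  return to_utf8(text, enc.c_str());
}
```

A caller would pass the locale-dependent string (for example a month name produced by the formatting locale) together with loc.name(), and fall back to the untranscoded text if the result is empty. Note that loc.name() is "*" for unnamed std::locale objects, in which case ::newlocale fails and this approach has no answer.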
Received on 2024-01-12 00:10:56
