Date: Thu, 11 Jan 2024 06:01:56 -0800
Hi Jonathan,
> Is the pseudocode above the intention?
Yes, it looks about right to me. {fmt} does something simpler because it
only relies on (limited) standard facilities. For this reason it implements
a subset of functionality (UTF conversions) which might also be conforming
considering "an implementation-defined set of locales" but obviously less
useful.
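
For illustration only, here is a minimal sketch of the quoted pseudocode spelled out with actual POSIX calls. It is not what {fmt} or any standard library does. The helper names (to_utf8, locale_codeset, localized_to_utf8) are hypothetical, the narrow literal encoding is simply assumed to be UTF-8 rather than checked, and it only works for named locales that ::newlocale can open:

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <optional>
#include <string>
#include <string_view>

#include <iconv.h>
#include <langinfo.h>
#include <locale.h>

// Hypothetical helper: convert `in` from the encoding named `from` to UTF-8.
// Returns nullopt if iconv does not know the encoding or the input is invalid.
std::optional<std::string> to_utf8(std::string_view in, const char* from) {
    assert(from != nullptr);
    iconv_t ic = ::iconv_open("UTF-8", from);
    if (ic == (iconv_t)-1) return std::nullopt;

    // Assumption: four output bytes per input byte is a generous upper bound.
    std::string out(in.size() * 4 + 4, '\0');
    char* src = const_cast<char*>(in.data());  // iconv takes char**, does not modify input
    std::size_t src_left = in.size();
    char* dst = out.data();
    std::size_t dst_left = out.size();

    std::size_t rc = ::iconv(ic, &src, &src_left, &dst, &dst_left);
    ::iconv_close(ic);
    if (rc == (std::size_t)-1) return std::nullopt;

    out.resize(out.size() - dst_left);
    return out;
}

// Hypothetical helper: name of the encoding used by the named POSIX locale.
std::optional<std::string> locale_codeset(const char* name) {
    locale_t cloc = ::newlocale(LC_ALL_MASK, name, (locale_t)0);
    if (cloc == (locale_t)0) return std::nullopt;

    std::optional<std::string> result;
    const char* enc = ::nl_langinfo_l(CODESET, cloc);
    if (enc && *enc) result = enc;  // copy before the locale object is freed
    ::freelocale(cloc);
    return result;
}

// Hypothetical helper: transcode locale-specific text to UTF-8,
// assuming the narrow literal encoding is UTF-8.
std::optional<std::string> localized_to_utf8(std::string_view text,
                                             const std::locale& loc) {
    // loc.name() is "*" for unnamed locales, which newlocale will reject.
    std::optional<std::string> enc = locale_codeset(loc.name().c_str());
    if (!enc) return std::nullopt;
    if (*enc == "UTF-8") return std::string(text);  // nothing to convert
    return to_utf8(text, enc->c_str());
}
```

Opening a locale and an iconv descriptor on every call would be costly, so a real implementation would presumably cache both.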
HTH,
Victor
On Wed, Jan 10, 2024 at 3:25 PM Jonathan Wakely <cxx_at_[hidden]> wrote:
> What's the intended implementation strategy for
> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2419r2.html on
> POSIX?
>
> My best guess is something like this, where loc is the formatting locale
> ([time.format] p2):
>
> if (narrow literal encoding is UTF-8)
>   if (locale_t cloc = ::newlocale(loc.name()))
>     if (const char* enc = ::nl_langinfo_l(CODESET, cloc))
>       if (/* enc is not UTF-8 */) {
>         iconv_t ic = ::iconv_open("UTF-8", enc);
>         if (ic != (iconv_t)-1) {
>           // use ::iconv to convert from locale's encoding to UTF-8
>
> But that seems pretty involved ... and {fmt} doesn't do any of that.
>
> I tried testing the example from the paper on Linux, and fmt::format fails
> with an exception. Debugging it shows that it tries to use the formatting
> locale's std::codecvt<char32_t, char, mbstate_t> facet to convert the
> string. But that's not right, because that codecvt specialization is
> defined by the standard to convert between UTF-8 and UTF-32 only. So it can
> only work if the input is ASCII, or the locale uses UTF-8, in which case
> there's nothing that needs converting anyway.
>
> AFAIK the standard doesn't provide a way to convert from an arbitrary
> locale's encoding to the execution charset, or even to get the name of an
> arbitrary locale's encoding (C++23 provides a way to get the name of the
> execution environment's encoding, but not an arbitrary std::locale's
> encoding).
>
> Is the pseudocode above the intention? Or am I misinterpreting something
> in P2419?
>
> I've read the SG16 minutes when P2419 was discussed, and I don't see an
> answer.
>
>
>
Received on 2024-01-11 14:02:09