Re: [SG16] Alternative approach for LWG3565 "Handling of encodings in localized chrono formatting"

From: Peter Brett <pbrett_at_[hidden]>
Date: Fri, 18 Jun 2021 08:22:48 +0000
Hi Corentin,

The requirement to perform transcoding makes me uncomfortable because I don’t think it’s actually implementable in the general case.

Users of the standard library can create customized locale objects with bespoke time_put facets, and there is literally no way for the chrono formatter to know which codeset a user-specified locale facet is using or how to transcode its output.
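
A minimal sketch of the problem, under stated assumptions: the facet name `cp1251_time_put`, the helpers `full_weekday_name` and `make_custom_locale`, and the choice of Windows-1251 bytes are all hypothetical and only illustrate the point. The sketch calls the `std::time_put` facet directly rather than going through `std::format`, to keep it small and independent of how a particular implementation wires chrono formatting to the locale.

```cpp
#include <algorithm>
#include <cassert>
#include <ctime>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical user facet: the full weekday name for Monday is emitted as raw
// Windows-1251 bytes. Nothing in the facet or the locale records that choice.
class cp1251_time_put : public std::time_put<char> {
protected:
    iter_type do_put(iter_type out, std::ios_base& io, char_type fill,
                     const std::tm* t, char format, char modifier) const override {
        if (format == 'A' && t->tm_wday == 1) {
            // "Понедельник" encoded in Windows-1251 (11 bytes).
            static const std::string monday =
                "\xCF\xEE\xED\xE5\xE4\xE5\xEB\xFC\xED\xE8\xEA";
            return std::copy(monday.begin(), monday.end(), out);
        }
        // Everything else falls back to the default behaviour.
        return std::time_put<char>::do_put(out, io, fill, t, format, modifier);
    }
};

// Builds a locale that is the classic locale plus the custom facet.
std::locale make_custom_locale() {
    return std::locale(std::locale::classic(), new cp1251_time_put);
}

// Asks the locale's time_put facet for the full weekday name (%A).
std::string full_weekday_name(const std::locale& loc, int wday) {
    std::tm t{};
    t.tm_wday = wday;
    std::ostringstream os;
    os.imbue(loc);
    std::use_facet<std::time_put<char>>(loc).put(
        std::ostreambuf_iterator<char>(os), os, os.fill(), &t, 'A');
    return os.str();
}

// Small self-check: the caller receives bytes, with no encoding attached.
inline void check_custom_facet() {
    assert(full_weekday_name(std::locale::classic(), 1) == "Monday");
    assert(full_weekday_name(make_custom_locale(), 1).size() == 11);
}
```

The caller of `full_weekday_name` only ever sees a `std::string`; the locale offers no query that would reveal the bytes are Windows-1251 rather than UTF-8.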

Totally happy for you to shoot down my alternative proposal, but I’m opposed to the current proposed resolution because std::locale just doesn’t work like that.

Best regards,

                     Peter

From: Corentin Jabot <corentinjabot_at_[hidden]>
Sent: 18 June 2021 09:14
To: SG16 <sg16_at_[hidden]>
Cc: Peter Brett <pbrett_at_[hidden]>
Subject: Re: [SG16] Alternative approach for LWG3565 "Handling of encodings in localized chrono formatting"

On Thu, Jun 17, 2021 at 10:57 PM Peter Brett via SG16 <sg16_at_[hidden]> wrote:
Hi all,

The current proposed resolution for LWG3565 (https://wg21.link/LWG3565)
involves transcoding from the locale encoding to UTF-8. This makes me a
little uncomfortable.

Can you clarify what makes you uncomfortable?


Is it possible instead to say that, if the string literal encoding is
UTF-8, then the effective locale is _as if_ the specified or global
locale was modified by replacing the associated codeset with UTF-8?

So, the following code:

    std::locale l1("Russian.1251");
    auto s = std::format(l1, "День недели: {:L}", std::chrono::Monday);

Would behave as if replaced by:

    std::locale l1("Russian.1251");
    std::locale l2(l1, std::locale("Russian.UTF-8"), std::locale::time);
    auto s = std::format(l2, "День недели: {:L}", std::chrono::Monday);

This would permit an implementation that has UTF-8 locale data available
to use it directly, rather than being required to use the 1251 codeset
locale data and transcode in order to conform to the standard.

"associated codeset with UTF-8" is not really a thing.
The ".UTF-8" locales merely exist by convention on some platforms.

There is no spec that says that:

* Russian.1251 is not UTF-8
* Russian.1251.UTF-8 exists
* Russian.1251 and Russian.1251.UTF-8 only differ by encoding if both exist

Transcoding is therefore more generally applicable.
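
To make the portability concern concrete, here is a hedged sketch of what "replace the time category with a UTF-8 one" would have to look like in user code. The helper name `with_time_category_from` is hypothetical. It relies only on the fact that constructing a `std::locale` from a name throws `std::runtime_error` when the platform does not recognize that name, so the code must be prepared for the UTF-8 variant not to exist.

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>

// Hypothetical helper: take the time category from the locale named
// `utf8_name` if the platform provides it; otherwise return `base` unchanged.
std::locale with_time_category_from(const std::locale& base, const char* utf8_name) {
    try {
        return std::locale(base, std::locale(utf8_name), std::locale::time);
    } catch (const std::runtime_error&) {
        // The named locale is not available on this platform.
        return base;
    }
}

// Small self-check: an unknown name leaves the base locale as it was.
inline void check_fallback() {
    const std::locale& c = std::locale::classic();
    assert(with_time_category_from(c, "no-such-locale.UTF-8") == c);
}
```

Even when the named locale does exist, nothing in this code can confirm that it differs from the base locale only by encoding.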

Note that I have my own reservations about this issue, namely: how much effort are we willing to put
into mending a system that only works for a narrow subset of cultures, languages, and circumstances?
That being said, even if that issue amounts to putting duct tape over a giant crack in the wall,
it also doesn't hurt.
It is undoubtedly more correct than the status quo, and as a stopgap solution it might make the life
of our Windows users a bit less painful.


                             Peter

P.S. How would one go about writing a locale object that customizes
chrono formatting with std::format? Does anyone have a code sample?

Received on 2021-06-18 03:22:56