On Fri, Jun 18, 2021 at 10:22 AM Peter Brett <pbrett@cadence.com> wrote:

Hi Corentin,

 

The requirement to perform transcoding makes me uncomfortable because I don’t think it’s actually implementable in the general case.

Users of the standard library can create customized locale objects with bespoke time_put facets, and there is literally no way for the chrono formatter to know which codeset a user-specified locale facet is using or how to transcode its output.

Totally happy for you to shoot down my alternative proposal, but I’m opposed to the current proposed resolution because std::locale just doesn’t work like that.
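To make the point about bespoke time_put facets concrete, here is a minimal sketch of one. The names latin1_month_put, make_custom_locale and format_month are hypothetical, and the choice of German month names stored as ISO 8859-1 bytes is only an assumption for illustration. The sketch drives the facet through std::put_time rather than std::format, because whether a given standard library routes {:L} chrono formatting through a user-installed time_put facet is not something this example tries to assert.

```cpp
#include <algorithm>
#include <ctime>
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: answers %B (full month name) with German names stored
// as ISO 8859-1 bytes. Nothing in the time_put interface records that choice.
class latin1_month_put : public std::time_put<char> {
protected:
    iter_type do_put(iter_type out, std::ios_base& io, char_type fill,
                     const std::tm* t, char format, char modifier) const override {
        static const char* const months[12] = {
            "Januar", "Februar", "M\xE4rz", "April", "Mai", "Juni",
            "Juli", "August", "September", "Oktober", "November", "Dezember"};
        if (format == 'B' && modifier == 0 && t->tm_mon >= 0 && t->tm_mon < 12) {
            const std::string name = months[t->tm_mon];
            return std::copy(name.begin(), name.end(), out);
        }
        // Every other conversion specifier is left to the base class.
        return std::time_put<char>::do_put(out, io, fill, t, format, modifier);
    }
};

// Builds a locale that is the classic "C" locale plus the custom facet.
// The locale takes ownership of the facet object.
std::locale make_custom_locale() {
    return std::locale(std::locale::classic(), new latin1_month_put);
}

// Formats the full month name for month_index (0 = January) using loc.
std::string format_month(const std::locale& loc, int month_index) {
    std::tm t = {};
    t.tm_mon = month_index;
    t.tm_mday = 1;
    std::ostringstream os;
    os.imbue(loc);
    os << std::put_time(&t, "%B");
    return os.str();
}
```

With this facet, format_month(make_custom_locale(), 2) yields the four bytes 4D E4 72 7A. That is not valid UTF-8, and a caller that wanted to transcode it would have to guess the source encoding, since the facet gives no way to ask.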


The locale objects themselves do have an encoding (with the assumption that facets will respect that encoding).
The answer here is P1885, which makes that information publicly accessible. In the absence of that, implementers still have the information.
Well, some of them do (glibc, Microsoft), but on some platforms the information indeed does not exist because nl_langinfo is not part of the POSIX spec, so P1885 will report the encoding as unknown there.
Is that an issue?

My understanding is that the set of scenarios in which
  • both an XXX and an XXX.UTF-8 locale exist and the implementation knows how to go from one to the other, and
  • the implementation doesn't know the encoding of XXX
is empty or very small.
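For reference, the kind of query mentioned above can be sketched as follows on systems that provide nl_langinfo. The function name current_codeset_or_unknown is hypothetical, and the preprocessor check is only a rough assumption about where <langinfo.h> is available. Note that nl_langinfo reports on the global C locale, not on an arbitrary std::locale object.

```cpp
#include <clocale>
#include <string>
#if defined(__unix__) || defined(__APPLE__)
#include <langinfo.h>
#endif

// Returns the codeset name of the current global C locale, or "unknown"
// where this sketch has no way to ask (or the platform gives no answer).
std::string current_codeset_or_unknown() {
#if defined(__unix__) || defined(__APPLE__)
    const char* codeset = nl_langinfo(CODESET);
    if (codeset != nullptr && *codeset != '\0') {
        return codeset;
    }
#endif
    return "unknown";
}
```

A caller would typically run std::setlocale(LC_ALL, "") first so that the answer reflects the user's environment rather than the default "C" locale. The exact spelling of the returned name varies between platforms.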

I think you are right that we probably don't say how custom facets behave with respect to encodings, but we certainly expect them to behave a certain way!
 

 

Best regards,
Peter

 

From: Corentin Jabot <corentinjabot@gmail.com>
Sent: 18 June 2021 09:14
To: SG16 <sg16@lists.isocpp.org>
Cc: Peter Brett <pbrett@cadence.com>
Subject: Re: [SG16] Alternative approach for LWG3565 "Handling of encodings in localized chrono formatting"

 

On Thu, Jun 17, 2021 at 10:57 PM Peter Brett via SG16 <sg16@lists.isocpp.org> wrote:

Hi all,

The current proposed resolution for LWG3565 (https://wg21.link/LWG3565)
involves transcoding from the locale encoding to UTF-8.  This makes me a
little uncomfortable.

 

Can you clarify what makes you uncomfortable?

 


Is it possible instead to say that, if the string literal encoding is
UTF-8, then the effective locale is _as if_ the specified or global
locale was modified by replacing the associated codeset with UTF-8?

So, the following code:

    std::locale l1("Russian.1251");
    auto s = std::format(l1, "День недели: {:L}", std::chrono::Monday);

Would behave as if replaced by:

    std::locale l1("Russian.1251");
    // Take only the time category from the UTF-8 variant of the locale.
    std::locale l2(l1, std::locale("Russian.UTF-8"), std::locale::time);
    auto s = std::format(l2, "День недели: {:L}", std::chrono::Monday);

This would permit an implementation that has UTF-8 locale data available
to use it directly, rather than being required to use the 1251 codeset
locale data and transcode in order to conform to the standard.

 

"associated codeset with UTF-8" is not really a thing.

The ".UTF-8" locales merely exist by convention on some platforms.

There is no spec that says that:

* Russian.1251 is not UTF-8
* Russian.1251.UTF-8 exists
* Russian.1251 and Russian.1251.UTF-8 only differ by encoding if both exist

 

Transcoding is therefore more generally applicable.
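As an illustration of why transcoding needs only knowledge of the source encoding, here is a minimal sketch that converts Windows-1251 bytes to UTF-8. The function names are hypothetical, and the table is deliberately partial: it covers ASCII, the contiguous А..я block, and Ё/ё, and maps everything else to U+FFFD. A real implementation would presumably rely on a platform facility such as iconv or ICU rather than a hand-written table.

```cpp
#include <string>

// Appends the code point cp (assumed to be below U+10000) to out as UTF-8.
static void append_utf8(std::string& out, unsigned cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Partial Windows-1251 to UTF-8 conversion (illustration only).
std::string cp1251_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char b : in) {
        unsigned cp;
        if (b < 0x80) {
            cp = b;                      // ASCII maps to itself
        } else if (b >= 0xC0) {
            cp = 0x0410 + (b - 0xC0);    // 0xC0..0xFF -> U+0410..U+044F
        } else if (b == 0xA8) {
            cp = 0x0401;                 // Ё
        } else if (b == 0xB8) {
            cp = 0x0451;                 // ё
        } else {
            cp = 0xFFFD;                 // not covered by this partial table
        }
        append_utf8(out, cp);
    }
    return out;
}
```

For example, the 1251 bytes CF ED ("Пн") come out as D0 9F D0 BD. Nothing here depends on a separate ".UTF-8" locale being installed.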

 

Note that I have my own reservations about this issue: how much effort are we willing to put into mending a system that only works for a narrow subset of cultures, languages, and circumstances?

That being said, even if resolving this issue amounts to putting duct tape over a giant crack in the wall, it doesn't hurt. It is undoubtedly more correct than the status quo, and as a stopgap it might make the lives of our Windows users a bit less painful.

 


                             Peter

P.S. How would one go about writing a locale object that customizes
chrono formatting with std::format?  Does anyone have a code sample?
--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16