[SG16] Additional concerns for LWG3565: Handling of encodings in localized formatting of chrono types is underspecified

From: Charlie Barto <Charles.Barto_at_[hidden]>
Date: Mon, 2 Aug 2021 19:50:57 +0000
I was discussing this with a coworker (Billy), and he pointed out that even if we have an allow-list of locales for which to do the transcoding, the proposed resolution is _still_ quite difficult to implement, because users can call “combine” or use “locale::locale(const locale&, Facet*)” (or various other constructors) to shove arbitrary new facets into a locale. The returned locale will have a different name, or be unnamed, but we still need to handle the facets inherited from the original locale the same as before (i.e. do the transcoding). Otherwise users will be chugging along happily transcoding locale-specific text, decide they’d like to add a new facet, and suddenly get mojibake.
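To make the scenario concrete, here is a minimal sketch. The facet `underscore_punct` and the helpers `with_user_facet` and `shares_time_put` are hypothetical names invented for illustration. The sketch assumes, as the implementations I am aware of do, that a locale built from another one shares the untouched facet objects rather than cloning them.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical user facet: it only changes digit grouping and has nothing
// to do with time formatting.
struct underscore_punct : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return '_'; }
    std::string do_grouping() const override { return "\3"; }
};

// Build a new locale that is `base` plus the user facet. The standard says
// the result has no name, so name() returns "*".
std::locale with_user_facet(const std::locale& base) {
    return std::locale(base, new underscore_punct);
}

// True if both locales hand out the very same time_put<char> object.
// Assumption: facet sharing between copied locales is an implementation
// detail, not something the standard spells out.
bool shares_time_put(const std::locale& a, const std::locale& b) {
    return &std::use_facet<std::time_put<char>>(a) ==
           &std::use_facet<std::time_put<char>>(b);
}
```

An allow-list keyed on `name()` would no longer recognize the locale returned by `with_user_facet`, even though its time facet produces exactly the same bytes as before.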

To support this without that surprise, we’d have to take the facet we’d like to use and compare it against every single “supported” version of that facet. This means that if we allow-list locale names, we would have to allocate _every single locale’s_ version of a given facet the first time a locale-sensitive chrono specifier is used. The comparison may involve a dynamic_cast (although I’m not 100% sure that’s really necessary), making it potentially quite expensive. For me this is a bit of a dealbreaker if LWG3565 is applied without P2372.
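The comparison described above might look roughly like the following sketch. It is not taken from any real implementation: `uses_allow_listed_time_put`, `user_facet`, and the pre-built `supported` vector are hypothetical, and matching by pointer identity assumes that locales derived from a supported locale share its facet objects. The `supported` vector stands in for the up-front cost of constructing every allow-listed locale.

```cpp
#include <cassert>
#include <locale>
#include <vector>

using time_facet = std::time_put<char>;

// Hypothetical user-defined facet category, used only to build a derived,
// unnamed locale.
struct user_facet : std::locale::facet {
    static std::locale::id id;
};
std::locale::id user_facet::id;

// Returns true if the time_put facet in `loc` is the same object as the one
// held by any pre-constructed allow-listed locale. This is a linear scan,
// so the cost grows with the size of the allow-list.
bool uses_allow_listed_time_put(const std::locale& loc,
                                const std::vector<std::locale>& supported) {
    const time_facet* wanted = &std::use_facet<time_facet>(loc);
    for (const std::locale& candidate : supported) {
        if (&std::use_facet<time_facet>(candidate) == wanted) {
            return true;
        }
    }
    return false;
}
```

Every entry in `supported` has to exist before the first lookup, which is the allocation cost the paragraph above objects to.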

It may be better to just allow implementations to use some custom facet type (maybe time_get<uchar_t>, or maybe _Utf8_time_get<char>, etc.) when doing chrono formatting. This may already be allowed by the wording in [time.format]. It does not appear to actually say we must use a particular locale, although that may be specified in ISO 8601:2004, a copy of which I am trying to get hold of. We could also say that conversion goes through a hypothetical std::codecvt<char, char8_t, std::mbstate_t> facet.

In general, I’m OK with the “{:L}” specifiers being a little broken; there’s only so much we can do, since facets don’t have a way of communicating the encoding of their output (besides the codecvt facets). If/when we add some better locale handling support, we can always add an “{:LEx}” specifier (maybe we could use “{:ℒ}” or “{:𝔏}” 😊).

Charlie

Received on 2021-08-02 14:51:01