Re: [SG16] Additional concerns for LWG3565: Handling of encodings in localized formatting of chrono types is underspecified

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Mon, 2 Aug 2021 23:21:14 +0200
There is no mention in the wording that time_put is used.
For example:

> %a The locale's abbreviated weekday name. If the value does not contain a
> valid weekday, an exception of type format_error is thrown.

Can we say something roughly like:

If the format string encoding is UTF-8, then for any locale in the
implementation-defined set of known locales, the values of %a, %A, %b, %B, %c,
%C, %p, %x, and %X are UTF-8 encoded.
[Note: Whether time_put is called is unspecified.]
If the format string encoding is not UTF-8, or if the locale is not in the
implementation-defined set of known locales, the values of ... are
locale-specific.
[Note: A locale-specific value may not be in the same text encoding as the
format string.]

With some bonus wording if we want to support other UTF encodings?
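
To make the rule above concrete, here is a rough sketch of the decision it describes. Everything in it is an assumption for illustration: result_encoding, is_known_locale, and encoding_of_localized_fields are hypothetical names, the "known" set is a made-up placeholder for the implementation-defined set, and matching by locale name is only one possible way an implementation might recognize a known locale.

```cpp
#include <cassert>
#include <locale>
#include <set>
#include <string>

// Hypothetical: the two outcomes the suggested wording distinguishes.
enum class result_encoding { utf8, locale_specific };

// Hypothetical stand-in for the "implementation-defined set of known
// locales". The names listed here are placeholders, not a real allow-list.
inline bool is_known_locale(const std::locale& loc) {
    static const std::set<std::string> known = {"C", "POSIX"};
    return known.count(loc.name()) != 0;
}

// Encoding of the values produced for %a, %A, %b, %B, %c, %C, %p, %x, %X
// under the suggested wording.
inline result_encoding encoding_of_localized_fields(bool format_string_is_utf8,
                                                    const std::locale& loc) {
    if (format_string_is_utf8 && is_known_locale(loc)) {
        return result_encoding::utf8;
    }
    // Either the format string is not UTF-8 or the locale is not known:
    // the value is locale-specific and may use a different encoding.
    return result_encoding::locale_specific;
}
```

Note that the sketch says nothing about how the UTF-8 value is obtained, which matches the note that whether time_put is called is unspecified.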


On Mon, Aug 2, 2021 at 9:51 PM Charlie Barto via SG16 <sg16_at_[hidden]>
wrote:

> I was discussing this with a coworker (Billy) and he brought up the point
> that even if we have an allow-list of locales in which to do the
> transcoding the proposed resolution is _still_ quite difficult to
> implement, because users can call “combine” or use
> “locale::locale(locale&,facet*)” (or various other constructors) to shove
> arbitrary new facets into a locale. The returned locale will have a
> different name, or be unnamed, but we still need to handle the facets from
> the original locale the same as before (i.e. do the transcoding). Otherwise
> users will be chugging along happily transcoding locale specific text,
> decide they’d like to add a new facet, and suddenly get mojibake.
>
> To support this in a way that doesn’t have this surprise we’d have to take
> the facet we’d like to use and compare it against every single “supported”
> version of that facet. This means if we allow-list locale names we would
> have to allocate _every single locale’s_ version of a given facet the first
> time a locale-sensitive chrono specifier was used. The comparison may
> involve a dynamic_cast (although I’m not 100% sure that’s really
> necessary), making it potentially quite expensive. For me this is a bit of
> a dealbreaker if LWG3565 is applied without P2372.
>
> It may be better to just allow implementations to use some custom facet
> type (maybe time_get<uchar_t>, or maybe _Utf8_time_get<char>, etc) when
> doing chrono formatting. This may already be allowed by the wording in
> [time.format]. It does not appear to actually say we must use a particular
> locale, although that may be specified in ISO8601:2004, a copy of which I
> am trying to get a hold of. We could also say that conversion goes through
> a hypothetical std::codecvt<char, char8_t, std::mbstate_t> facet.
>
> In general, I’m OK with the “{:L}” specifiers being a little broken,
> there’s only so much we can do since facets don’t have a way of
> communicating the encoding of their output (besides the codecvt facets).
> If/when we add some better locale handling support, we can always add an
> “{:LEx}” specifier (maybe we could use “{:ℒ}” or “{:𝔏}” 😊).
>

+1 on that last point
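
On the combine / locale::locale(locale&, facet*) concern, a minimal sketch of why matching on locale names is fragile. The facet comma_numpunct and the helpers with_extra_facet and shares_time_put are hypothetical names made up for this example. The standard says a locale built this way has no name, so name() returns "*", even though every facet except the replaced one still comes from the original locale.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical user facet: unrelated to time formatting, it only changes
// the decimal point used for numbers.
struct comma_numpunct : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Builds a new locale from `base` with one extra user facet installed,
// using the locale(const locale&, Facet*) constructor. The new locale
// takes ownership of the facet.
inline std::locale with_extra_facet(const std::locale& base) {
    return std::locale(base, new comma_numpunct);
}

// Reports whether two locales hand out the very same time_put<char> object.
// Mainstream implementations share reference-counted facets between copies,
// but that is an assumption about implementations, not a guarantee.
inline bool shares_time_put(const std::locale& a, const std::locale& b) {
    return &std::use_facet<std::time_put<char>>(a) ==
           &std::use_facet<std::time_put<char>>(b);
}
```

With base = std::locale::classic(), base.name() is "C" while with_extra_facet(base).name() is "*". A lookup keyed on the name would therefore stop recognizing the locale as soon as the user adds an unrelated facet, which is the surprise described in the quoted message.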


>
> Charlie
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>

Received on 2021-08-02 16:21:28