The current resolution is also … questionable in the case where the string literal encoding isn’t UTF-8, although there’s nothing we could do in that case that’s always right (we could transcode into gb18030, since that’s a UTF, but not into other code pages).
I think the current behavior of C library functions in the case where the locale is non-Unicode and they get something they can’t represent is to transliterate, which is a very bad default (and also not something I want to have to implement in the standard library, at least not unless it’s part of some larger Unicode support facility).
I tentatively support resolving this issue as “never transcode”, and if you specify a locale that has a different encoding to what’s in your format string you just get mojibake.
If users only use format control characters and other invariant characters in their format string, then the resulting encoding will be whatever the encoding of their parameters is. I expect users to rely on this when using plain std::format, and it seems odd to break that with chrono formatting.
There may be some implementations where the above isn’t true, because the format control characters are not invariant across all supported code pages, but MSVC isn’t such an implementation. (As I’ve previously said, the reason we care about the literal encoding in MSVC’s implementation is that, while the control characters are invariant, some of the encodings we support are _not_ self-synchronizing, so the control characters can show up as trailing bytes of multi-byte “shift” sequences.)
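To make the self-synchronization point concrete, here is a rough sketch (not MSVC’s actual code) of why a format-string scanner has to know the literal encoding. It assumes Shift-JIS with lead bytes in 0x81-0x9F and 0xE0-0xFC, which is a simplification, and the function names are made up for illustration. The byte 0x7B (‘{’) is a legal trail byte there, so a byte-wise scan sees a replacement field that isn’t one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified assumption: Shift-JIS lead bytes are 0x81-0x9F and 0xE0-0xFC.
inline bool is_sjis_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Naive scan: treats every 0x7B byte as an opening brace.
inline std::size_t count_open_braces_naive(const std::string& s) {
    std::size_t n = 0;
    for (char c : s) {
        if (c == '{') ++n;
    }
    return n;
}

// Encoding-aware scan: skips the trail byte that follows a lead byte,
// so a 0x7B inside a double-byte character is not counted.
inline std::size_t count_open_braces_sjis(const std::string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        if (is_sjis_lead_byte(b)) {
            ++i;  // skip the trail byte
            continue;
        }
        if (b == '{') ++n;
    }
    return n;
}

// One double-byte character (0x83 0x7B) followed by a real "{}" field.
inline void demo_trail_byte_brace() {
    const std::string fmt("\x83\x7B{}", 4);
    assert(count_open_braces_naive(fmt) == 2);  // miscounts the trail byte
    assert(count_open_braces_sjis(fmt) == 1);   // only the real '{'
}
```

In UTF-8 this can’t happen, because no byte of a multi-byte sequence falls in the ASCII range, and that is why the scanner only needs this care for some literal encodings.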
All this to say that just because the literal string encoding is UTF-8 _does not mean_ the encoding of the output of std::format is expected to _also_ be UTF-8.
On Fri, Jun 18, 2021 at 10:22 AM Peter Brett <firstname.lastname@example.org> wrote:
The requirement to perform transcoding makes me uncomfortable because I don’t think it’s actually implementable in the general case.
Users of the standard library can create customized locale objects with bespoke time_put facets, and there is literally no way for the chrono formatter to know which codeset a user-specified locale facet is using or how to transcode its output.
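As a sketch of the kind of customization meant here (the facet name and the Latin-1 choice are hypothetical, purely for illustration): a user can derive from std::time_put<char>, emit bytes in whatever encoding they like, and install it into a locale. Nothing on the resulting std::locale records that the facet writes Latin-1.

```cpp
#include <algorithm>
#include <cassert>
#include <ctime>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical user facet: writes February's name for %B as Latin-1 bytes
// and defers to the base class for everything else.
class latin1_month_put : public std::time_put<char> {
protected:
    iter_type do_put(iter_type out, std::ios_base& str, char_type fill,
                     const std::tm* t, char format, char modifier) const override {
        if (format == 'B' && t->tm_mon == 1) {
            const std::string name = "f\xE9vrier";  // 0xE9 is e-acute in Latin-1
            return std::copy(name.begin(), name.end(), out);
        }
        return std::time_put<char>::do_put(out, str, fill, t, format, modifier);
    }
};

// Formats the month name through whichever time_put facet the locale holds.
inline std::string month_name(const std::locale& loc, const std::tm& t) {
    std::ostringstream os;
    os.imbue(loc);
    const auto& facet = std::use_facet<std::time_put<char>>(loc);
    facet.put(std::ostreambuf_iterator<char>(os), os, ' ', &t, 'B');
    return os.str();
}

inline void demo_custom_facet() {
    std::tm t{};
    t.tm_mon = 1;  // February
    // The locale takes ownership of the facet.
    const std::locale loc(std::locale::classic(), new latin1_month_put);
    const std::string s = month_name(loc, t);
    assert(s.size() == 7);  // 7 Latin-1 bytes; the UTF-8 form would be 8
}
```

A caller that only sees `loc` gets seven bytes back and has no standard way to ask which codeset they are in.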
Totally happy for you to shoot down my alternative proposal, but I’m opposed to the current proposed resolution because std::locale just doesn’t work like that.
The locale objects themselves do have an encoding (with the assumption that facets will respect that encoding).
The answer here is P1885, which makes that information publicly accessible. In the absence of that, implementers have the information.
Well, some of them do (glibc, Microsoft), but indeed on some platforms the information does not exist because nl_langinfo is not part of the POSIX spec, so P1885 will give you unknown information.
Is that an issue?
My understanding is that the set of scenarios in which
is empty or very small.
I think you are right that we probably don't say how custom facets behave with respect to encodings, but we certainly expect them to behave a certain way!