On 4/27/21 3:52 PM, Corentin Jabot wrote:


On Tue, Apr 27, 2021 at 8:12 PM Tom Honermann <tom@honermann.net> wrote:
On 4/27/21 1:56 PM, Corentin Jabot wrote:


On Tue, Apr 27, 2021 at 7:51 PM Tom Honermann <tom@honermann.net> wrote:
On 4/27/21 1:43 PM, Corentin Jabot wrote:


On Tue, Apr 27, 2021 at 7:20 PM Tom Honermann <tom@honermann.net> wrote:
On 4/27/21 12:27 PM, Corentin Jabot wrote:

I think we've been focusing on different things here.  The issue I'm trying to discuss is independent of use of the write-directly-to-the-console method.  This discussion is about having std::print() (and std::format()) internally ensure that format arguments provided by the locale are transcoded to match the encoding of the format string.  This happens before anything is written to the console; this is the step where the formatting is done and the intent is to ensure that well-formed text is produced *before* it is transcoded to the native console encoding (whether that be UTF-8, UTF-16, whatever).  Transcoding requires well-formed input of course.
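A minimal sketch of that transcoding step, under loud assumptions: the locale-provided text is taken to be ISO-8859-1 and the format string UTF-8. Neither `latin1_to_utf8` nor `format_month_utf8` is part of any standard library; both are hypothetical names used only to illustrate the idea, and the second is a stand-in for the real formatting machinery.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: transcode ISO-8859-1 bytes to UTF-8.
// Every Latin-1 byte maps to the code point with the same value,
// so bytes >= 0x80 become a two-byte UTF-8 sequence.
std::string latin1_to_utf8(std::string_view in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (char ch : in) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Hypothetical stand-in for the formatting step: the locale-provided
// fragment is transcoded first, so the combined result stays well-formed
// UTF-8 before anything is sent to the console.
std::string format_month_utf8(std::string_view utf8_prefix,
                              std::string_view month_name_latin1) {
    std::string result(utf8_prefix);
    result += latin1_to_utf8(month_name_latin1);
    return result;
}
```

For instance, `format_month_utf8("mois: ", "f\xE9vrier")` would yield the UTF-8 bytes for "mois: février", with the single Latin-1 byte 0xE9 replaced by the two bytes 0xC3 0xA9.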

Does this help to get us on the same page?


I actually disagree with that.
I don't think there is intent in the current design that the output has to be text at all. I could use format to create some kind of binary format if I wanted to; only the _formatting_ string is text, because it needs to be parsed.
So format as specified doesn't put requirements on the arguments beyond the formatting string and doesn't need to.
What makes print text is that it outputs to the console, at which point text is assumed.
The transcoding happens after formatting, and might as well not

format(a, b, c) -> result
printUtf8ToConsole(result);

The fact that printUtf8 is implemented as printUTF16(toUTF16(result)) is an implementation detail that should not be observable nor described by the C++ standard.
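To make the "transcode after formatting" step concrete, here is a rough sketch of what a toUTF16 helper could look like. The name `to_utf16` is hypothetical, and the function assumes its input is already well-formed UTF-8: it performs no validation, which is exactly the kind of detail a console layer would hide from callers.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: decode UTF-8 into UTF-16 code units.
// Assumes well-formed input; ill-formed input is not detected.
std::u16string to_utf16(std::string_view utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char b0 = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t len = 1;
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else                          { cp = b0 & 0x07; len = 4; }
        // Fold in the payload bits of each continuation byte.
        for (std::size_t k = 1; k < len && i + k < utf8.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        }
        i += len;
        if (cp >= 0x10000) {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

A console layer could pass the result to whatever native UTF-16 output call the platform offers; the caller of the print function would only ever see a UTF-8 interface.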

And I don't think print should do _anything_ to check for some amount of validity before printing out something.

I don't disagree with what you wrote above, but it is not relevant to this discussion.  I don't know why we're having such a hard time communicating here.  Please, carefully re-read some of my prior responses with the understanding that how you have understood them so far does not match what I intended.  If you then have clarifying questions, please feel free to ask them.


Okay, so your point is that implementations should do something magical for things that are formatted through a locale facet, on the basis that the encoding of the result of time_put is known?

Yes, with two minor caveats.

  1. I don't see this as magical since the source and target encodings are known.
  2. I'm only suggesting this as a design option for us to consider.  I'm not claiming that I think this is the best approach to the problem (I'm undecided as to what solution I favor so far).

Another question: do you think format should have the same behavior?

I want the answer to be yes, that they should behave consistently, but I acknowledge this is more complicated.  For example, a programmer may intend to format text in the locale encoding regardless of whether the literal encoding is UTF-8 or not.  In that scenario, there is an implication that the format string be limited to characters that are valid for the locale encoding.  On the other hand, the programmer may intend to produce UTF-8 text and be quite surprised when std::format() inserts codepage 932 text in their output (regardless of whether their format string contains explicitly locale dependent field specifiers).
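The surprise in the second scenario can be illustrated with a small check. This is only a sketch: `looks_like_utf8` is a hypothetical helper that verifies lead and continuation byte structure (it does not reject overlong forms or surrogates), and the two bytes 0x8C 0x8E are used as a stand-in for a double-byte codepage 932 fragment inserted by a locale.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical structural check: each lead byte must be followed by the
// right number of continuation bytes. Not a full UTF-8 validator.
bool looks_like_utf8(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;
        if (b < 0x80)                extra = 0;
        else if ((b & 0xE0) == 0xC0) extra = 1;
        else if ((b & 0xF0) == 0xE0) extra = 2;
        else if ((b & 0xF8) == 0xF0) extra = 3;
        else return false;  // stray continuation byte or invalid lead byte
        if (extra > s.size() - i - 1) return false;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

// Simulates formatting that splices a locale-encoded fragment into
// UTF-8 text without transcoding it.
std::string splice_untranscoded(std::string_view utf8_text,
                                std::string_view locale_fragment) {
    std::string result(utf8_text);
    result += locale_fragment;
    return result;
}
```

With a UTF-8 prefix such as `"\xE6\x97\xA5\xE4\xBB\x98: "`, the prefix alone passes the check, while `splice_untranscoded(prefix, "\x8C\x8E")` does not, because 0x8C cannot start a UTF-8 sequence.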

This ambiguity is why I continue to have reservations about basing behavior (other than the encoded values of literals) on the compile-time literal encoding.


First off, I'm sorry for the miscommunication issue.
I think I understand you better now.
No problem, we worked through it.  I think we may be lacking some terminology that would help to be more specific.  And there are a fair number of moving parts involved.

I think this is a good motivation to make the default behavior locale independent. I am really concerned about the timeline here...
Yes, me too.
Then I think we should be vigilant not to try to shoehorn locale "fixes" onto std::print.
Agreed.
But converting locale things _to_ utf-8 seems...okay.
It certainly doesn't make things worse!

Yeah, it might still be surprising and unwanted in some cases though.  Perhaps:

  1. Make the chrono format specifiers locale independent (e.g., always "C" locale).
  2. Do not provide a 'L' specifier for locale dependent chrono format specifiers.
  3. Provide a mechanism for locales to distinguish translation and encoding (arguably this exists with the current std::locale facets, but...)
  4. Introduce a specifier once an interface is available for std::format() to request a localized translation in a particular encoding.  This may require the ability to separately specify the encoding.  For example, "{:%rL}" for locale encoding, and "{:%rLu8}" for locale translation in UTF-8.  Maybe we can default the encoding in a smarter way.

An important point to keep in mind is: how do we evolve that thing :)

Definitely!

Tom.


But again, 
