Re: [SG16] Agenda for the 2021-04-28 SG16 telecon

From: Tom Honermann <tom_at_[hidden]>
Date: Tue, 27 Apr 2021 14:12:31 -0400
On 4/27/21 1:56 PM, Corentin Jabot wrote:
> On Tue, Apr 27, 2021 at 7:51 PM Tom Honermann <tom_at_[hidden]> wrote:
> On 4/27/21 1:43 PM, Corentin Jabot wrote:
>> On Tue, Apr 27, 2021 at 7:20 PM Tom Honermann <tom_at_[hidden]> wrote:
>> On 4/27/21 12:27 PM, Corentin Jabot wrote:
>>> I think we've been focusing on different things
>>> here. The issue I'm trying to discuss is
>>> independent of use of the
>>> write-directly-to-the-console method. This
>>> discussion is about having std::print() (and
>>> std::format()) internally ensure that the format
>>> arguments provided by the locale are transcoded to
>>> match the encoding of the format string. This
>>> happens before anything is written to the console;
>>> this is the step where the formatting is done and
>>> the intent is to ensure that well-formed text is
>>> produced *before* it is transcoded to the native
>>> console encoding (whether that be UTF-8, UTF-16,
>>> whatever). Transcoding requires well-formed input of
>>> course.
>>> Does this help to get us on the same page?
>>> I actually disagree with that.
>>> I don't think there is intent in the current design that the
>>> output has to be text at all. I could use format to create
>>> some kind of binary format if I wanted to, except the
>>> _formatting_ string is text because it needs to be parsed.
>>> So format as specified doesn't put requirements on the
>>> arguments beyond the formatting string and doesn't need to.
>>> What makes print text is that it outputs to the console, at
>>> which point text is assumed.
>>> The transcoding happens after formatting, and might as well not
>>> format(a, b, c) -> result
>>> printUtf8ToConsole(result);
>>> The fact that printUtf8 is implemented as
>>> printUTF16(toUTF16(result)) is an implementation detail that
>>> should not be observable nor described by the C++ standard.
>>> And I don't think print should do _anything_ to check for
>>> some amount of validity before printing out something.
>> I don't disagree with what you wrote above, but it is not
>> relevant to this discussion. I don't know why we're having
>> such a hard time communicating here. Please, carefully
>> re-read some of my prior responses with the understanding
>> that how you have understood them so far does not match what
>> I intended. If you then have clarifying questions, please
>> feel free to ask them.
>> Okay, so your point is that implementations should do something
>> magical for things that are formatted through a locale facet on
>> the basis that the encoding of the result of time_put is known?
> Yes, with two minor caveats.
> 1. I don't see this as magical since the source and target
> encodings are known.
> 2. I'm only suggesting this as a design option for us to
> consider. I'm not claiming that I think this is the best
> approach to the problem (I'm undecided as to what solution I
> favor so far).
> Another question: do you think format should have the same behavior?

I want the answer to be yes, that they should behave consistently, but I
acknowledge this is more complicated. For example, a programmer may
intend to format text in the locale encoding regardless of whether the
literal encoding is UTF-8 or not. In that scenario, there is an
implication that the format string be limited to characters that are
valid for the locale encoding. On the other hand, the programmer may
intend to produce UTF-8 text and be quite surprised when std::format()
inserts codepage 932 text in their output (regardless of whether their
format string contains explicitly locale-dependent field specifiers).
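
To make the second scenario concrete, here is a minimal sketch of what the surprise looks like at the byte level. It does not call std::format() or query a real locale; substitute() and is_well_formed_utf8() are hypothetical helpers written only for this illustration, and the byte sequences for U+65E5 (the one-character Japanese abbreviation for Sunday) are hard-coded as an assumed stand-in for what a codepage 932 locale facet would return.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not part of the standard library): returns true when
// `s` is well-formed UTF-8 (no stray continuation bytes, truncated sequences,
// overlong encodings, surrogates, or values above U+10FFFF).
bool is_well_formed_utf8(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        unsigned long min_cp = 0;
        if (b0 < 0x80)                { len = 1; cp = b0;        min_cp = 0x0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte
        if (i + len > s.size()) return false;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;  // overlong, out of range, or surrogate
        i += len;
    }
    return true;
}

// Hypothetical stand-in for formatting: replaces the first "{}" in `fmt`
// with the bytes of `arg`, copied through unchanged (no transcoding).
std::string substitute(std::string_view fmt, std::string_view arg) {
    std::string out(fmt);
    const std::size_t pos = out.find("{}");
    if (pos != std::string::npos) out.replace(pos, 2, arg.data(), arg.size());
    return out;
}

// Assumed inputs: U+65E5 encoded as UTF-8 and as codepage 932 (Shift-JIS).
constexpr std::string_view sunday_utf8  = "\xE6\x97\xA5";
constexpr std::string_view sunday_cp932 = "\x93\xFA";

// Runs the two cases: a UTF-8 format string combined with each argument.
void check_example() {
    assert(is_well_formed_utf8(substitute("day = {}", sunday_utf8)));
    assert(!is_well_formed_utf8(substitute("day = {}", sunday_cp932)));
}
```

With the codepage 932 bytes spliced in, the result is no longer well-formed UTF-8, so any later transcoding step that assumes UTF-8 input has nothing valid to work from.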

This ambiguity is why I continue to have reservations about basing
behavior (other than the encoded values of literals) on the compile-time
literal encoding.


> Tom.

Received on 2021-04-27 13:12:47