Re: [SG16] Agenda for the 2021-04-28 SG16 telecon

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Tue, 27 Apr 2021 21:52:04 +0200
On Tue, Apr 27, 2021 at 8:12 PM Tom Honermann <tom_at_[hidden]> wrote:

> On 4/27/21 1:56 PM, Corentin Jabot wrote:
>
>
>
> On Tue, Apr 27, 2021 at 7:51 PM Tom Honermann <tom_at_[hidden]> wrote:
>
>> On 4/27/21 1:43 PM, Corentin Jabot wrote:
>>
>>
>>
>> On Tue, Apr 27, 2021 at 7:20 PM Tom Honermann <tom_at_[hidden]> wrote:
>>
>>> On 4/27/21 12:27 PM, Corentin Jabot wrote:
>>>
>>>
>>>>> I think we've been focusing on different things here. The issue I'm
>>>>> trying to discuss is independent of use of the
>>>>> write-directly-to-the-console method. This discussion is about having
>>>>> std::print() (and std::format()) internally ensure that format
>>>>> arguments provided by the locale are transcoded to match the encoding of
>>>>> the format string. This happens before anything is written to the console;
>>>>> this is the step where the formatting is done, and the intent is to ensure
>>>>> that well-formed text is produced *before* it is transcoded to the native
>>>>> console encoding (whether that be UTF-8, UTF-16, or something else).
>>>>> Transcoding requires well-formed input, of course.
>>>>>
>>>>> Does this help to get us on the same page?
>>>>>
>>>>
>>> I actually disagree with that.
>>> I don't think there is intent in the current design that the output has
>>> to be text at all. I could use format to create some kind of binary format
>>> if I wanted to; only the _formatting_ string is required to be text, because
>>> it needs to be parsed.
>>> So format as specified doesn't put requirements on the arguments
>>> beyond the formatting string and doesn't need to.
>>> What makes print text is that it outputs to the console, at which point
>>> text is assumed.
>>> The transcoding happens after formatting, and might not happen at all:
>>>
>>> format(a, b, c) -> result
>>> printUtf8ToConsole(result);
>>>
>>> The fact that printUtf8 is implemented as printUTF16(toUTF16(result)) is
>>> an implementation detail that should not be observable nor described by the
>>> C++ standard.
>>>
>>> And I don't think print should do _anything_ to check for some amount of
>>> validity before printing out something.
>>>
>>> I don't disagree with what you wrote above, but it is not relevant to
>>> this discussion. I don't know why we're having such a hard time
>>> communicating here. Please, carefully re-read some of my prior responses
>>> with the understanding that how you have understood them so far does not
>>> match what I intended. If you then have clarifying questions, please feel
>>> free to ask them.
>>>
>>
>> Okay, so your point is that implementations should do something magical
>> for things that are formatted through a locale facet on the basis
>> that the encoding of the result of time_put is known?
>>
>> Yes, with two minor caveats.
>>
>> 1. I don't see this as magical since the source and target encodings
>> are known.
>> 2. I'm only suggesting this as a design option for us to consider.
>> I'm not claiming that I think this is the best approach to the problem (I'm
>> undecided as to what solution I favor so far).
>>
>>
> Another question: do you think format should have the same behavior?
>
> I want the answer to be yes, that they should behave consistently, but I
> acknowledge this is more complicated. For example, a programmer may intend
> to format text in the locale encoding regardless of whether the literal
> encoding is UTF-8 or not. In that scenario, there is an implication that
> the format string be limited to characters that are valid for the locale
> encoding. On the other hand, the programmer may intend to produce UTF-8
> text and be quite surprised when std::format() inserts codepage 932 text
> in their output (regardless of whether their format string contains
> explicitly locale dependent field specifiers).
>
> This ambiguity is why I continue to have reservations about basing
> behavior (other than the encoded values of literals) on the compile-time
> literal encoding.
>
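A minimal sketch of the toUTF16(result) step from the quoted pseudocode, assuming the formatted result is UTF-8 held in a std::string. The name utf8_to_utf16 is hypothetical, and replacing ill-formed input with U+FFFD is only one possible policy; neither std::format nor std::print is specified to work this way.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: transcode UTF-8 to UTF-16. On an ill-formed
// sequence it emits U+FFFD and resynchronizes one byte later.
std::u16string utf8_to_utf16(std::string_view in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (b0 < 0x80) { cp = b0; len = 1; }
        else if (b0 >= 0xC2 && b0 <= 0xDF) { cp = b0 & 0x1F; len = 2; }
        else if (b0 >= 0xE0 && b0 <= 0xEF) { cp = b0 & 0x0F; len = 3; }
        else if (b0 >= 0xF0 && b0 <= 0xF4) { cp = b0 & 0x07; len = 4; }
        else { out.push_back(u'\uFFFD'); ++i; continue; }  // invalid lead byte

        // Accumulate continuation bytes (each must look like 10xxxxxx).
        bool ok = i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong forms, surrogate code points, and values > U+10FFFF.
        if (ok && ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
                   (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)) {
            ok = false;
        }
        if (!ok) { out.push_back(u'\uFFFD'); ++i; continue; }

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Split supplementary-plane code points into a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

The point of the sketch is that this step runs entirely after formatting: whatever bytes format produced are handed over as-is, and only the console-facing layer decides how to treat ill-formed input.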

First off, I'm sorry for the miscommunication.
I think I understand you better now.

I think this is a good motivation to make the default behavior
locale-independent. I am really concerned about the timeline here...
Then I think we should be vigilant not to try to shoehorn locale "fixes"
onto std::print.
But converting locale things _to_ UTF-8 seems...okay.
It certainly doesn't make things worse!
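A minimal sketch of converting locale-produced text _to_ UTF-8, assuming (purely for illustration) that the locale encoding is ISO-8859-1, where every byte value equals its Unicode code point. A locale such as codepage 932 would need a conversion table or a platform API instead; latin1_to_utf8 is a hypothetical name.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: re-encode text produced under an ISO-8859-1
// (Latin-1) locale as UTF-8. Latin-1 is assumed only because each
// byte maps directly to U+0000..U+00FF, so no lookup table is needed.
std::string latin1_to_utf8(std::string_view in) {
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte needs two
    for (char c : in) {
        unsigned char b = static_cast<unsigned char>(c);
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));                 // ASCII unchanged
        } else {
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));   // 110xxxxx
            out.push_back(static_cast<char>(0x80 | (b & 0x3F))); // 10xxxxxx
        }
    }
    return out;
}
```

Because the source encoding is known, this direction can never produce ill-formed UTF-8, which is consistent with the remark that it does not make things worse.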

An important point to keep in mind is: how do we evolve that thing :)

But again,

Tom.
>
> Tom.
>>
>>
>>
>

Received on 2021-04-27 14:52:17