Date: Tue, 27 Apr 2021 17:19:44 -0400
On 4/27/21 3:52 PM, Corentin Jabot wrote:
>
>
> On Tue, Apr 27, 2021 at 8:12 PM Tom Honermann <tom_at_[hidden]
> <mailto:tom_at_[hidden]>> wrote:
>
> On 4/27/21 1:56 PM, Corentin Jabot wrote:
>>
>>
>> On Tue, Apr 27, 2021 at 7:51 PM Tom Honermann <tom_at_[hidden]
>> <mailto:tom_at_[hidden]>> wrote:
>>
>> On 4/27/21 1:43 PM, Corentin Jabot wrote:
>>>
>>>
>>> On Tue, Apr 27, 2021 at 7:20 PM Tom Honermann
>>> <tom_at_[hidden] <mailto:tom_at_[hidden]>> wrote:
>>>
>>> On 4/27/21 12:27 PM, Corentin Jabot wrote:
>>>>
>>>> I think we've been focusing on different things
>>>> here. The issue I'm trying to discuss is
>>>> independent of use of the
>>>> write-directly-to-the-console method. This
>>>> discussion is about having std::print() (and
>>>> std::format()) internally ensure that
>>>> format arguments provided by the locale are
>>>> transcoded to match the encoding of the format
>>>> string. This happens before anything is
>>>> written to the console; this is the step where
>>>> the formatting is done and the intent is to
>>>> ensure that well-formed text is produced
>>>> *before* it is transcoded to the native console
>>>> encoding (whether that be UTF-8, UTF-16,
>>>> whatever). Transcoding requires well-formed
>>>> input of course.
>>>>
>>>> Does this help to get us on the same page?
>>>>
>>>>
>>>> I actually disagree with that.
>>>> I don't think there is intent in the current design
>>>> that the output has to be text at all. I could use
>>>> format to create some kind of binary format if I wanted
>>>> to, except the _formatting_ string is text because it
>>>> needs to be parsed.
>>>> So format as specified doesn't put requirements on the
>>>> arguments beyond the formatting string and doesn't need to.
>>>> What makes print text is that it outputs to the
>>>> console, at which point text is assumed.
>>>> The transcoding happens after formatting, and might as
>>>> well not
>>>>
>>>> format(a, b, c) -> result
>>>> printUtf8ToConsole(result);
>>>>
>>>> The fact that printUtf8 is implemented as
>>>> printUTF16(toUTF16(result)) is an implementation detail
>>>> that should not be observable nor described by the C++
>>>> standard.
>>>>
>>>> And I don't think print should do _anything_ to check
>>>> for some amount of validity before printing out something.
>>>>
>>> I don't disagree with what you wrote above, but it is
>>> not relevant to this discussion. I don't know why we're
>>> having such a hard time communicating here. Please,
>>> carefully re-read some of my prior responses with the
>>> understanding that how you have understood them so far
>>> does not match what I intended. If you then have
>>> clarifying questions, please feel free to ask them.
>>>
>>>
>>> Okay, so your point is that implementations should do
>>> something magical for things that are formatted through a
>>> locale facet, on the basis that the encoding of the result
>>> of time_put is known?
>>
>> Yes, with two minor caveats.
>>
>> 1. I don't see this as magical since the source and target
>> encodings are known.
>> 2. I'm only suggesting this as a design option for us to
>> consider. I'm not claiming that I think this is the best
>> approach to the problem (I'm undecided as to what
>> solution I favor so far).
>>
>>
>> Another question: do you think format should have the same behavior?
>
> I want the answer to be yes, that they should behave consistently,
> but I acknowledge this is more complicated. For example, a
> programmer may intend to format text in the locale encoding
> regardless of whether the literal encoding is UTF-8 or not. In
> that scenario, there is an implication that the format string be
> limited to characters that are valid for the locale encoding. On
> the other hand, the programmer may intend to produce UTF-8 text
> and be quite surprised when std::format() inserts codepage 932
> text in their output (regardless of whether their format string
> contains explicitly locale dependent field specifiers).
>
> This ambiguity is why I continue to have reservations about basing
> behavior (other than the encoded values of literals) on the
> compile-time literal encoding.
>
>
> First off, I'm sorry for the miscommunication issue.
> I think I understand you better now.
No problem, we worked through it. I think we may be lacking some
terminology that would help to be more specific. And there are a fair
number of moving parts involved.
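
To make those moving parts concrete, here is a minimal sketch of the step under discussion: a locale-provided string is transcoded to the encoding of the format string before the two are combined. Everything here is an assumption for illustration. ISO-8859-1 stands in for "the locale encoding", and latin1_to_utf8() and looks_like_utf8() are hypothetical helpers, not anything std::format() or std::print() provides.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: transcode ISO-8859-1 (a stand-in for the locale
// encoding) to UTF-8. Each Latin-1 byte is the code point of the same
// value, so every input byte has a valid UTF-8 encoding.
std::string latin1_to_utf8(std::string_view in) {
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            // Code points 0x80..0xFF need two UTF-8 bytes.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Hypothetical helper: structural UTF-8 check only (a lead byte followed
// by the right number of continuation bytes). It does not reject
// overlong forms or surrogates.
bool looks_like_utf8(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;
        if (c < 0x80) extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false;  // stray continuation byte or invalid lead byte
        if (extra > s.size() - i - 1) return false;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}
```

With a UTF-8 format string prefix and a Latin-1 month name such as "f\xE9vrier", appending the raw bytes fails looks_like_utf8(), while appending latin1_to_utf8("f\xE9vrier") passes. That is the difference between combining first and transcoding first.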
>
> I think this is a good motivation to make the default behavior locale
> independent. I am really concerned about the timeline here...
Yes, me too.
> Then I think we should be vigilant not to try to shoehorn locale
> "fixes" onto std::print.
Agreed.
> But converting locale things _to_ utf-8 seems...okay.
> It certainly doesn't make things worse!
Yeah, it might still be surprising and unwanted in some cases though.
Perhaps:
1. Make the chrono format specifiers locale independent (e.g., always
"C" locale).
2. Do not provide an 'L' specifier for locale dependent chrono format
specifiers.
3. Provide a mechanism for locales to distinguish translation and
encoding (arguably this exists with the current std::locale facets,
but...)
4. Introduce a specifier once an interface is available for
std::format() to request a localized translation in a particular
encoding. This may require the ability to separately specify the
encoding. For example, "{:%rL}" for locale encoding, and "{:%rLu8}"
for locale translation in UTF-8. Maybe we can default the encoding
in a smarter way.
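
As a rough illustration of option 1, here is a sketch that uses the existing std::time_put facet rather than std::format(). The helper names format_time_in() and format_time_c() are hypothetical. The point is only that pinning the facet to the classic "C" locale makes the result independent of the global locale and keeps it ASCII, so no encoding question comes up.

```cpp
#include <ctime>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: format one strftime-style conversion (e.g. 'A'
// for the weekday name) using the time_put facet of the given locale.
std::string format_time_in(const std::locale& loc, const std::tm& t,
                           char spec) {
    std::ostringstream os;
    os.imbue(loc);
    const auto& facet = std::use_facet<std::time_put<char>>(loc);
    facet.put(std::ostreambuf_iterator<char>(os), os, ' ', &t, spec);
    return os.str();
}

// Hypothetical helper for option 1: always use the classic "C" locale,
// regardless of the global locale or of any locale imbued elsewhere.
std::string format_time_c(const std::tm& t, char spec) {
    return format_time_in(std::locale::classic(), t, spec);
}
```

A caller that does want a localized result would pass an explicit std::locale to format_time_in(). The encoding of that result is whatever the locale uses, which is the part that options 3 and 4 are meant to address.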
>
> An important point to keep in mind is: how do we evolve that thing :)
Definitely!
Tom.
>
> But again,
>
> Tom.
>
>> Tom.
>>
>>
Received on 2021-04-27 16:19:47