
Re: [SG16] Agenda for the 2021-04-28 SG16 telecon

From: Tom Honermann <tom_at_[hidden]>
Date: Tue, 27 Apr 2021 17:19:44 -0400
On 4/27/21 3:52 PM, Corentin Jabot wrote:
> On Tue, Apr 27, 2021 at 8:12 PM Tom Honermann <tom_at_[hidden]> wrote:
> On 4/27/21 1:56 PM, Corentin Jabot wrote:
>> On Tue, Apr 27, 2021 at 7:51 PM Tom Honermann <tom_at_[hidden]> wrote:
>> On 4/27/21 1:43 PM, Corentin Jabot wrote:
>>> On Tue, Apr 27, 2021 at 7:20 PM Tom Honermann <tom_at_[hidden]> wrote:
>>> On 4/27/21 12:27 PM, Corentin Jabot wrote:
>>>> I think we've been focusing on different things
>>>> here. The issue I'm trying to discuss is
>>>> independent of use of the
>>>> write-directly-to-the-console method. This
>>>> discussion is about having std::print() (and
>>>> std::format()) internally ensure that
>>>> format arguments provided by the locale are
>>>> transcoded to match the encoding of the format
>>>> string. This happens before anything is
>>>> written to the console; this is the step where
>>>> the formatting is done and the intent is to
>>>> ensure that well-formed text is produced
>>>> *before* it is transcoded to the native console
>>>> encoding (whether that be UTF-8, UTF-16,
>>>> whatever). Transcoding requires well-formed
>>>> input of course.
>>>> Does this help to get us on the same page?
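A minimal sketch of the option described above, assuming for illustration that the locale encoding is ISO-8859-1 and the format string is UTF-8: text produced by a locale facet is transcoded to the format string's encoding before it is spliced into the result. The helpers `latin1_to_utf8` and `splice_locale_text` are hypothetical; this is not how any `std::format` implementation is specified to work.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: transcode ISO-8859-1 (the assumed locale encoding)
// to UTF-8. Each Latin-1 byte maps to the code point with the same value.
std::string latin1_to_utf8(std::string_view in) {
    std::string out;
    for (char ch : in) {
        unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));                  // ASCII is unchanged
        } else {
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));    // 2-byte lead
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));  // continuation
        }
    }
    return out;
}

// Hypothetical helper: replace the first "{}" in a UTF-8 format string with
// locale-provided text, transcoding that text first so the result stays UTF-8.
std::string splice_locale_text(std::string_view utf8_fmt,
                               std::string_view locale_text) {
    std::string result(utf8_fmt);
    auto pos = result.find("{}");
    if (pos != std::string::npos)
        result.replace(pos, 2, latin1_to_utf8(locale_text));
    return result;
}
```

With a Latin-1 locale, `splice_locale_text("[{}]", "f\xE9vr.")` yields the UTF-8 string `[févr.]` rather than a mix of two encodings.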
>>>> I actually disagree with that.
>>>> I don't think there is intent in the current design
>>>> that the output has to be text at all. I could use
>>>> format to create some kind of binary format if I wanted
>>>> to, except that the _formatting_ string is text, because it
>>>> needs to be parsed.
>>>> So format as specified doesn't put requirements on the
>>>> arguments beyond the formatting string and doesn't need to.
>>>> What makes print text is that it outputs to the
>>>> console, at which point text is assumed.
>>>> The transcoding happens after formatting, and might as
>>>> well not be there:
>>>> format(a, b, c) -> result
>>>> printUtf8ToConsole(result);
>>>> The fact that printUtf8ToConsole is implemented as
>>>> printUTF16(toUTF16(result)) is an implementation detail
>>>> that should not be observable nor described by the C++
>>>> standard.
>>>> And I don't think print should do _anything_ to check
>>>> for some amount of validity before printing out something.
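Corentin's two-step model (format into a UTF-8 buffer, then write that buffer to the console) leaves the console step as an implementation detail. On a platform whose console API takes UTF-16, that step needs something like the hypothetical `to_utf16` below, standing in for `toUTF16` above. Replacing ill-formed input with U+FFFD is an assumption of this sketch, one possible policy rather than one the thread settles on.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for the toUTF16 step: UTF-8 -> UTF-16.
// Any byte that does not start a well-formed sequence becomes U+FFFD.
std::u16string to_utf16(std::string_view s) {
    std::u16string out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        char32_t cp = 0;
        std::size_t len = 1;
        if (b < 0x80)                   { cp = b;        len = 1; }
        else if (b >= 0xC2 && b < 0xE0) { cp = b & 0x1F; len = 2; }
        else if (b >= 0xE0 && b < 0xF0) { cp = b & 0x0F; len = 3; }
        else if (b >= 0xF0 && b < 0xF5) { cp = b & 0x07; len = 4; }
        else { out.push_back(u'\uFFFD'); ++i; continue; }  // invalid lead byte

        // The whole sequence must be present and use continuation bytes.
        bool ok = i + len <= s.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogate code points, and values > U+10FFFF.
        if (ok && ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
                   (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF))
            ok = false;
        if (!ok) { out.push_back(u'\uFFFD'); ++i; continue; }

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {                                 // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

Whatever policy is chosen for ill-formed input, it is only observable at this final step, which is the point Corentin makes about keeping it out of the specification.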
>>> I don't disagree with what you wrote above, but it is
>>> not relevant to this discussion. I don't know why we're
>>> having such a hard time communicating here. Please,
>>> carefully re-read some of my prior responses with the
>>> understanding that how you have understood them so far
>>> does not match what I intended. If you then have
>>> clarifying questions, please feel free to ask them.
>>> Okay, so your point is that implementations should do
>>> something magical for things that are formatted through a
>>> locale facet, on the basis that the encoding of the result
>>> of time_put is known?
>> Yes, with two minor caveats.
>> 1. I don't see this as magical since the source and target
>> encodings are known.
>> 2. I'm only suggesting this as a design option for us to
>> consider. I'm not claiming that I think this is the best
>> approach to the problem (I'm undecided as to what
>> solution I favor so far).
>> Another question: do you think format should have the same behavior?
> I want the answer to be yes, that they should behave consistently,
> but I acknowledge this is more complicated. For example, a
> programmer may intend to format text in the locale encoding
> regardless of whether the literal encoding is UTF-8 or not. In
> that scenario, there is an implication that the format string be
> limited to characters that are valid for the locale encoding. On
> the other hand, the programmer may intend to produce UTF-8 text
> and be quite surprised when std::format() inserts codepage 932
> text in their output (regardless of whether their format string
> contains explicitly locale dependent field specifiers).
> This ambiguity is why I continue to have reservations about basing
> behavior (other than the encoded values of literals) on the
> compile-time literal encoding.
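The surprise described above can be made concrete with a well-formedness check. In this sketch `is_well_formed_utf8` is a hypothetical helper; the bytes 0xE6 0x9C 0x88 are the UTF-8 encoding of U+6708, while the pair 0x8C 0x8E stands in for a CP932 double-byte sequence spliced into output that the programmer expects to be UTF-8.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical checker: true if `s` is well-formed UTF-8 (no stray
// continuation bytes, truncation, overlongs, surrogates, or values > U+10FFFF).
bool is_well_formed_utf8(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (b < 0x80) { ++i; continue; }                       // ASCII
        else if (b >= 0xC2 && b < 0xE0) { len = 2; cp = b & 0x1F; }
        else if (b >= 0xE0 && b < 0xF0) { len = 3; cp = b & 0x0F; }
        else if (b >= 0xF0 && b < 0xF5) { len = 4; cp = b & 0x07; }
        else return false;                                     // invalid lead byte
        if (i + len > s.size()) return false;                  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;              // not a continuation
            cp = (cp << 6) | (c & 0x3F);
        }
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        i += len;
    }
    return true;
}
```

Because 0x8C is a UTF-8 continuation byte, the string containing the CP932 bytes is rejected: the inserted locale text makes the entire result ill-formed, even though the format string itself was valid UTF-8.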
> First off, I'm sorry for the miscommunication.
> I think I understand you better now.
No problem, we worked through it. I think we may be lacking some
terminology that would help to be more specific. And there are a fair
number of moving parts involved.
> I think this is a good motivation to make the default behavior locale
> independent. I am really concerned about the timeline here...
Yes, me too.
> Then I think we should be vigilant not to try to shoehorn locale
> "fixes" onto std::print.
> But converting locale things _to_ UTF-8 seems... okay.
> It certainly doesn't make things worse!

Yeah, it might still be surprising and unwanted in some cases though.

 1. Make the chrono format specifiers locale independent (e.g., always
    "C" locale).
 2. Do not provide an 'L' specifier for locale dependent chrono
    formatting.
 3. Provide a mechanism for locales to distinguish translation and
    encoding (arguably, this exists with the current std::locale facets).
 4. Introduce a specifier once an interface is available for
    std::format() to request a localized translation in a particular
    encoding. This may require the ability to separately specify the
    encoding. For example, "{:%rL}" for locale encoding, and "{:%rLu8}"
    for locale translation in UTF-8. Maybe we can default the encoding
    in a smarter way.
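Option 1 can be approximated today with iostreams: formatting a date through `std::put_time` on a stream imbued with `std::locale::classic()` yields "C" locale output regardless of the global locale. This only sketches the intended behavior; `format_c_locale` is a hypothetical helper, and the proposed `L` specifier syntax is not shown because no such interface exists yet.

```cpp
#include <ctime>
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: format a calendar time using only the "C" locale,
// so the result does not depend on the user's or the program's global locale.
std::string format_c_locale(const std::tm& t, const char* spec) {
    std::ostringstream os;
    os.imbue(std::locale::classic());  // always the "C" locale
    os << std::put_time(&t, spec);     // strftime-style conversion specifiers
    return os.str();
}
```

For a `std::tm` describing Tuesday, 27 April 2021, `format_c_locale(t, "%a %b %d %Y")` produces `Tue Apr 27 2021`: plain ASCII, so it is valid in any ASCII-compatible encoding and needs no transcoding.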

> An important point to keep in mind is: how do we evolve that thing :)



> But again,

Received on 2021-04-27 16:19:47