sg16: Re: [SG16] Agenda for the 2021-06-09 SG16 telecon

From: Peter Brett <pbrett_at_[hidden]>
Date: Wed, 9 Jun 2021 18:13:31 +0000

Hi all,

Sorry for the late notice, but I am definitely *not* going to make it to today’s SG16 meeting.

Looking forward to speaking to you all in a couple of weeks!

Best regards,

                 Peter

From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Tom Honermann via SG16
Sent: 08 June 2021 20:43
To: sg16_at_[hidden]cpp.org
Cc: Tom Honermann <tom_at_[hidden]>
Subject: Re: [SG16] Agenda for the 2021-06-09 SG16 telecon

Reminder that this meeting is taking place tomorrow.

Updated wording is not yet available for P2295, so it is dropped from the agenda. We will therefore focus exclusively on establishing consensus for P2093R6. As part of that effort, we'll discuss the newly created LWG issue 3565<https://wg21.link/lwg3565> that Victor requested following our last telecon. If we run out of time, we'll discuss that issue next time.

I have the following candidate polls prepared. Thank you to Peter for his assistance in drafting these. Please respond with suggested refinements so that we can avoid spending telecon time addressing technicalities.

These polls are intended to establish consensus regarding our response to the questions presented below.

Many of the polls are applicable to std::format() as well as std::print(). This is intentional. The results may be used to guide further papers or resolution of LWG issues.

General polls:

Poll 1: P2093R6: <format> and <print> facilities should have consistent behavior with respect to encoding expectations for the format string.

N.B. Whether encoding expectations exist may depend on other factors, such as what encoding is used for the literal encoding.

Poll 2: P2093R6: <format> and <print> facilities should have consistent behavior with respect to encoding expectations for the output of formatters.

Poll 3: P2093R6: Regardless of format string encoding assumptions, <format> facilities (but not <print> facilities) may be used to format binary data.

N.B. the implementation does not inspect the result of a std::format() invocation, but this is not necessarily true for std::print().

How should invalidly encoded text be handled when transcoding for the purpose of writing directly to a device interface?

Encoding issues may be introduced by any of the following:

  * A format string that is not encoded as expected by the formatting facility.
  * Text provided by a formatter that is differently encoded with respect to the format string or other formatters. This covers both standard and user provided formatters and contributions from locale dependent text.

Poll 4: P2093R6: <print> facilities exhibit undefined behavior when a format string or formatter output does not match encoding expectations.

N.B. the poll is phrased so as to be independent of whether output is directed to a device or not.

Poll 5: P2093R6: <print> facilities exhibit undefined behavior when a format string or formatter output does not match encoding expectations and output is directed to a device that has encoding expectations.

Poll 6: P2093R6: <print> facility implementors are encouraged to provide a run-time means for diagnosing format strings and formatter output that do not match encoding expectations.
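
To make the idea of a run-time diagnostic concrete, here is a minimal sketch that assumes the encoding expectation is UTF-8. The names first_ill_formed_utf8 and debug_check_utf8 are hypothetical and do not come from P2093R6 or any standard library; a real implementation could diagnose in a different way.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: returns the byte offset of the first ill-formed UTF-8
// sequence in `text`, or std::nullopt if `text` is well-formed UTF-8.
std::optional<std::size_t> first_ill_formed_utf8(std::string_view text) {
    std::size_t i = 0;
    while (i < text.size()) {
        const std::size_t start = i;
        const unsigned char lead = static_cast<unsigned char>(text[i++]);
        if (lead < 0x80) continue;  // ASCII

        // Number of continuation bytes and the allowed range of the first
        // one, following the Unicode table of well-formed UTF-8 sequences.
        int trailing = 0;
        unsigned char lo = 0x80, hi = 0xBF;
        if (lead >= 0xC2 && lead <= 0xDF) {
            trailing = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            trailing = 2;
            if (lead == 0xE0) lo = 0xA0;  // rejects overlong forms
            if (lead == 0xED) hi = 0x9F;  // rejects surrogates
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            trailing = 3;
            if (lead == 0xF0) lo = 0x90;  // rejects overlong forms
            if (lead == 0xF4) hi = 0x8F;  // rejects values above U+10FFFF
        } else {
            return start;  // stray continuation byte or invalid lead byte
        }
        for (int k = 0; k < trailing; ++k) {
            if (i >= text.size()) return start;  // truncated sequence
            const unsigned char b = static_cast<unsigned char>(text[i]);
            if (b < lo || b > hi) return start;
            lo = 0x80;
            hi = 0xBF;
            ++i;
        }
    }
    return std::nullopt;
}

// Hypothetical debug-build check that could run before text is written.
inline void debug_check_utf8(std::string_view text) {
    assert(!first_ill_formed_utf8(text) && "text is not well-formed UTF-8");
}
```

Reporting an offset rather than a plain yes/no lets a diagnostic point at the offending bytes, which helps when the ill-formed text comes from a formatter rather than from the format string.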

Poll 7: P2093R6: <print> facility implementors are encouraged to substitute U+FFFD replacement characters following Unicode guidance when output is directed to a device and transcoding is necessary.

N.B. transcoding is not necessarily required; e.g., when encoding expectations are for UTF-8 and the device interface expects UTF-8.
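
As an illustration of substitution during transcoding, the sketch below converts UTF-8 to UTF-16 and emits one U+FFFD per maximal ill-formed subsequence, which is my reading of the Unicode recommended practice. It assumes the source text is expected to be UTF-8 and the device interface wants UTF-16; the function name utf8_to_utf16_lossy is hypothetical and is not taken from P2093R6.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: transcodes UTF-8 to UTF-16, replacing each maximal
// ill-formed subsequence with a single U+FFFD REPLACEMENT CHARACTER.
std::u16string utf8_to_utf16_lossy(std::string_view in) {
    constexpr char16_t replacement = u'\uFFFD';
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i++]);
        if (lead < 0x80) {  // ASCII maps directly
            out.push_back(static_cast<char16_t>(lead));
            continue;
        }

        int trailing = 0;
        char32_t cp = 0;
        unsigned char lo = 0x80, hi = 0xBF;  // range allowed for next byte
        if (lead >= 0xC2 && lead <= 0xDF) {
            trailing = 1; cp = lead & 0x1F;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            trailing = 2; cp = lead & 0x0F;
            if (lead == 0xE0) lo = 0xA0;  // rejects overlong forms
            if (lead == 0xED) hi = 0x9F;  // rejects surrogates
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            trailing = 3; cp = lead & 0x07;
            if (lead == 0xF0) lo = 0x90;  // rejects overlong forms
            if (lead == 0xF4) hi = 0x8F;  // rejects values above U+10FFFF
        } else {
            out.push_back(replacement);  // byte can never start a sequence
            continue;
        }

        bool ok = true;
        for (int k = 0; k < trailing; ++k) {
            if (i >= in.size()) { ok = false; break; }  // truncated input
            const unsigned char b = static_cast<unsigned char>(in[i]);
            // An unexpected byte is not consumed; the outer loop examines
            // it again as a potential lead byte.
            if (b < lo || b > hi) { ok = false; break; }
            cp = (cp << 6) | (b & 0x3F);
            lo = 0x80;
            hi = 0xBF;
            ++i;
        }
        if (!ok) {
            out.push_back(replacement);
            continue;
        }

        assert(cp <= 0x10FFFF);  // guaranteed by the range checks above
        if (cp >= 0x10000) {     // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

With this approach, output is always produced and no error is reported to the program, which is consistent with not requiring an explicit error handling mechanism.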

Poll 8: P2093R6: Neither <format> nor <print> facilities require an explicit program-controlled error handling mechanism for violations of encoding expectations.

N.B. such error handling mechanisms could be introduced in the future.

Is use of UTF-8 as the literal encoding a sufficient indicator that all input fed to std::format() and std::print() (including the format string, programmer supplied field arguments, and locale provided text) will be UTF-8 encoded?

Poll 9: P2093R6: Use of UTF-8 as the literal encoding is sufficient for <format> and <print> facilities to assume that the format string and output of all formatters is UTF-8 encoded.
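
For context on how an implementation might observe the literal encoding at all, here is a sketch of one commonly used compile-time probe. The function name literal_encoding_is_utf8 is chosen for this example and is an assumption, not wording from the paper. The probe only reports how the compiler encoded an ordinary string literal; it says nothing about the encoding of run-time data, which is exactly the gap this poll asks about.

```cpp
// Compile-time probe for a UTF-8 ordinary literal encoding (sketch).
// U+00B5 MICRO SIGN is written as a universal-character-name, so the result
// depends only on the literal (execution) encoding, not on the encoding of
// the source file. In UTF-8 the character is the two bytes C2 B5.
// Assumption: the literal encoding can represent U+00B5; otherwise the
// treatment of the literal is up to the implementation.
constexpr bool literal_encoding_is_utf8() {
    const unsigned char micro[] = "\u00B5";
    return sizeof(micro) == 3 && micro[0] == 0xC2 && micro[1] == 0xB5;
}
```

Because the result is a constant expression, an implementation could select between a Unicode-aware path and a pass-through path with if constexpr or a static_assert.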

Is the literal encoding a sufficient indicator in general that all input fed to std::format() and std::print() (including the format string, programmer supplied field arguments, and locale provided text) will be provided in an encoding compatible with the literal encoding?

Poll 10: P2093R6: Use of a literal encoding other than UTF-8 is sufficient for <format> and <print> facilities to assume any particular encoding for the format string and output of formatters.

What are the implications for future support of std::print("{} {} {} {}", L"Wide text", u8"UTF-8 text", u"UTF-16 text", U"UTF-32 text")?

Poll 11: P2093R6: Support for implicit encoding conversions will only be possible when an encoding assumption is implicitly or explicitly present.

N.B. a future paper could add the ability to pass an explicit encoding tag; std::print(std::text_encoding::id::IBM1047, "{}", u8"hi").
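
To illustrate why a conversion needs a known target encoding, the sketch below converts a UTF-32 argument to UTF-8, assuming the output encoding has already been established as UTF-8. The names is_scalar_value, append_utf8, and utf32_to_utf8 are hypothetical helpers for this example only. Without such an assumption, there would be no way to choose the bytes to emit for a non-ASCII code point.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// True for code points that may be encoded: at most U+10FFFF and not a
// surrogate.
constexpr bool is_scalar_value(char32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Appends the UTF-8 encoding of `cp` to `out`.
// Precondition: `cp` is a Unicode scalar value.
void append_utf8(std::string& out, char32_t cp) {
    assert(is_scalar_value(cp));
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Converts UTF-32 text to UTF-8, substituting U+FFFD for any value that is
// not a Unicode scalar value.
std::string utf32_to_utf8(std::u32string_view in) {
    std::string out;
    for (char32_t cp : in) {
        append_utf8(out, is_scalar_value(cp) ? cp : U'\uFFFD');
    }
    return out;
}
```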

LWG 3565: Handling of encodings in localized formatting of chrono types is underspecified

Poll 12: LWG 3565: Adopt the proposed resolution as is.

Tom.

On 5/31/21 12:45 AM, Tom Honermann via SG16 wrote:

SG16 will hold a telecon on Wednesday, June 9th at 19:30 UTC (timezone conversion<https://www.timeanddate.com/worldclock/converter.html?iso=20210609T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).

The agenda is:

  * D2295R4: Support for UTF-8 as a portable source file encoding<https://isocpp.org/files/papers/D2295R4.pdf>

     * Review updated wording produced through collaboration between Corentin, Jens, and Hubert to resolve earlier feedback at https://lists.isocpp.org/sg16/2021/04/2353.php.

  * P2093R6: Formatted output<https://wg21.link/p2093r6>

     * Continue discussion and poll for consensus on answers to the following questions:

        * How should invalidly encoded text be handled when transcoding for the purpose of writing directly to a device interface?
        * Is use of UTF-8 as the literal encoding a sufficient indicator that all input fed to std::format() and std::print() (including the format string, programmer supplied field arguments, and locale provided text) will be UTF-8 encoded?
        * Is the literal encoding a sufficient indicator in general that all input fed to std::format() and std::print() (including the format string, programmer supplied field arguments, and locale provided text) will be provided in an encoding compatible with the literal encoding?
        * What are the implications for future support of std::print("{} {} {} {}", L"Wide text", u8"UTF-8 text", u"UTF-16 text", U"UTF-32 text")?

Discussion of D2295R4 is contingent on updated wording being available.

For P2093R6, I believe we have sufficiently discussed transcoding concerns (including concerns related to locale provided field arguments) to be able to answer the first question above with strong consensus. I likewise suspect that further discussion on the third question is unnecessary and that we are reasonably well positioned to poll it. We began discussion around the second question at the last telecon, but I feel that some more discussion is needed. We haven't discussed question four at all, but I expect to arrive at a clearly objective answer for that one.

I would like for us to complete discussion and polling for P2093 during this telecon. I don't know if that is realistic, but that is what we'll aim for. I will reply to this email with a set of candidate polls in advance of the telecon with the hope that we'll be able to reduce time negotiating polls during the telecon itself.

Tom.

Received on 2021-06-09 13:13:42