Subject: Re: Agenda for the 2021-06-09 SG16 telecon
From: Hubert Tong (hubert.reinterpretcast_at_[hidden])
Date: 2021-06-09 14:15:10


On Tue, Jun 8, 2021 at 3:42 PM Tom Honermann via SG16 <sg16_at_[hidden]>
wrote:

>
> *General polls:*
>
> *Poll 2:* P2093R6: <format> and <print> facilities should have consistent
> behavior with respect to encoding expectations for the output of formatters.
>
If Poll 2 is accepted, then the nature of the expectations, and the way
binary data is presented for `format`, would constrain how the direction in
Poll 3 can be implemented.

> *Poll 3:* P2093R6: Regardless of format string encoding assumptions,
> <format> facilities (but not <print> facilities) may be used to format
> binary data.
>
Is the intention that there will be alternative interfaces (e.g., a binary
formatter Concept) to provide some way to get past string encoding
assumptions?

> N.B. the implementation does not inspect the result of a std::format()
> invocation, but this is not necessarily true for std::print().
>
> *How should invalidly encoded text be handled when transcoding for the
> purpose of writing directly to a device interface?*
>
> Encoding issues may be introduced by any of the following:
>
> - A format string that is not encoded as expected by the formatting
> facility.
> - Text provided by a formatter that is differently encoded with
> respect to the format string or other formatters. This covers both
> standard and user provided formatters and contributions from locale
> dependent text.
>
> *Poll 4:* P2093R6: <print> facilities exhibit undefined behavior when a
> format string or formatter output does not match encoding expectations.
>

My understanding is that the resolution of LWG 3565 is considered to address
the issue presented therein by having the implementation of the standard
formatter perform actions such that the encoding expectations are met.

> N.B. the poll is phrased so as to be independent of whether output is
> directed to a device or not.
>
> *Poll 5:* P2093R6: <print> facilities exhibit undefined behavior when a
> format string or formatter output does not match encoding expectations and
> output is directed to a device that has encoding expectations.
>
That is, the opposite direction from requiring replacement characters.

> *Poll 6:* P2093R6: <print> facility implementors are encouraged to
> provide a run-time means for diagnosing format strings and formatter output
> that does not match encoding expectations.
>
That is, not a standardized error handling facility, and not necessarily one
that is accessible "programmatically".

> *Poll 7:* P2093R6: <print> facility implementors are encouraged to
> substitute U+FFFD replacement characters following Unicode guidance when
> output is directed to a device and transcoding is necessary.
>
> N.B. transcoding is not necessarily required; e.g., when encoding
> expectations are for UTF-8 and the device interface expects UTF-8.
>
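For concreteness, the substitution that Poll 7 encourages could look roughly like the sketch below. It is not taken from P2093R6; the function name `replace_ill_formed_utf8` is hypothetical, and the sketch assumes the input is expected to be UTF-8 and that each maximal ill-formed subsequence is replaced by a single U+FFFD, which is one of the practices described in the Unicode Standard.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not part of P2093R6): returns a copy of `in` in which
// every ill-formed UTF-8 subsequence is replaced by U+FFFD (bytes EF BF BD).
std::string replace_ill_formed_utf8(std::string_view in) {
    static constexpr char replacement[] = "\xEF\xBF\xBD";
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        if (b0 <= 0x7F) {            // ASCII passes through unchanged
            out.push_back(in[i]);
            ++i;
            continue;
        }
        std::size_t trail = 0;               // number of expected trail bytes
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range of the first trail byte
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            trail = 1;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            trail = 2;
            if (b0 == 0xE0) lo = 0xA0;       // reject overlong forms
            if (b0 == 0xED) hi = 0x9F;       // reject surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            trail = 3;
            if (b0 == 0xF0) lo = 0x90;       // reject overlong forms
            if (b0 == 0xF4) hi = 0x8F;       // reject values above U+10FFFF
        } else {
            out += replacement;              // stray trail byte or invalid lead byte
            ++i;
            continue;
        }
        std::size_t j = i + 1;
        std::size_t k = 0;
        for (; k < trail && j < in.size(); ++k, ++j) {
            const unsigned char b = static_cast<unsigned char>(in[j]);
            const unsigned char min = (k == 0) ? lo : 0x80;
            const unsigned char max = (k == 0) ? hi : 0xBF;
            if (b < min || b > max) break;   // offending byte is not consumed
        }
        if (k == trail) {
            out.append(in.substr(i, trail + 1));  // well-formed sequence
        } else {
            out += replacement;                   // one U+FFFD per ill-formed subsequence
        }
        i = j;
    }
    return out;
}
```

When the encoding expectation is UTF-8 and the device interface also expects UTF-8, no transcoding step exists, so a pass like this would not necessarily run at all; that is the case the N.B. above calls out.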
> *Poll 8:* P2093R6: Neither <format> nor <print> facilities require an
> explicit program-controlled error handling mechanism for violations of
> encoding expectations.
>
> N.B. such error handling mechanisms could be introduced in the future.
>
> *Is use of UTF-8 as the literal encoding a sufficient indicator that all
> input fed to std::format() and std::print() (including the format string,
> programmer supplied field arguments, and locale provided text) will be
> UTF-8 encoded?*
>
> *Poll 9:* P2093R6: Use of UTF-8 as the literal encoding is sufficient for
> <format> and <print> facilities to assume that the format string and output
> of all formatters is UTF-8 encoded.
>

Except for binary data...
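
As an illustration of the premise behind Poll 9, a library can determine at compile time whether the literal encoding is UTF-8 by inspecting how a non-ASCII character in an ordinary string literal was encoded. The sketch below is an assumption about how such a check might be written, not wording from P2093R6; the name `literal_encoding_is_utf8` is hypothetical.

```cpp
// Hypothetical check (not from P2093R6): true when the ordinary literal
// encoding encodes U+00B5 MICRO SIGN as the two UTF-8 code units C2 B5.
constexpr bool literal_encoding_is_utf8() {
    constexpr unsigned char micro[] = "\u00B5";
    return sizeof(micro) == 3 && micro[0] == 0xC2 && micro[1] == 0xB5;
}
```

A check like this only describes string literals. It says nothing about the bytes a formatter produces at run time, which is why binary data remains an exception.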

>
>
> *Is the literal encoding a sufficient indicator in general that all input
> fed to std::format() and std::print() (including the format string,
> programmer supplied field arguments, and locale provided text) will be
> provided in an encoding compatible with the literal encoding?*
>
> *Poll 10:* P2093R6: Use of a literal encoding other than UTF-8 is
> sufficient for <format> and <print> facilities to assume any particular
> encoding for the format string and output of formatters.
>
>
> *What are the implications for future support of std::print("{} {} {} {}",
> L"Wide text", u8"UTF-8 text", u"UTF-16 text", U"UTF-32 text")?*
>
> *Poll 11:* P2093R6: Support for implicit encoding conversions will only
> be possible when an encoding assumption is implicitly or explicitly present.
>

Meaning an implicit UTF-8 assumption is enough of a foundation for some
cases.
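
A minimal sketch of one such case, assuming the output encoding is taken to be UTF-8: a UTF-32 argument can then be converted without any further information. The function name `utf32_to_utf8` is hypothetical, and substituting U+FFFD for invalid code points is a choice made for this sketch, not something the polls above decide.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: converts UTF-32 to UTF-8 under the assumption that
// the destination encoding is UTF-8. Surrogates and values above U+10FFFF
// are replaced by U+FFFD.
std::string utf32_to_utf8(std::u32string_view in) {
    std::string out;
    for (char32_t c : in) {
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) c = 0xFFFD;
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else if (c < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        } else if (c < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (c >> 12)));
            out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (c >> 18)));
            out.push_back(static_cast<char>(0x80 | ((c >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

The reverse situation, a `char` argument of unknown encoding that has to be combined with UTF-32 text, has no such starting point, which is where an explicit encoding tag would come in.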

>
> N.B. a future paper could add the ability to pass an explicit encoding
> tag; std::print(std::text_encoding::id::IBM1047, "{}", u8"hi").
>
>
> *LWG 3565: Handling of encodings in localized formatting of chrono
> types is underspecified*
>
> *Poll 12:* LWG 3565: Adopt the proposed resolution as is.
>

I believe that the availability of the transcoding facility should not be
assumed.

>
>
> Tom.
>
> On 5/31/21 12:45 AM, Tom Honermann via SG16 wrote:
>
> SG16 will hold a telecon on Wednesday, June 9th at 19:30 UTC (timezone
> conversion:
> <https://www.timeanddate.com/worldclock/converter.html?iso=20210609T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).
>
> The agenda is:
>
> - D2295R4: Support for UTF-8 as a portable source file encoding
>   <https://isocpp.org/files/papers/D2295R4.pdf>
>    - Review updated wording produced through collaboration between
>      Corentin, Jens, and Hubert to resolve earlier feedback at
>      https://lists.isocpp.org/sg16/2021/04/2353.php.
> - P2093R6: Formatted output <https://wg21.link/p2093r6>
>    - Continue discussion and poll for consensus on answers to the
>      following questions:
>       1. How should invalidly encoded text be handled when transcoding
>          for the purpose of writing directly to a device interface?
>       2. Is use of UTF-8 as the literal encoding a sufficient
>          indicator that all input fed to std::format() and std::print()
>          (including the format string, programmer supplied field
>          arguments, and locale provided text) will be UTF-8 encoded?
>       3. Is the literal encoding a sufficient indicator in general
>          that all input fed to std::format() and std::print() (including
>          the format string, programmer supplied field arguments, and
>          locale provided text) will be provided in an encoding compatible
>          with the literal encoding?
>       4. What are the implications for future support of
>          std::print("{} {} {} {}", L"Wide text", u8"UTF-8 text",
>          u"UTF-16 text", U"UTF-32 text")?
>
> Discussion of D2295R4 is contingent on updated wording being available.
>
> For P2093R6, I believe we have sufficiently discussed transcoding concerns
> (including concerns related to locale provided field arguments) to be able
> to answer the first question above with strong consensus. I likewise
> suspect that further discussion on the third question is unnecessary and
> that we are reasonably well positioned to poll it. We began discussion
> around the second question at the last telecon, but I feel that some more
> discussion is needed. We haven't discussed question four at all, but I
> expect to arrive at a clearly objective answer for that one.
>
> I would like for us to complete discussion and polling for P2093 during
> this telecon. I don't know if that is realistic, but that is what we'll
> aim for. I will reply to this email with a set of candidate polls in
> advance of the telecon with the hope that we'll be able to reduce time
> negotiating polls during the telecon itself.
>
> Tom.
>
>
>
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>


