On Tue, Jun 8, 2021 at 3:42 PM Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:

General polls:

Poll 2: P2093R6: <format> and <print> facilities should have consistent behavior with respect to encoding expectations for the output of formatters.

If Poll 2 is accepted, then the nature of those expectations, and the method by which binary data is presented to `format`, would constrain how the direction in Poll 3 can be implemented.

Poll 3: P2093R6: Regardless of format string encoding assumptions, <format> facilities (but not <print> facilities) may be used to format binary data.

Is the intention that there will be alternative interfaces (e.g., a binary formatter concept) that provide a way around the string encoding assumptions?

N.B. the implementation does not inspect the result of a std::format() invocation, but this is not necessarily true for std::print().


How should invalidly encoded text be handled when transcoding for the purpose of writing directly to a device interface?

Encoding issues may be introduced by any of the following:
  • A format string that is not encoded as expected by the formatting facility.
  • Text provided by a formatter that is differently encoded with respect to the format string or other formatters.  This covers both standard and user provided formatters and contributions from locale dependent text.

Poll 4: P2093R6: <print> facilities exhibit undefined behavior when a format string or formatter output does not match encoding expectations.

My understanding is that the proposed resolution of LWG 3565 is considered to resolve the issue presented therein by having the implementation of the standard formatter take whatever action is needed for the encoding expectations to be met.

N.B. the poll is phrased so as to be independent of whether output is directed to a device or not.

Poll 5: P2093R6: <print> facilities exhibit undefined behavior when a format string or formatter output does not match encoding expectations and output is directed to a device that has encoding expectations.

That is, the opposite direction from requiring replacement characters.

Poll 6: P2093R6: <print> facility implementors are encouraged to provide a run-time means for diagnosing format strings and formatter output that does not match encoding expectations.

That is, not a standardized error handling facility, and not necessarily one that is accessible "programmatically".

Poll 7: P2093R6: <print> facility implementors are encouraged to substitute U+FFFD replacement characters following Unicode guidance when output is directed to a device and transcoding is necessary.

N.B. transcoding is not necessarily required; e.g., when encoding expectations are for UTF-8 and the device interface expects UTF-8.
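
To make the Poll 7 direction concrete, here is a minimal sketch of U+FFFD substitution over UTF-8 input. It assumes that "Unicode guidance" means the "substitution of maximal subparts" practice described in the Unicode Standard; the poll itself does not say so. The function name replace_ill_formed_utf8 is hypothetical and is not part of P2093R6. A real implementation would more likely substitute while transcoding to the device encoding rather than build an intermediate UTF-8 string.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not from P2093R6): returns a copy of `in` in which
// each maximal ill-formed UTF-8 subpart is replaced by one U+FFFD (EF BF BD).
std::string replace_ill_formed_utf8(std::string_view in) {
    static constexpr std::string_view replacement = "\xEF\xBF\xBD";
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t trail = 0;               // continuation bytes expected
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range of the first one
        if (lead <= 0x7F) {
            trail = 0;
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            trail = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            trail = 2;
            if (lead == 0xE0) lo = 0xA0;     // excludes overlong forms
            if (lead == 0xED) hi = 0x9F;     // excludes surrogates
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            trail = 3;
            if (lead == 0xF0) lo = 0x90;     // excludes overlong forms
            if (lead == 0xF4) hi = 0x8F;     // excludes values above U+10FFFF
        } else {
            out += replacement;              // byte can never start a sequence
            ++i;
            continue;
        }
        // Count how many bytes of the sequence are valid, starting at the lead.
        std::size_t k = 1;
        for (; k <= trail; ++k) {
            if (i + k >= in.size()) break;   // truncated input
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            const unsigned char first = (k == 1) ? lo : 0x80;
            const unsigned char last = (k == 1) ? hi : 0xBF;
            if (b < first || b > last) break;
        }
        if (k <= trail) {
            out += replacement;              // the k bytes seen so far are one subpart
            i += k;
        } else {
            out.append(in.substr(i, trail + 1));  // well-formed: copy unchanged
            i += trail + 1;
        }
    }
    return out;
}
```

Well-formed input passes through unchanged, which matches the N.B. above: when both sides expect UTF-8, no transcoding is needed and substitution only matters for ill-formed bytes.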

Poll 8: P2093R6: Neither <format> nor <print> facilities require an explicit program-controlled error handling mechanism for violations of encoding expectations.

N.B. such error handling mechanisms could be introduced in the future.


Is use of UTF-8 as the literal encoding a sufficient indicator that all input fed to std::format() and std::print() (including the format string, programmer supplied field arguments, and locale provided text) will be UTF-8 encoded?

Poll 9: P2093R6: Use of UTF-8 as the literal encoding is sufficient for <format> and <print> facilities to assume that the format string and output of all formatters is UTF-8 encoded.

Except for binary data...
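
For reference when discussing Poll 9, here is a sketch of how a program could observe at compile time whether the ordinary literal encoding is UTF-8. This is a heuristic and an assumption on my part, not wording from P2093R6: it checks how the compiler encoded one probe literal (U+00A7) and says nothing about run-time data such as programmer supplied arguments or locale provided text, which is exactly the gap the poll asks about. Both function names are hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper: true when `bytes` (with its terminating NUL counted in
// `size_with_nul`) is the UTF-8 encoding of U+00A7 SECTION SIGN, i.e. C2 A7.
constexpr bool is_utf8_section_sign(const char* bytes, std::size_t size_with_nul) {
    return size_with_nul == 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xC2 &&
           static_cast<unsigned char>(bytes[1]) == 0xA7;
}

// Hypothetical probe: inspects how the compiler encoded an ordinary string
// literal containing U+00A7. Under a single-byte encoding such as Latin-1 the
// literal occupies 2 bytes including the NUL, so the probe yields false.
constexpr bool literal_encoding_is_utf8() {
    constexpr char probe[] = "\u00A7";
    return is_utf8_section_sign(probe, sizeof(probe));
}
```

A facility could branch on literal_encoding_is_utf8() with `if constexpr`, but doing so only establishes the encoding of literals in that translation unit.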

Is the literal encoding a sufficient indicator in general that all input fed to std::format() and std::print() (including the format string, programmer supplied field arguments, and locale provided text) will be provided in an encoding compatible with the literal encoding?

Poll 10: P2093R6: Use of a literal encoding other than UTF-8 is sufficient for <format> and <print> facilities to assume any particular encoding for the format string and output of formatters.


What are the implications for future support of std::print("{} {} {} {}", L"Wide text", u8"UTF-8 text", u"UTF-16 text", U"UTF-32 text")?

Poll 11: P2093R6: Support for implicit encoding conversions will only be possible when an encoding assumption is implicitly or explicitly present.

Meaning an implicit UTF-8 assumption is enough of a foundation for some cases.
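
As an illustration of one such case: if the output is assumed to be UTF-8, a U"..." argument can be converted without consulting any other encoding information. The sketch below is an assumption about what such a conversion might look like, not proposed wording; utf32_to_utf8 is a hypothetical name, and substituting U+FFFD for invalid code points is just one possible policy (see Polls 4 through 8).

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: encodes UTF-32 code points as UTF-8. Surrogates and
// values above U+10FFFF are replaced by U+FFFD in this sketch.
std::string utf32_to_utf8(std::u32string_view in) {
    std::string out;
    out.reserve(in.size());
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            cp = 0xFFFD;  // not a Unicode scalar value
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The L"..." case is harder because the wide encoding is implementation-defined, and converting to a non-UTF-8 target requires knowing that target, which is where an explicit or implicit encoding assumption becomes necessary.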

N.B. a future paper could add the ability to pass an explicit encoding tag, e.g., std::print(std::text_encoding::id::IBM1047, "{}", u8"hi").


LWG 3565: Handling of encodings in localized formatting of chrono types is underspecified

Poll 12: LWG 3565: Adopt the proposed resolution as is.

I believe that the availability of the transcoding facility should not be assumed.

Tom.

On 5/31/21 12:45 AM, Tom Honermann via SG16 wrote:

SG16 will hold a telecon on Wednesday, June 9th at 19:30 UTC.

The agenda is:

  • D2295R4: Support for UTF-8 as a portable source file encoding
  • P2093R6: Formatted output
    • Continue discussion and poll for consensus on answers to the following questions:
      1. How should invalidly encoded text be handled when transcoding for the purpose of writing directly to a device interface?
      2. Is use of UTF-8 as the literal encoding a sufficient indicator that all input fed to std::format() and std::print() (including the format string, programmer supplied field arguments, and locale provided text) will be UTF-8 encoded?
      3. Is the literal encoding a sufficient indicator in general that all input fed to std::format() and std::print() (including the format string, programmer supplied field arguments, and locale provided text) will be provided in an encoding compatible with the literal encoding?
      4. What are the implications for future support of std::print("{} {} {} {}", L"Wide text", u8"UTF-8 text", u"UTF-16 text", U"UTF-32 text")?

Discussion of D2295R4 is contingent on updated wording being available.

For P2093R6, I believe we have sufficiently discussed transcoding concerns (including concerns related to locale provided field arguments) to be able to answer the first question above with strong consensus.  I likewise suspect that further discussion on the third question is unnecessary and that we are reasonably well positioned to poll it.  We began discussion around the second question at the last telecon, but I feel that some more discussion is needed.  We haven't discussed question four at all, but I expect to arrive at a clearly objective answer for that one.

I would like for us to complete discussion and polling for P2093 during this telecon.  I don't know if that is realistic, but that is what we'll aim for.  I will reply to this email with a set of candidate polls in advance of the telecon with the hope that we'll be able to reduce time negotiating polls during the telecon itself.

Tom.

--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16