Re: [SG16] Agenda for the 2021-06-23 SG16 telecon

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Tue, 22 Jun 2021 21:09:35 +0200
On Tue, Jun 22, 2021 at 8:29 PM Tom Honermann via SG16 <
sg16_at_[hidden]> wrote:

> Reminder that this meeting is taking place tomorrow.
>
> Once we complete the remaining design polls, I'd like to clarify what
> might be perceived as a conflict in our poll results for poll 3.2 vs polls
> 4 and 5; we can't state both that binary data may be formatted and that
> formatter output that doesn't match an expected encoding is UB. Hubert's
> suggestion of an escape mechanism would suffice to resolve the apparent
> conflict. Are there other ways to interpret these polls? Or other
> solutions to resolve the apparent conflict?
>

The way I see it,

UB:

   - Interpreting the format string as being in an encoding other than its
   actual encoding
   - Outputting to a character device something that is not in the
   encoding of that character device
      - This includes outputting non-UTF-8 data to the UTF-8 character device
      that exists on some platforms

Not UB:

   - Outputting anything to a binary device (a file, for example)


I would like to see use cases presented for outputting binary data to a
character device.
In particular, if the concern is about terminal escape sequences such as
those used in bash, this is still text from the point of view of C++
("\e[31m" is still just text, a sequence of characters, when it is
*produced* by a C++ program).
If there are other use cases, I would very much like to understand them.
I also don't see how a general-purpose UTF-8 decoder could detect, let
alone handle, binary data in the middle of a UTF-8 sequence.
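
A minimal sketch of both points, assuming a hypothetical helper
`is_well_formed_utf8` that checks bytes against the well-formed UTF-8 byte
ranges of the Unicode Standard (Table 3-7). The helper name is an assumption
made for illustration and is not part of P2093R6 or of any standard library.
The sketch writes the escape character as "\x1b" because "\e" is a compiler
extension rather than standard C++.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
// Rejects overlong forms, surrogates, and values above U+10FFFF.
bool is_well_formed_utf8(std::string_view s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        if (b0 <= 0x7F) { ++i; continue; }      // ASCII, one byte

        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;     // allowed range of the 2nd byte
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;          // excludes overlong forms
            if (b0 == 0xED) hi = 0x9F;          // excludes surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;          // excludes overlong forms
            if (b0 == 0xF4) hi = 0x8F;          // excludes > U+10FFFF
        } else {
            return false;                       // 0x80..0xC1, 0xF5..0xFF
        }

        if (n - i < len) return false;          // truncated sequence
        const unsigned char b1 = static_cast<unsigned char>(s[i + 1]);
        if (b1 < lo || b1 > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if (b < 0x80 || b > 0xBF) return false;
        }
        i += len;
    }
    return true;
}
```

With this helper, `is_well_formed_utf8("\x1b[31m")` is true, since the escape
sequence is plain ASCII text. The bytes 0xFF 0xFE are rejected, but the bytes
0xC3 0xA9 are accepted whether they were meant as "é" or as two bytes of
binary data. A validity check can flag ill-formed input; it cannot tell
binary data from text when the binary data happens to be well-formed.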



>
> Tom.
>
> On 6/17/21 11:56 AM, Tom Honermann via SG16 wrote:
>
> SG16 will hold a telecon on Wednesday, June 23rd at 19:30 UTC (timezone
> conversion
> <https://www.timeanddate.com/worldclock/converter.html?iso=20210623T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>
> ).
>
> The agenda is:
>
> - P2093R6: Formatted output <https://wg21.link/p2093r6>
>    - Finish polling begun at the last telecon.
> - LWG 3565: Handling of encodings in localized formatting of chrono
>   types is underspecified <https://cplusplus.github.io/LWG/issue3565>
>    - Discuss and poll the proposed resolution.
> - P2295R4: Support for UTF-8 as a portable source file encoding
>   <https://wg21.link/p2295r4>
>    - Review updated wording produced through collaboration between
>      Corentin, Jens, Hubert, and Peter.
>    - https://lists.isocpp.org/sg16/2021/04/2353.php
>    - https://lists.isocpp.org/sg16/2021/06/2429.php
>
> At the last telecon, we discussed addressing LWG 3565 as the first agenda
> item for this telecon. However, I would prefer to finish polling for
> P2093R6 first as I expect some of the remaining candidate polls to be
> potentially relevant for the LWG issue resolution.
>
> For reference, here are the P2093R6 polls and poll results taken during
> the last telecon (I'll get the meeting summary posted soon). Consensus so
> far appears to be rather strong with the exception of poll 3.2.
>
> - *Poll 1: P2093R6: <format> and <print> facilities should have
>   consistent behavior with respect to encoding expectations for the format
>   string.*
>   Attendees: 8
>   No objection to unanimous consent.
> - *Poll 2.1: P2093R6: <format> and <print> facilities should have
>   consistent behavior with respect to encoding expectations for the output of
>   formatters.*
>   <Not polled; per discussion, revisit following later polls>
> - *Poll 2.2: P2093R6: formatters should not be sensitive to whether
>   they are being used with a <format> or <print> facility.*
>   Attendees: 8
>   No objection to unanimous consent.
> - *Poll 3.1: P2093R6: Regardless of format string encoding
>   assumptions, <format> facilities may be used to format binary data.*
>   Attendees: 8 (1 abstention)
>   SF  F  N  A  SA
>    5  1  1  0   0
>   Strong consensus
> - *Poll 3.2: P2093R6: Regardless of format string encoding
>   assumptions, <print> facilities may be used to format binary data.*
>   Attendees: 8 (1 abstention)
>   SF  F  N  A  SA
>    2  1  3  1   0
>   Weak consensus
> - *Poll 4: P2093R6: <print> facilities exhibit undefined behavior when
>   an encoding expectation is present and a format string or formatter output
>   does not match those expectations.*
>   Attendees: 8 (1 abstention)
>   SF  F  N  A  SA
>    2  4  0  0   1
>   Strong consensus
> - *Poll 5: P2093R6: <print> facilities exhibit undefined behavior when
>   an encoding expectation is present and a format string or formatter output
>   does not match those expectations and output is directed to a device that
>   has encoding expectations.*
>   Attendees: 8 (1 abstention)
>   SF  F  N  A  SA
>    6  0  1  0   0
>   Stronger consensus than poll 4.
> - *Poll 6: P2093R6: <print> facility implementors are encouraged to
>   provide a run-time means for diagnosing format strings and formatter output
>   that is not well-formed according to the expected encoding.*
>   Attendees: 8 (1 abstention)
>   SF  F  N  A  SA
>    4  0  2  1   0
>   Consensus.
>
> The remaining candidate polls are:
>
> - Poll 2.1: P2093R6: <format> and <print> facilities should have
> consistent behavior with respect to encoding expectations for the output of
> formatters.
> - Poll 7: P2093R6: <print> facility implementors are encouraged to
> substitute U+FFFD replacement characters following Unicode guidance when
> output is directed to a device and transcoding is necessary.
> - Poll 8: P2093R6: Neither <format> nor <print> facilities require an
> explicit program-controlled error handling mechanism for violations of
> encoding expectations.
> - Poll 9: P2093R6: Use of UTF-8 as the literal encoding is sufficient
> for <format> and <print> facilities to assume that the format string and
> output of all formatters is UTF-8 encoded.
> - Poll 10: P2093R6: Use of a literal encoding other than UTF-8 is
> sufficient for <format> and <print> facilities to assume a particular
> encoding for the format string and output of formatters.
> - Poll 11: P2093R6: Support for implicit encoding conversions will
> only be possible when an encoding assumption is implicitly or explicitly
> present.
>
> Assuming good consensus on those polls, we'll poll forwarding P2093R6 to
> LEWG again with direction to revise the paper to align with SG16 feedback.
> At a minimum, a revision will be expected to record SG16 direction and
> rationale. In order to avoid spending more SG16 telecon time on this
> paper, we'll look for a volunteer to review the updated revision and report
> back to SG16.
>
> - Poll X: P2093R6: Direct Victor to revise the paper to reflect SG16
> rationale and guidance, delegate review of a future revision to XXX, and
> forward to LEWG for inclusion in C++23 pending review confirmation.
>
> Tom.
>
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
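
For candidate Poll 7 above, a minimal sketch of what substituting U+FFFD
"following Unicode guidance" could look like, assuming that guidance means the
Unicode Standard's recommended practice of replacing each maximal subpart of
an ill-formed sequence with one U+FFFD. The function name
`replace_ill_formed_utf8` is hypothetical, and the sketch only sanitizes UTF-8
input; it does not transcode to a device encoding and is not what P2093R6
specifies.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: copies well-formed UTF-8 sequences unchanged and
// replaces each maximal ill-formed subpart with U+FFFD (EF BF BD in UTF-8).
std::string replace_ill_formed_utf8(std::string_view in) {
    static constexpr std::string_view replacement = "\xEF\xBF\xBD";
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;                    // 0 means "invalid lead byte"
        unsigned char lo = 0x80, hi = 0xBF;     // allowed range of the next byte
        if (b0 <= 0x7F) {
            len = 1;
        } else if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;          // excludes overlong forms
            if (b0 == 0xED) hi = 0x9F;          // excludes surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;          // excludes overlong forms
            if (b0 == 0xF4) hi = 0x8F;          // excludes > U+10FFFF
        }
        if (len == 0) {                         // byte can never start a sequence
            out += replacement;
            ++i;
            continue;
        }
        // Consume trailing bytes for as long as they stay in range.
        std::size_t k = 1;
        for (; k < len; ++k) {
            if (i + k >= in.size()) break;      // truncated at end of input
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if (b < lo || b > hi) break;        // offending byte is re-examined
            lo = 0x80;                          // only the 2nd byte is restricted
            hi = 0xBF;
        }
        if (k == len) out.append(in.substr(i, len));
        else out += replacement;                // one U+FFFD per maximal subpart
        i += k;
    }
    return out;
}
```

Under these assumptions, a lone 0xFF byte between "a" and "z" yields "a",
U+FFFD, "z", and a truncated 0xE2 0x82 yields a single U+FFFD, while
well-formed input passes through unchanged.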

Received on 2021-06-22 14:09:49