SG16

Subject: Re: Agenda for the 2021-06-23 SG16 telecon
From: Steve Downey (sdowney_at_[hidden])
Date: 2021-06-22 19:53:20


In fact, ANSI and ECMA-48 escape sequences require only 7 bits of data, and
they include 7-bit escapes for what would otherwise be C1 control characters
above 7F, so there's no need to emit anything that doesn't appear to be
valid UTF-8. In other encodings, you would have to avoid transcoding, since
it's the byte values that drive the terminal's state machine.
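
To make the 7-bit point concrete, here is a minimal C++ sketch. The helper
names (is_seven_bit, c1_to_seven_bit) are hypothetical and exist only for
this illustration. It assumes the input is a raw 8-bit byte stream (not
UTF-8, where bytes 0x80..0x9F are continuation bytes) and relies on the
ECMA-48 rule that a C1 control in 0x80..0x9F has a 7-bit form: ESC followed
by the byte minus 0x40.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: true if every byte is in 0x00..0x7F, i.e. the
// sequence is plain ASCII and therefore also well-formed UTF-8.
bool is_seven_bit(std::string_view bytes) {
    for (unsigned char b : bytes) {
        if (b > 0x7F) return false;
    }
    return true;
}

// Hypothetical helper: rewrite each 8-bit C1 control (0x80..0x9F) in a raw
// byte stream as its 7-bit equivalent, ESC (0x1B) followed by (byte - 0x40).
// All other bytes are copied unchanged. Assumes the input is NOT UTF-8.
std::string c1_to_seven_bit(std::string_view bytes) {
    std::string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        if (b >= 0x80 && b <= 0x9F) {
            out.push_back('\x1B');
            out.push_back(static_cast<char>(b - 0x40));
        } else {
            out.push_back(static_cast<char>(b));
        }
    }
    return out;
}
```

For example, the 8-bit CSI byte 0x9B followed by "31m" becomes ESC [ 3 1 m,
which is the usual 7-bit form of the "set foreground red" sequence and
contains no byte above 7F.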

Kermit, xmodem, etc., would use shift locking to encode 8-bit data over a
7-bit channel, if I remember correctly. But that's really getting outside
the bounds of text handling.

On Tue, Jun 22, 2021, 15:10 Corentin Jabot via SG16 <sg16_at_[hidden]>
wrote:

> On Tue, Jun 22, 2021 at 8:29 PM Tom Honermann via SG16 <
> sg16_at_[hidden]> wrote:
>
>> Reminder that this meeting is taking place tomorrow.
>>
>> Once we complete the remaining design polls, I'd like to clarify what
>> might be perceived as a conflict in our poll results for poll 3.2 vs polls
>> 4 and 5; we can't state both that binary data may be formatted and that
>> formatter output that doesn't match an expected encoding is UB. Hubert's
>> suggestion of an escape mechanism would suffice to resolve the apparent
>> conflict. Are there other ways to interpret these polls? Or other
>> solutions to resolve the apparent conflict?
>>
>
> The way I see it,
>
> UB:
>
> - Interpreting the formatting string as being in an encoding that is
> not the encoding of the formatting string
> - Outputting on a character device something that is not in the
> encoding of that character device
> - This includes outputting non-utf-8 on the utf-8 character device
> that exists on some platforms
>
> Not UB
>
> - Outputting anything on a binary device ( file for example )
>
>
> I would like to see presented use cases for outputting binary data to a
> character device.
> Especially, if the concern is about bash escape sequences for example,
> this is still text from the point of view of C++ ( "\e[31m" is still just
> text - a sequence of characters when it is *produced* by a C++ program).
> If there are other use cases, I would very much like to understand them.
> In particular, I don't see how a general-purpose UTF-8 decoder could
> detect, let alone handle, binary data in the middle of a UTF-8 sequence.
>
>
>
>>
>> Tom.
>>
>> On 6/17/21 11:56 AM, Tom Honermann via SG16 wrote:
>>
>> SG16 will hold a telecon on Wednesday, June 23rd at 19:30 UTC (timezone
>> conversion
>> <https://www.timeanddate.com/worldclock/converter.html?iso=20210623T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>
>> ).
>>
>> The agenda is:
>>
>> - P2093R6: Formatted output <https://wg21.link/p2093r6>
>> - Finish polling begun at the last telecon.
>> - LWG 3565: Handling of encodings in localized formatting of chrono
>> types is underspecified <https://cplusplus.github.io/LWG/issue3565>
>> - Discuss and poll the proposed resolution.
>> - P2295R4: Support for UTF-8 as a portable source file encoding
>> <https://wg21.link/p2295r4>
>> - Review updated wording produced through collaboration between
>> Corentin, Jens, Hubert, and Peter.
>> - https://lists.isocpp.org/sg16/2021/04/2353.php
>> - https://lists.isocpp.org/sg16/2021/06/2429.php
>>
>> At the last telecon, we discussed addressing LWG 3565 as the first agenda
>> item for this telecon. However, I would prefer to finish polling for
>> P2093R6 first as I expect some of the remaining candidate polls to be
>> potentially relevant for the LWG issue resolution.
>>
>> For reference, here are the P2093R6 polls and poll results taken during
>> the last telecon (I'll get the meeting summary posted soon). Consensus so
>> far appears to be rather strong with the exception of poll 3.2.
>>
>> - *Poll 1: P2093R6: <format> and <print> facilities should have
>> consistent behavior with respect to encoding expectations for the format
>> string.*
>> Attendees: 8
>> No objection to unanimous consent.
>> - *Poll 2.1: P2093R6: <format> and <print> facilities should have
>> consistent behavior with respect to encoding expectations for the output of
>> formatters.*
>> <Not polled; per discussion, revisit following later polls>
>> - *Poll 2.2: P2093R6: formatters should not be sensitive to whether
>> they are being used with a <format> or <print> facility.*
>> Attendees: 8
>> No objection to unanimous consent.
>> - *Poll 3.1: P2093R6: Regardless of format string encoding
>> assumptions, <format> facilities may be used to format binary data.*
>> Attendees: 8 (1 abstention)
>> SF F N A SA
>> 5 1 1 0 0
>> Strong consensus
>> - *Poll 3.2: P2093R6: Regardless of format string encoding
>> assumptions, <print> facilities may be used to format binary data.*
>> Attendees: 8 (1 abstention)
>> SF F N A SA
>> 2 1 3 1 0
>> Weak consensus
>> - *Poll 4: P2093R6: <print> facilities exhibit undefined behavior
>> when an encoding expectation is present and a format string or formatter
>> output does not match those expectations.*
>> Attendees: 8 (1 abstention)
>> SF F N A SA
>> 2 4 0 0 1
>> Strong consensus
>> - *Poll 5: P2093R6: <print> facilities exhibit undefined behavior
>> when an encoding expectation is present and a format string or formatter
>> output does not match those expectations and output is directed to a device
>> that has encoding expectations.*
>> Attendees: 8 (1 abstention)
>> SF F N A SA
>> 6 0 1 0 0
>> Stronger consensus than poll 4.
>> - *Poll 6: P2093R6: <print> facility implementors are encouraged to
>> provide a run-time means for diagnosing format strings and formatter output
>> that is not well-formed according to the expected encoding.*
>> Attendees: 8 (1 abstention)
>> SF F N A SA
>> 4 0 2 1 0
>> Consensus.
>>
>> The remaining candidate polls are:
>>
>> - Poll 2.1: P2093R6: <format> and <print> facilities should have
>> consistent behavior with respect to encoding expectations for the output of
>> formatters.
>> - Poll 7: P2093R6: <print> facility implementors are encouraged to
>> substitute U+FFFD replacement characters following Unicode guidance when
>> output is directed to a device and transcoding is necessary.
>> - Poll 8: P2093R6: Neither <format> nor <print> facilities require an
>> explicit program-controlled error handling mechanism for violations of
>> encoding expectations.
>> - Poll 9: P2093R6: Use of UTF-8 as the literal encoding is sufficient
>> for <format> and <print> facilities to assume that the format string and
>> output of all formatters is UTF-8 encoded.
>> - Poll 10: P2093R6: Use of a literal encoding other than UTF-8 is
>> sufficient for <format> and <print> facilities to assume a particular
>> encoding for the format string and output of formatters.
>> - Poll 11: P2093R6: Support for implicit encoding conversions will
>> only be possible when an encoding assumption is implicitly or explicitly
>> present.
>>
>> Assuming good consensus on those polls, we'll poll forwarding P2093R6 to
>> LEWG again with direction to revise the paper to align with SG16 feedback.
>> At a minimum, a revision will be expected to record SG16 direction and
>> rationale. In order to avoid spending more SG16 telecon time on this
>> paper, we'll look for a volunteer to review the updated revision and report
>> back to SG16.
>>
>> - Poll X: P2093R6: Direct Victor to revise the paper to reflect SG16
>> rationale and guidance, delegate review of a future revision to XXX, and
>> forward to LEWG for inclusion in C++23 pending review confirmation.
>>
>> Tom.
>>
>>
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>
>



SG16 list run by sg16-owner@lists.isocpp.org