Date: Tue, 22 Jun 2021 18:06:12 -0400
On 6/22/21 3:09 PM, Corentin Jabot via SG16 wrote:
> On Tue, Jun 22, 2021 at 8:29 PM Tom Honermann via SG16
> <sg16_at_[hidden] <mailto:sg16_at_[hidden]>> wrote:
>
> Reminder that this meeting is taking place tomorrow.
>
> Once we complete the remaining design polls, I'd like to clarify
> what might be perceived as a conflict in our poll results for poll
> 3.2 vs polls 4 and 5; we can't state both that binary data may be
> formatted and that formatter output that doesn't match an expected
> encoding is UB. Hubert's suggestion of an escape mechanism would
> suffice to resolve the apparent conflict. Are there other ways to
> interpret these polls? Or other solutions to resolve the apparent
> conflict?
>
>
> The way I see it,
>
> UB:
>
> * Interpreting the formatting string as being in an encoding that is
> not the encoding of the formatting string
> * Outputting on a character device something that is not in the
> encoding of that character device
> o This includes outputting non-utf-8 on the utf-8 character
> device that exists on some platforms
>
> Not UB
>
> * Outputting anything on a binary device ( file for example )
>
>
> I would like to see presented use cases for outputting binary data to
> a character device.
> Especially, if the concern is about bash escape sequences for example,
> this is still text from the point of view of C++ ( "\e[31m" is still
> just text - a sequence of characters when it is *produced* by a C++
> program).
> If there are other use cases, I would very much like to understand them.
> In particular, I don't see how a general purpose UTF-8 decoder could
> detect, let alone handle, binary data in the middle of a UTF-8 sequence.

There are some interesting modern-day protocol examples as well as older
protocols. Kitty <https://sw.kovidgoyal.net/kitty/index.html> is a
terminal emulator with extensions that enable graphics support; its
graphics protocol
<https://sw.kovidgoyal.net/kitty/graphics-protocol.html> uses escape
sequences to deliver binary image data. In the distant past, protocols
like kermit <https://en.wikipedia.org/wiki/Kermit_(protocol)>, XMODEM
<https://en.wikipedia.org/wiki/XMODEM>, and ZMODEM
<https://en.wikipedia.org/wiki/ZMODEM> were used to transfer files via
terminal emulators.

I wouldn't be surprised if other existing terminal window or mouse
management controls implemented via escape sequences allow for a binary
payload.

I believe the escape mechanism that Hubert had in mind would require the
format string to adhere to the expected encoding, but would allow for
the output of formatters to be conditionally transcoded for output to a
device. I don't know how escaped binary data would be marshaled in the
case where a device interface is UTF-16 based though.

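To make the distinction between raw and escaped binary output concrete, here is a minimal C++ sketch. It is an illustration only, not anything proposed in P2093R6 and not necessarily the mechanism Hubert had in mind. The function names (is_well_formed_utf8, base64_encode, wrap_as_apc_payload) are hypothetical, and the framing (ESC _ G, control data, ';', base64 payload, ESC \) is only loosely modeled on the Kitty graphics protocol; the protocol documentation remains the authority for real control keys and chunking rules.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Returns true if `s` is well-formed UTF-8. Rejects stray or missing
// continuation bytes, overlong forms, surrogates, and values above U+10FFFF.
bool is_well_formed_utf8(std::string_view s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const auto b0 = static_cast<unsigned char>(s[i]);
        if (b0 < 0x80) { ++i; continue; }          // ASCII
        std::size_t len = 0;
        std::uint32_t cp = 0, min = 0;
        if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
        else return false;                          // invalid lead byte
        if (n - i < len) return false;              // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const auto b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;   // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}

// Standard base64 (RFC 4648 alphabet, with '=' padding).
std::string base64_encode(std::string_view in) {
    static constexpr char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const auto byte = [&](std::size_t k) {
        return static_cast<std::uint32_t>(static_cast<unsigned char>(in[k]));
    };
    std::string out;
    out.reserve((in.size() + 2) / 3 * 4);
    std::size_t i = 0;
    for (; i + 3 <= in.size(); i += 3) {
        const std::uint32_t v = (byte(i) << 16) | (byte(i + 1) << 8) | byte(i + 2);
        out += tbl[(v >> 18) & 0x3F];
        out += tbl[(v >> 12) & 0x3F];
        out += tbl[(v >> 6) & 0x3F];
        out += tbl[v & 0x3F];
    }
    const std::size_t rem = in.size() - i;
    if (rem == 1) {
        const std::uint32_t v = byte(i) << 16;
        out += tbl[(v >> 18) & 0x3F];
        out += tbl[(v >> 12) & 0x3F];
        out += "==";
    } else if (rem == 2) {
        const std::uint32_t v = (byte(i) << 16) | (byte(i + 1) << 8);
        out += tbl[(v >> 18) & 0x3F];
        out += tbl[(v >> 12) & 0x3F];
        out += tbl[(v >> 6) & 0x3F];
        out += '=';
    }
    return out;
}

// Wraps arbitrary bytes in an escape sequence whose payload is base64, so
// the result consists of ASCII bytes only. `control` is caller-supplied
// control data and is assumed to be ASCII; it is not validated here.
std::string wrap_as_apc_payload(std::string_view control, std::string_view bytes) {
    std::string out = "\033_G";   // ESC _ G
    out += control;
    out += ';';
    out += base64_encode(bytes);
    out += "\033\\";              // ESC backslash (string terminator)
    return out;
}
```

With these helpers, the three bytes FF D8 FF fail is_well_formed_utf8 when passed through raw, while the result of wrap_as_apc_payload("k=v", ...) on the same bytes passes, because every byte written is ASCII. That property would also make the escaped form straightforward to transcode for a UTF-16 based device interface, though whether an escape mechanism should work this way is an open question.
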
Tom.
>
>
> Tom.
>
> On 6/17/21 11:56 AM, Tom Honermann via SG16 wrote:
>>
>> SG16 will hold a telecon on Wednesday, June 23rd at 19:30 UTC
>> (timezone conversion
>> <https://www.timeanddate.com/worldclock/converter.html?iso=20210623T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).
>>
>> The agenda is:
>>
>> * P2093R6: Formatted output <https://wg21.link/p2093r6>
>> o Finish polling begun at the last telecon.
>> * LWG 3565: Handling of encodings in localized formatting of
>> chrono types is underspecified
>> <https://cplusplus.github.io/LWG/issue3565>
>> o Discuss and poll the proposed resolution.
>> * P2295R4: Support for UTF-8 as a portable source file encoding
>> <https://wg21.link/p2295r4>
>> o Review updated wording produced through collaboration
>> between Corentin, Jens, Hubert, and Peter.
>> + https://lists.isocpp.org/sg16/2021/04/2353.php
>> + https://lists.isocpp.org/sg16/2021/06/2429.php
>>
>> At the last telecon, we discussed addressing LWG 3565 as the
>> first agenda item for this telecon. However, I would prefer to
>> finish polling for P2093R6 first as I expect some of the
>> remaining candidate polls to be potentially relevant for the LWG
>> issue resolution.
>>
>> For reference, here are the P2093R6 polls and poll results taken
>> during the last telecon (I'll get the meeting summary posted
>> soon). Consensus so far appears to be rather strong with the
>> exception of poll 3.2.
>>
>> * *Poll 1: P2093R6: <format> and <print> facilities should have
>> consistent behavior with respect to encoding expectations for
>> the format string.*
>> Attendees: 8
>> No objection to unanimous consent.
>> * *Poll 2.1: P2093R6: <format> and <print> facilities should
>> have consistent behavior with respect to encoding
>> expectations for the output of formatters.*
>> <Not polled; per discussion, revisit following later polls>
>> * *Poll 2.2: P2093R6: formatters should not be sensitive to
>> whether they are being used with a <format> or <print> facility.*
>> Attendees: 8
>> No objection to unanimous consent.
>> * *Poll 3.1: P2093R6: Regardless of format string encoding
>> assumptions, <format> facilities may be used to format binary
>> data.*
>> Attendees: 8 (1 abstention)
>> SF F N A SA
>> 5 1 1 0 0
>> Strong consensus
>> * *Poll 3.2: P2093R6: Regardless of format string encoding
>> assumptions, <print> facilities may be used to format binary
>> data.*
>> Attendees: 8 (1 abstention)
>> SF F N A SA
>> 2 1 3 1 0
>> Weak consensus
>> * *Poll 4: P2093R6: <print> facilities exhibit undefined
>> behavior when an encoding expectation is present and a format
>> string or formatter output does not match those expectations.*
>> Attendees: 8 (1 abstention)
>> SF F N A SA
>> 2 4 0 0 1
>> Strong consensus
>> * *Poll 5: P2093R6: <print> facilities exhibit undefined
>> behavior when an encoding expectation is present and a format
>> string or formatter output does not match those expectations
>> and output is directed to a device that has encoding
>> expectations.*
>> Attendees: 8 (1 abstention)
>> SF F N A SA
>> 6 0 1 0 0
>> Stronger consensus than poll 4.
>> * *Poll 6: P2093R6: <print> facility implementors are
>> encouraged to provide a run-time means for diagnosing format
>> strings and formatter output that is not well-formed
>> according to the expected encoding.*
>> Attendees: 8 (1 abstention)
>> SF F N A SA
>> 4 0 2 1 0
>> Consensus.
>>
>> The remaining candidate polls are:
>>
>> * Poll 2.1: P2093R6: <format> and <print> facilities should
>> have consistent behavior with respect to encoding
>> expectations for the output of formatters.
>> * Poll 7: P2093R6: <print> facility implementors are encouraged
>> to substitute U+FFFD replacement characters following Unicode
>> guidance when output is directed to a device and transcoding
>> is necessary.
>> * Poll 8: P2093R6: Neither <format> nor <print> facilities
>> require an explicit program-controlled error handling
>> mechanism for violations of encoding expectations.
>> * Poll 9: P2093R6: Use of UTF-8 as the literal encoding is
>> sufficient for <format> and <print> facilities to assume that
>> the format string and output of all formatters is UTF-8 encoded.
>> * Poll 10: P2093R6: Use of a literal encoding other than UTF-8
>> is sufficient for <format> and <print> facilities to assume a
>> particular encoding for the format string and output of
>> formatters.
>> * Poll 11: P2093R6: Support for implicit encoding conversions
>> will only be possible when an encoding assumption is
>> implicitly or explicitly present.
>>
>> Assuming good consensus on those polls, we'll poll forwarding
>> P2093R6 to LEWG again with direction to revise the paper to align
>> with SG16 feedback. At a minimum, a revision will be expected to
>> record SG16 direction and rationale. In order to avoid spending
>> more SG16 telecon time on this paper, we'll look for a volunteer
>> to review the updated revision and report back to SG16.
>>
>> * Poll X: P2093R6: Direct Victor to revise the paper to
>> reflect SG16 rationale and guidance, delegate review of a
>> future revision to XXX, and forward to LEWG for inclusion in
>> C++23 pending review confirmation.
>>
>> Tom.
>>
>>
>
> --
> SG16 mailing list
> SG16_at_[hidden] <mailto:SG16_at_[hidden]>
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
>
Received on 2021-06-22 17:06:15