Date: Tue, 10 Oct 2023 16:05:05 -0400
This is your friendly reminder that this meeting is taking place tomorrow.
Tom.
On 10/7/23 11:20 PM, Tom Honermann via SG16 wrote:
>
> SG16 will hold a telecon on Wednesday, October 11th, at 19:30 UTC
> (timezone conversion
> <https://www.timeanddate.com/worldclock/converter.html?iso=20231011T193000&p1=1440&p2=tz_pt&p3=tz_mt&p4=tz_ct&p5=tz_et&p6=tz_cest>).
>
> The agenda follows.
>
> * P1729R2: Text Parsing <https://wg21.link/p1729r2>:
> o Continue review.
>
> We made good progress reviewing this paper during the 2023-09-27
> meeting
> <https://github.com/sg16-unicode/sg16-meetings/tree/master#september-27th-2023>
> and I expect we'll complete review in this meeting. Since wording is
> not yet available, we won't poll forwarding this paper, but may poll
> support for the paper and approval of the design as encouragement for
> LEWG to review the design before a large investment is made in
> wording. We will, of course, review again once wording is available.
>
> One item I would like to discuss is that the proposed functionality
> allows for a single code unit to be scanned and produced as a char (or
> wchar_t) value. What does that imply for the following example (assume
> that the ordinary literal encoding is UTF-8)?
>
> // U+12345 is 0xF0 0x92 0x8D 0x85 in UTF-8
> std::scan<char, std::string>("\u{12345}", "{}{}");
>
> The scan of the char value presumably consumes the 0xF0 code unit such
> that the scan of the std::string value then begins scanning at the
> 0x92 trailing code unit which presumably results in a scan error,
> substitution of a replacement character in the scanned string, or a
> sequence of ill-formed code units scanned into the string, all of
> which seem undesirable; particularly if the programmer's expectation
> was that a member of the basic character set would be scanned for the
> char value. There are at least a couple of alternative options that we
> can consider:
>
> 1. Scan of a single char or wchar_t value is only considered
> successful if the value read corresponds to a character that is
> encoded as a single code unit. E.g., scan of a leading or
> surrogate code unit is an error.
> 2. Scan of a single char or wchar_t value consumes the full code unit
> sequence for the encoded character, the first code unit is used as
> the scanned value, and the remaining code units are discarded.
>
> I reviewed the current list of papers with an SG16 label
> <https://github.com/cplusplus/papers/issues?q=is%3Aissue+is%3Aopen+label%3Asg16+-label%3Aneeds-revision>,
> but am refraining from adding any more to the agenda for this meeting.
> The timing looks right for revisiting the following papers for the
> following meeting assuming author availability.
>
> * P2749: Down with "character" <https://wg21.link/p2749>
> * P2626: charN_t incremental adoption: Casting pointers of UTF
> character types <https://wg21.link/p2626>
>
> Tom.
>
>
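To make the two options from the quoted agenda concrete, here is a minimal sketch of how each might treat the leading 0xF0 code unit, assuming UTF-8 input. It does not use the proposed std::scan interface; utf8_sequence_length, char_scan_result, scan_char_strict, and scan_char_consume_all are hypothetical names invented for illustration. For brevity the sketch does not validate trailing code units, and the wchar_t / surrogate case is not covered.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: number of code units in the UTF-8 sequence that
// starts with `lead`, or 0 if `lead` cannot start a sequence (for
// example, a trailing code unit such as 0x92).
constexpr std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;                   // single code unit
    if (lead >= 0xC2 && lead <= 0xDF) return 2;  // two-unit lead
    if (lead >= 0xE0 && lead <= 0xEF) return 3;  // three-unit lead
    if (lead >= 0xF0 && lead <= 0xF4) return 4;  // four-unit lead
    return 0;                                    // trailing or invalid
}

// Hypothetical result type: the scanned char plus the unconsumed input.
struct char_scan_result {
    char value;
    std::string_view rest;
};

// Option 1 (sketch): scanning a char succeeds only if the character is
// encoded as a single code unit; a leading code unit is an error.
inline std::optional<char_scan_result> scan_char_strict(std::string_view in) {
    if (in.empty()) return std::nullopt;
    if (utf8_sequence_length(static_cast<unsigned char>(in[0])) != 1)
        return std::nullopt;
    return char_scan_result{in[0], in.substr(1)};
}

// Option 2 (sketch): consume the full code unit sequence, produce the
// first code unit as the scanned value, and discard the remainder.
inline std::optional<char_scan_result> scan_char_consume_all(std::string_view in) {
    if (in.empty()) return std::nullopt;
    const std::size_t n = utf8_sequence_length(static_cast<unsigned char>(in[0]));
    if (n == 0 || n > in.size()) return std::nullopt;  // ill-formed or truncated
    return char_scan_result{in[0], in.substr(n)};
}
```

With the input "\xF0\x92\x8D\x85" followed by "xyz", scan_char_strict reports failure, while scan_char_consume_all produces 0xF0 and leaves "xyz" for the following string scan. In both cases the string scan no longer begins at the 0x92 trailing code unit.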
Received on 2023-10-10 20:05:07