Date: Wed, 23 Aug 2023 16:52:04 +0000
Hi Tom,
I think the list of error-handling use-cases would all be valuable additions to the paper. I’d also really like to see an example of using the UTF transcoding facilities for UTF-8 validation to report a list of places in the input in which ill-formed UTF-8 occurs.
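To make the request concrete, here is a minimal hand-rolled sketch of the kind of report meant: it scans a byte string and returns the half-open byte ranges that are not part of any well-formed UTF-8 sequence. It does not use the P2728 facilities. The names `well_formed_length`, `byte_range`, and `find_ill_formed` are hypothetical, and the byte ranges follow Table 3-7 of the Unicode Standard.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Length of the well-formed UTF-8 sequence at the front of `s`, or 0 if `s`
// does not start with one (byte ranges per Unicode Table 3-7).
inline std::size_t well_formed_length(std::string_view s) {
    auto u = [&](std::size_t i) -> unsigned {
        return static_cast<unsigned char>(s[i]);
    };
    // True if byte i exists and lies in [lo, hi]; defaults to a plain
    // continuation byte.
    auto cont = [&](std::size_t i, unsigned lo = 0x80, unsigned hi = 0xBF) {
        return i < s.size() && u(i) >= lo && u(i) <= hi;
    };
    if (s.empty()) return 0;
    const unsigned b0 = u(0);
    if (b0 <= 0x7F) return 1;
    if (b0 >= 0xC2 && b0 <= 0xDF) return cont(1) ? 2 : 0;
    if (b0 >= 0xE0 && b0 <= 0xEF) {
        const unsigned lo = (b0 == 0xE0) ? 0xA0 : 0x80;  // no overlong forms
        const unsigned hi = (b0 == 0xED) ? 0x9F : 0xBF;  // no surrogates
        return (cont(1, lo, hi) && cont(2)) ? 3 : 0;
    }
    if (b0 >= 0xF0 && b0 <= 0xF4) {
        const unsigned lo = (b0 == 0xF0) ? 0x90 : 0x80;  // no overlong forms
        const unsigned hi = (b0 == 0xF4) ? 0x8F : 0xBF;  // nothing above U+10FFFF
        return (cont(1, lo, hi) && cont(2) && cont(3)) ? 4 : 0;
    }
    return 0;  // stray continuation byte or invalid lead byte
}

struct byte_range {
    std::size_t begin;  // offset of the first ill-formed byte
    std::size_t end;    // one past the last ill-formed byte
};

// Collects every run of adjacent bytes that is not covered by a well-formed
// sequence.
inline std::vector<byte_range> find_ill_formed(std::string_view s) {
    std::vector<byte_range> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const std::size_t n = well_formed_length(s.substr(i));
        if (n != 0) {
            i += n;
            continue;
        }
        if (!out.empty() && out.back().end == i) {
            ++out.back().end;  // extend the current run
        } else {
            out.push_back({i, i + 1});
        }
        ++i;
    }
    return out;
}
```

Merging adjacent ill-formed bytes into one range is a reporting choice made for this sketch; a validator could just as well report each maximal subpart or each code unit separately.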
Best regards,
Peter
From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Tom Honermann via SG16
Sent: 23 August 2023 17:07
To: sg16_at_[hidden]
Cc: Tom Honermann <tom_at_[hidden]>
Subject: Re: [SG16] Agenda for the 2023-08-23 SG16 telecon
On 8/23/23 11:40 AM, Tom Honermann via SG16 wrote:
SG16 will hold a telecon on Wednesday, August 23rd, at 19:30 UTC (timezone conversion<https://www.timeanddate.com/worldclock/converter.html?iso=20230823T193000&p1=1440&p2=tz_pt&p3=tz_mt&p4=tz_ct&p5=tz_et&p6=tz_cest>).
That is today, in about 4 hours. Obviously, I'm still struggling to keep up with all the things.
The agenda follows.
* P2909R0: Dude, where’s my char?<https://wg21.link/p2909r0>
* P2728R6: Unicode in the Library, Part 1: UTF Transcoding<https://wg21.link/p2728r6>
P2909R0 is a new paper from Victor that seeks to establish portable behavior for formatting of objects of type char regardless of the implementation-defined signedness of char.
P2728 was last discussed during the 2023-05-10 SG16 telecon<https://github.com/sg16-unicode/sg16-meetings#may-10th-2023>. I have continued to hear feedback that the motivation for the proposal, as presented in the paper, is lacking. I'd like to focus on giving Zach specific, concrete, and clear direction regarding how to improve the paper in this respect. Comments should focus on what is perceived to be missing and what changes would fill those gaps.
Following that, I would like to focus on error handling. The proposal includes a transcoding_error_handler concept and a use_replacement_character class that models that concept and that provides the default error handling behavior. The error handler is specified to take a message passed as an object of type std::string_view, but does not specify the contents of the message. The error handler is constrained to return a single value of type char32_t and is not given access to the source text (except possibly via the message, which is always char-based). Is this sufficient? If not, what changes are needed?
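For readers without the paper at hand, the handler shape described above can be sketched roughly as follows. This is a reconstruction from the description in this message only, not the paper's wording: the names carry a `_sketch` suffix because they are hypothetical, and a C++17 type trait stands in for the proposed concept.

```cpp
#include <string_view>
#include <type_traits>

// Hypothetical stand-in for the proposed concept: true when H can be called
// with a std::string_view message and returns exactly char32_t.
template <typename H, typename = void>
struct is_error_handler_sketch : std::false_type {};

template <typename H>
struct is_error_handler_sketch<
    H, std::void_t<std::invoke_result_t<H&, std::string_view>>>
    : std::is_same<std::invoke_result_t<H&, std::string_view>, char32_t> {};

// Hypothetical analogue of the default handler: ignores the message and
// returns U+FFFD REPLACEMENT CHARACTER.
struct use_replacement_character_sketch {
    constexpr char32_t operator()(std::string_view) const noexcept {
        return U'\uFFFD';
    }
};

// A handler that wants to emit nothing has no way to say so, because the
// return type must be a single char32_t.
struct drop_sketch {
    void operator()(std::string_view) const noexcept {}
};

static_assert(is_error_handler_sketch<use_replacement_character_sketch>::value);
static_assert(!is_error_handler_sketch<drop_sketch>::value);
```

Under this shape each handler call yields exactly one code point and sees only the message, so emitting zero code points, or several, would need a different interface.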
Zach, with regard to error handling, I would like for the paper to present examples of error handlers that implement each of the following:
* The policies from Unicode PR-121<http://unicode.org/review/pr-121.html>:
  * Replace the entire ill-formed subsequence by a single U+FFFD.
  * Replace each maximal subpart of the ill-formed subsequence by a single U+FFFD.
  * Replace each code unit of the ill-formed subsequence by a single U+FFFD.
* Remove each ill-formed subsequence (without substituting any replacement characters).
* Replace each ill-formed subsequence with an escaped representation of the values of the invalid code units.
(I believe only one of these is possible with the proposal as it currently stands.)
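To illustrate how the five behaviors listed above differ, here is a minimal hand-rolled UTF-8 to UTF-32 decoder with a policy switch. It is a sketch under stated assumptions and does not use the P2728 interface: all names (`policy`, `step`, `decode_step`, `decode`) are hypothetical, "the entire ill-formed subsequence" is read as a run of adjacent ill-formed bytes, and the `\xHH` escape format is an arbitrary choice. Maximal subparts are found using the byte ranges of Table 3-7 in the Unicode Standard.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical names throughout; none of this is the P2728 interface.
enum class policy {
    one_per_run,              // one U+FFFD per run of adjacent ill-formed bytes
    one_per_maximal_subpart,  // one U+FFFD per maximal subpart
    one_per_code_unit,        // one U+FFFD per ill-formed byte
    remove,                   // drop ill-formed bytes
    escape                    // emit backslash, 'x', two hex digits per byte
};

struct step {
    char32_t cp;      // decoded code point (meaningless when !ok)
    std::size_t len;  // bytes consumed: a whole sequence or a maximal subpart
    bool ok;
};

// Decodes one sequence from the front of a non-empty `s`.
inline step decode_step(std::string_view s) {
    auto u = [&](std::size_t i) { return static_cast<unsigned char>(s[i]); };
    const unsigned char b0 = u(0);
    if (b0 < 0x80) return {b0, 1, true};
    std::size_t extra = 0;               // number of trailing bytes expected
    unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the second byte
    if (b0 >= 0xC2 && b0 <= 0xDF) {
        extra = 1;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
        extra = 2;
        if (b0 == 0xE0) lo = 0xA0;  // excludes overlong forms
        if (b0 == 0xED) hi = 0x9F;  // excludes surrogates
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        extra = 3;
        if (b0 == 0xF0) lo = 0x90;  // excludes overlong forms
        if (b0 == 0xF4) hi = 0x8F;  // excludes values above U+10FFFF
    } else {
        return {0, 1, false};  // stray continuation byte or invalid lead byte
    }
    char32_t cp = b0 & (0x3F >> extra);  // payload bits of the lead byte
    for (std::size_t i = 1; i <= extra; ++i) {
        // The bytes accepted so far form the maximal subpart.
        if (i >= s.size() || u(i) < lo || u(i) > hi) return {0, i, false};
        cp = (cp << 6) | (u(i) & 0x3Fu);
        lo = 0x80;
        hi = 0xBF;
    }
    return {cp, extra + 1, true};
}

inline std::u32string decode(std::string_view in, policy p) {
    constexpr char32_t fffd = 0xFFFD;
    const char* hex = "0123456789ABCDEF";
    std::u32string out;
    bool prev_bad = false;  // was the previous step an error?
    while (!in.empty()) {
        const step st = decode_step(in);
        if (st.ok) {
            out.push_back(st.cp);
        } else if (p == policy::one_per_run) {
            if (!prev_bad) out.push_back(fffd);
        } else if (p == policy::one_per_maximal_subpart) {
            out.push_back(fffd);
        } else if (p == policy::one_per_code_unit) {
            out.append(st.len, fffd);
        } else if (p == policy::escape) {
            for (std::size_t i = 0; i < st.len; ++i) {
                const unsigned char b = static_cast<unsigned char>(in[i]);
                out += U"\\x";
                out.push_back(static_cast<char32_t>(hex[b >> 4]));
                out.push_back(static_cast<char32_t>(hex[b & 0x0F]));
            }
        }  // policy::remove emits nothing for ill-formed input
        prev_bad = !st.ok;
        in.remove_prefix(st.len);
    }
    return out;
}
```

Note that every policy except `one_per_maximal_subpart` needs something a one-code-point-per-error callback does not obviously provide: the offending bytes themselves, state carried across adjacent errors, or the ability to emit zero or several code points.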
Tom.
Received on 2023-08-23 16:52:12