Date: Mon, 10 Jan 2022 12:19:08 -0500
SG16 will hold a telecon on Wednesday, January 12th at 19:30 UTC
(timezone conversion
<https://www.timeanddate.com/worldclock/converter.html?iso=20220112T193000&p1=1440&p2=tz_pst&p3=tz_mst&p4=tz_cst&p5=tz_est&p6=tz_cet>).
The agenda is:
* D2286R5: Formatting Ranges
o Review pending availability of proposed wording and continued
targeting of C++23.
* P2491R0: Text encodings follow-up <https://wg21.link/p2491r0>
o Initial review.
* P2498R0: Forward compatibility of text_encoding with additional
encoding registries <https://wg21.link/p2498r0>
o Initial review.
I have not yet confirmed that a D2286R5 exists so, unless I hear
otherwise, the first agenda item will be skipped.
We last reviewed a draft of P2286R4 <https://wg21.link/p2286r4> during
the 2021-12-15 SG16 telecon
<https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2021.md#december-15th-2021>
where we approved forwarding it to LEWG despite the absence of wording.
Prior to that, we had reviewed P2286R3 <https://wg21.link/p2286r3>
during the 2021-12-01 SG16 telecon
<https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2021.md#december-1st-2021>.
LEWG has since reviewed the proposal during its 2022-01-04 telecon
<https://wiki.edg.com/bin/view/Wg21telecons2022/P2286?twiki_redirect_cache=e1d7621f93ffd926fb2c8172a10fead5>
and plans to review it again during its 2022-01-18 telecon. In this
telecon, we'll review the available wording to look for new SG16
concerns and to validate that the wording reflects previous design guidance.
Previous design discussion addressed the following concerns:
1. Use of P2290 <https://wg21.link/p2290>-style brace-delimited
hexadecimal notation to preserve the values of code units that
appear in an ill-formed code unit sequence.
2. Use of P2290 <https://wg21.link/p2290>-style brace-delimited UCN
notation (as opposed to hexadecimal notation) for non-printable
characters.
3. Whether it is always possible to map an input character to a Unicode
character for the purpose of determining printability.
4. How characters are determined to be printable or non-printable.
5. Handling of lone surrogate characters; whether they are encoded in
UCN notation (like a non-printable character) or in hexadecimal
notation (like an invalid code unit).
6. Handling of unassigned code points.
7. Handling of Private Use Area (PUA) code points.
8. How to determine the boundaries of ill-formed code unit sequences.
9. Whether a replacement character should be emitted for an ill-formed
code unit sequence (as opposed to emitting hexadecimal notation for
each contributing code unit).
10. Stability guarantees.
11. Support for non-Unicode platforms.
12. Handling of std::filesystem::path.
P2491R0 <https://wg21.link/p2491r0> proposes changes to P1885, primarily
with regard to the handling of wide encodings. The recently communicated
D1885R9 <https://isocpp.org/files/papers/D1885R9.pdf> draft revision
removes support for wide encodings, thus making much (but not all) of
what P2491R0 proposes moot.
P2498R0 <https://wg21.link/p2498r0> also proposes changes to P1885 to
make way for the possibility of supporting additional encoding registries
in the future. Note that ISO does specify its own registry of
encodings that have been registered for use with ISO/IEC 2022
<https://www.iso.org/standard/22747.html>. The registry is called ISO-IR
(officially, "INTERNATIONAL REGISTER OF CODED CHARACTER SETS TO BE USED
WITH ESCAPE SEQUENCES") and the registration procedures are specified in
ISO/IEC 2375 <https://www.iso.org/standard/32184.html>. Unfortunately, I
don't think any of these ISO publications is freely available, though
copies can be found online. However, ISO/IEC 2022
<https://www.iso.org/standard/22747.html> is also published as ECMA-35
<https://www.ecma-international.org/publications-and-standards/standards/ecma-35/>,
which can be downloaded free of charge.
Please review the updates made to D1885R9
<https://isocpp.org/files/papers/D1885R9.pdf> prior to this telecon for
applicability to our review of the latter two papers.
Tom.
Received on 2022-01-10 17:19:10