sg16: Re: [SG16] Agenda for the 2022-01-12 SG16 telecon

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Wed, 12 Jan 2022 10:42:03 +0100

On Tue, Jan 11, 2022 at 7:41 PM Tom Honermann via SG16 <
sg16_at_[hidden]> wrote:

> This is your friendly reminder that this telecon will take place tomorrow.
>
> Barry informed me that a D2286R5 draft is not yet available, so we will
> not be discussing that paper tomorrow.
>
> Tom.
>
> On 1/10/22 12:19 PM, Tom Honermann via SG16 wrote:
>
> SG16 will hold a telecon on Wednesday, January 12th at 19:30 UTC (timezone
> conversion
> <https://www.timeanddate.com/worldclock/converter.html?iso=20220112T193000&p1=1440&p2=tz_pst&p3=tz_mst&p4=tz_cst&p5=tz_est&p6=tz_cet>
> ).
>
> The agenda is:
>
> - D2286R5: Formatting Ranges
> - Review pending availability of proposed wording and continued
> targeting of C++23.
> - P2491R0: Text encodings follow-up <https://wg21.link/p2491r0>
> - Initial review.
> - P2498R0: Forward compatibility of text_encoding with additional
> encoding registries <https://wg21.link/p2498r0>
> - Initial review.
>
> I don't yet have confirmation of the existence of a D2286R5 so, unless I
> hear otherwise, the first item on the agenda won't happen.
>
> We last reviewed a draft of P2286R4 <https://wg21.link/p2286r4> during
> the 2021-12-15 SG16 telecon
> <https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2021.md#december-15th-2021>
> where we approved forwarding it to LEWG despite the absence of wording.
> Prior to that, we had reviewed P2286R3 <https://wg21.link/p2286r3> during
> the 2021-12-01 SG16 telecon
> <https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2021.md#december-1st-2021>.
> LEWG has since reviewed the proposal during its 2022-01-04 telecon
> <https://wiki.edg.com/bin/view/Wg21telecons2022/P2286?twiki_redirect_cache=e1d7621f93ffd926fb2c8172a10fead5>
> and plans to review again during its 2022-01-18 telecon. In this telecon,
> we'll review the available wording to look for new SG16 concerns and to
> validate the wording reflects previous design guidance. Previous design
> discussion related to the following concerns:
>
> 1. Use of P2290 <https://wg21.link/p2290> style brace delimited
> hexadecimal notation to preserve the values of code units that appear in an
> ill-formed code unit sequence.
> 2. Use of P2290 <https://wg21.link/p2290> style brace delimited UCN
> notation (as opposed to hexadecimal notation) for non-printable characters.
> 3. Whether it is always possible to map an input character to a
> Unicode character for the purpose of determining printability.
> 4. How characters are determined to be printable or non-printable.
> 5. Handling of lone surrogate characters; whether they are encoded in
> UCN notation (like a non-printable character) or in hexadecimal notation
> (like an invalid code unit).
> 6. Handling of unassigned code points.
> 7. Handling of Private Use Area (PUA) code points.
> 8. How to determine the boundaries of ill-formed code unit sequences.
> 9. Whether a replacement character should be emitted for an ill-formed
> code unit sequence (as opposed to emitting hexadecimal notation for each
> contributing code unit).
> 10. Stability guarantees.
> 11. Support for non-Unicode platforms.
> 12. Handling of std::filesystem::path.
>
> P2491R0 <https://wg21.link/p2491r0> proposes changes to P1885; primarily
> with regard to handling of wide encodings. The recently communicated
> D1885R9 <https://isocpp.org/files/papers/D1885R9.pdf> draft revision
> removes support for wide encodings thus making much (but not all) of what
> P2491R0 proposes moot.
>
> P2498R0 <https://wg21.link/p2498r0> also proposes changes to P1885 to
> make way for the possibility of supporting different encoding registrars in
> the future. Note that the ISO does specify its own registry of encodings
> that have been registered for use with ISO/IEC 2022
> <https://www.iso.org/standard/22747.html>. The registry is called ISO-IR
> (officially, "INTERNATIONAL REGISTER OF CODED CHARACTER SETS TO BE USED
> WITH ESCAPE SEQUENCES") and the registration procedures are specified in ISO/IEC
> 2375 <https://www.iso.org/standard/32184.html>. Unfortunately, I don't
> think any of these publications is freely available, though copies can be
> found online. ISO/IEC 2022 <https://www.iso.org/standard/22747.html> is
> also published as ECMA-35
> <https://www.ecma-international.org/publications-and-standards/standards/ecma-35/>
> .
>
For people following along: the encoding registry described in ISO 2022
has not been updated since 1994. It is not a general encoding registry, but
a set of encodings, or parts of encodings, for use with ISO 2022.
Most of these encodings are either present in the IANA registry or are no
longer in use or fit for general purposes. When present in IANA, they have
an "iso-ir-xxx" alias that establishes a mapping to the ISO registry.
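
To illustrate the alias convention, here is a minimal sketch that recovers
the ISO-IR registration number from an IANA alias such as "iso-ir-100"
(which IANA lists as an alias of ISO_8859-1:1987). The helper name
iso_ir_number is hypothetical; it is not part of P1885 or of any library.

```cpp
#include <cctype>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: returns the ISO-IR registration number encoded in an
// alias of the form "iso-ir-<digits>", or std::nullopt for any other alias.
inline std::optional<unsigned> iso_ir_number(std::string_view alias) {
    constexpr std::string_view prefix = "iso-ir-";
    if (alias.size() <= prefix.size()) return std::nullopt;

    // Compare the prefix case-insensitively (ASCII only).
    for (std::size_t i = 0; i < prefix.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(alias[i])) != prefix[i])
            return std::nullopt;
    }

    // Everything after the prefix must be a decimal number.
    const std::string_view digits = alias.substr(prefix.size());
    const char* first = digits.data();
    const char* last = first + digits.size();
    unsigned value = 0;
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) return std::nullopt;
    return value;
}
```

An alias without that prefix (for example "latin1") yields std::nullopt, so
the function only says something about entries that IANA explicitly ties to
an ISO-IR registration.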

Please note that a registration in that database is not meant to register
an encoding as such, but an escape sequence used to switch to or from that
encoding.

Because such escape sequences exist, the registered encoding does not
strictly conform to the encoding the escape sequence pertains to.
For example, the registry describes no fewer than four "UTF-8" entries, all
pertaining to the escape sequence and encoding use. [1]
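
As a rough sketch of why this matters: a byte stream that follows ISO 2022
conventions can contain the switching sequence itself, which is not valid
UTF-8 text content. The example below assumes that ESC % G (0x1B 0x25 0x47)
is the sequence registered under ISO-IR 196; that is my understanding and
should be checked against [1]. The names utf8_switch and find_utf8_switch
are hypothetical.

```cpp
#include <cstddef>
#include <string_view>

// Assumption: ESC % G is the escape sequence registered as ISO-IR 196
// ("UTF-8 without implementation level"). Check against the registry.
inline constexpr std::string_view utf8_switch = "\x1B%G";

// Hypothetical helper: returns the offset of the first switching sequence
// in `bytes`, or std::string_view::npos if there is none.
inline std::size_t find_utf8_switch(std::string_view bytes) {
    return bytes.find(utf8_switch);
}
```

A consumer that expects plain UTF-8 would have to recognize and strip such a
sequence before decoding, which is the sense in which the registered entry
and the encoding itself differ.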

Absent from this database are all the Windows and IBM encodings.

Further observations regarding P2498 can be found in D1885R9.

[1] https://www.itscj-ipsj.jp/ir/190.pdf
    https://www.itscj-ipsj.jp/ir/191.pdf
    https://www.itscj-ipsj.jp/ir/192.pdf
    https://www.itscj-ipsj.jp/ir/196.pdf

> Please review the updates made to D1885R9
> <https://isocpp.org/files/papers/D1885R9.pdf> prior to this telecon for
> applicability to our review of the latter two papers.
>
> Tom.
>
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>

Received on 2022-01-12 09:42:14