On Tue, Jan 11, 2022 at 7:41 PM Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:

This is your friendly reminder that this telecon will take place tomorrow.

Barry informed me that a D2286R5 draft is not yet available, so we will not be discussing that paper tomorrow.

Tom.

On 1/10/22 12:19 PM, Tom Honermann via SG16 wrote:

SG16 will hold a telecon on Wednesday, January 12th at 19:30 UTC (timezone conversion).

The agenda is:

D2286R5: Formatting Ranges

Review pending availability of proposed wording and continued targeting of C++23.

P2491R0: Text encodings follow-up

Initial review.

P2498R0: Forward compatibility of text_encoding with additional encoding registries

Initial review.

I don't yet have confirmation of the existence of a D2286R5 so, unless I hear otherwise, the first item on the agenda won't happen.

We last reviewed a draft of P2286R4 during the 2021-12-15 SG16 telecon where we approved forwarding it to LEWG despite the absence of wording. Prior to that, we had reviewed P2286R3 during the 2021-12-01 SG16 telecon. LEWG has since reviewed the proposal during its 2022-01-04 telecon and plans to review it again during its 2022-01-18 telecon. In this telecon, we'll review the available wording to look for new SG16 concerns and to validate that the wording reflects previous design guidance. Previous design discussion related to the following concerns:

Use of P2290 style brace delimited hexadecimal notation to preserve the values of code units that appear in an ill-formed code unit sequence.

Use of P2290 style brace delimited UCN notation (as opposed to hexadecimal notation) for non-printable characters.

Whether it is always possible to map an input character to a Unicode character for the purpose of determining printability.

How characters are determined to be printable or non-printable.

Handling of lone surrogate characters; whether they are encoded in UCN notation (like a non-printable character) or in hexadecimal notation (like an invalid code unit).

Handling of unassigned code points.

Handling of Private Use Area (PUA) code points.

How to determine the boundaries of ill-formed code unit sequences.

Whether a replacement character should be emitted for an ill-formed code unit sequence (as opposed to emitting hexadecimal notation for each contributing code unit).

Stability guarantees.

Support for non-Unicode platforms.

Handling of std::filesystem::path.
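
To make the escaping concerns listed above concrete, here is a minimal sketch of one possible policy. It is not the wording proposed in P2286, and every name in it (decode_utf8, is_nonprintable, escape_debug) is hypothetical. It assumes the input is UTF-8 stored in char, escapes each code unit of an ill-formed sequence individually as \x{...}, escapes non-printable characters as \u{...}, and treats a UTF-8 encoded lone surrogate as ill-formed. The printability test is a placeholder covering only C0/C1 controls and DEL; a real answer would need Unicode property data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper: decode one scalar value starting at s[i].
// Returns the number of code units consumed, or 0 if s[i] does not
// begin a well-formed UTF-8 sequence.
std::size_t decode_utf8(std::string_view s, std::size_t i, char32_t& cp) {
    assert(i < s.size());
    auto byte = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
    const unsigned char b0 = byte(i);
    if (b0 < 0x80) { cp = b0; return 1; }

    std::size_t len = 0;
    char32_t min = 0;  // smallest value allowed for this length (rejects overlongs)
    if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
    else return 0;  // stray continuation byte or invalid lead byte

    if (i + len > s.size()) return 0;  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = byte(i + k);
        if ((b & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (b & 0x3F);
    }
    // Assumption: surrogates and out-of-range values count as ill-formed.
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    return len;
}

// Placeholder printability test (assumption): only C0/C1 controls and DEL.
bool is_nonprintable(char32_t cp) {
    return cp < 0x20 || (cp >= 0x7F && cp <= 0x9F);
}

// Hypothetical escaping routine illustrating the two notations.
std::string escape_debug(std::string_view s) {
    std::string out;
    char buf[16];
    for (std::size_t i = 0; i < s.size();) {
        char32_t cp = 0;
        const std::size_t n = decode_utf8(s, i, cp);
        if (n == 0) {
            // Ill-formed: preserve the code unit value in hexadecimal
            // notation, then resynchronize at the next code unit.
            std::snprintf(buf, sizeof buf, "\\x{%x}",
                          static_cast<unsigned>(static_cast<unsigned char>(s[i])));
            out += buf;
            i += 1;
        } else if (is_nonprintable(cp)) {
            // Well-formed but non-printable: UCN-style notation.
            std::snprintf(buf, sizeof buf, "\\u{%x}", static_cast<unsigned>(cp));
            out += buf;
            i += n;
        } else {
            out.append(s.data() + i, n);  // printable: copy through unchanged
            i += n;
        }
    }
    return out;
}
```

Under these assumptions, escape_debug("a\x01" "b") yields a\u{1}b and escape_debug("\xC3\x28") yields \x{c3}(. Escaping one code unit at a time is only one way to draw the boundaries of an ill-formed sequence; emitting a replacement character instead would lose the original code unit values.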

P2491R0 proposes changes to P1885, primarily with regard to the handling of wide encodings. The recently communicated D1885R9 draft revision removes support for wide encodings, thus making much (but not all) of what P2491R0 proposes moot.

P2498R0 also proposes changes to P1885 to make way for the possibility of supporting different encoding registrars in the future. Note that ISO does specify its own registry of encodings that have been registered for use with ISO/IEC 2022. The registry is called ISO-IR (officially, "INTERNATIONAL REGISTER OF CODED CHARACTER SETS TO BE USED WITH ESCAPE SEQUENCES") and the registration procedures are specified in ISO/IEC 2375. Unfortunately, I don't think any of these publications is freely available, though copies can be found online. ISO/IEC 2022 is also published as ECMA-35.

For people following along, we should note that the encoding registry described in ISO 2022 has not been updated since 1994 and is not a general encoding registry; it describes a set of encodings, or parts of encodings, for use with ISO 2022.

Most of these encodings are present in the IANA registry, or are no longer in use or fit for general purposes. Where present in IANA, they have an "iso-ir-xxx" alias that establishes a mapping to the ISO registry.
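
As an illustration of that mapping, here is a small sketch that recovers the ISO-IR registration number from an IANA alias of the form "iso-ir-xxx". The function name iso_ir_number is hypothetical and is not part of P1885 or any proposed text_encoding interface; the case-insensitive prefix match is an assumption made for the example.

```cpp
#include <cctype>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: returns the ISO-IR registration number encoded in
// an alias such as "iso-ir-100", or std::nullopt for any other name.
std::optional<int> iso_ir_number(std::string_view alias) {
    constexpr std::string_view prefix = "iso-ir-";
    if (alias.size() <= prefix.size()) return std::nullopt;

    // Compare the prefix case-insensitively (assumption for this sketch).
    for (std::size_t i = 0; i < prefix.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(alias[i])) != prefix[i])
            return std::nullopt;
    }

    const std::string_view digits = alias.substr(prefix.size());
    // Reject a sign character, which std::from_chars would otherwise accept.
    if (!std::isdigit(static_cast<unsigned char>(digits.front())))
        return std::nullopt;

    int value = 0;
    const char* const end = digits.data() + digits.size();
    const auto [ptr, ec] = std::from_chars(digits.data(), end, value);
    if (ec != std::errc{} || ptr != end) return std::nullopt;  // trailing junk or overflow
    return value;
}
```

For instance, IANA lists "iso-ir-100" as an alias of ISO_8859-1:1987, so iso_ir_number("iso-ir-100") returns 100, while a name such as "UTF-8" returns std::nullopt.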

Please note that a registration in that database is not meant to register an encoding, but an escape sequence to switch to or from that encoding.

Because of these escape sequences, a registered encoding does not strictly conform to the encoding the escape sequence pertains to.

For example, the registry describes no fewer than four "UTF-8" entries, all pertaining to the escape sequence and encoding use. [1]

Absent from this database are all the Windows and IBM encodings.

Further observations regarding P2498 can be found in D1885R9.

[1] https://www.itscj-ipsj.jp/ir/190.pdf https://www.itscj-ipsj.jp/ir/191.pdf https://www.itscj-ipsj.jp/ir/192.pdf https://www.itscj-ipsj.jp/ir/196.pdf

Please review the updates made to D1885R9 prior to this telecon for applicability to our review of the latter two papers.

Tom.

--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16