Re: [SG16] SG16 meeting summary for December 15th, 2021

From: Tom Honermann <tom_at_[hidden]>
Date: Fri, 17 Dec 2021 18:18:02 -0500
Barry (and other contributors), here is a summary of the SG16 requests
for the next revision of P2286 (Formatting Ranges) as reflected in the
linked meeting summary.

 1. Add a deleted formatter specialization for std::filesystem::path so
    that objects of that type are not formatted as generic ranges.
 2. Update the escape behavior to address the following (add examples
    where helpful).
     1. (Lone) Surrogate code points (keep in mind that \uXXXX where
        XXXX designates a surrogate code point is not valid for a
        /universal-character-name/).
     2. Unassigned code points.
     3. Private Use Area (PUA) code points.
 3. Perhaps specify the escape behavior in Unicode terms (as if by
    conversion to Unicode for non-Unicode encodings as Corentin suggested).
 4. Add discussion regarding what constitutes a printable character and
    the choice to base such determination on the Z (Separator) and C
    (Other) values of the Unicode Character Database (UCD)
    General_Category property. Please provide a principled defense for
    why characters are designated as non-printable; particularly for
    spacing characters such as tab and newline.
 5. Address stability of printable character determination (do the
    referenced UCD properties provide stability guarantees?)
 6. Compare and contrast how similar features in other languages
    determine what constitutes a printable character.
 7. Consider which separator characters should be considered printable
    (U+0020 (SPACE) presumably should be, but what about U+00A0
    (NO-BREAK SPACE), U+2003 (EM SPACE), etc.).
 8. Specify how invalid code unit sequences are to be handled. This
    includes specifying, at least for self-synchronizing encodings like
    UTF-8, UTF-16, and UTF-32, how such sequences are delimited.
    References to the Unicode standard (as indicated in the editor notes
    in the linked meeting summary) and/or WhatWG Encoding Standard are
    advised. This also includes specifying how wide strings are handled;
    presumably each wchar_t value in an ill-formed code unit sequence
    would be formatted as a single hex escape.
 9. Address non-Unicode encodings.
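
To make item 8 concrete, here is a minimal sketch of one possible way to handle ill-formed UTF-8. It is not proposed wording for P2286. It assumes a `\x{hh}` spelling for escaped code units and assumes that every code unit which cannot start a well-formed sequence is escaped individually. The byte ranges follow the well-formed UTF-8 byte sequence table in the Unicode Standard. The names `well_formed_length` and `escape_ill_formed` are hypothetical, and well-formed sequences are copied through unchanged (escaping of non-printable characters is out of scope here).

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>

// Returns the length (1-4) of the well-formed UTF-8 sequence starting at
// s[i], or 0 if no well-formed sequence starts there. Requires i < s.size().
std::size_t well_formed_length(std::string_view s, std::size_t i) {
    auto byte = [&](std::size_t k) -> unsigned {
        return static_cast<unsigned char>(s[k]);
    };
    // True if s[k] exists and lies in the inclusive range [lo, hi].
    auto in = [&](std::size_t k, unsigned lo, unsigned hi) {
        return k < s.size() && byte(k) >= lo && byte(k) <= hi;
    };
    const unsigned b0 = byte(i);
    if (b0 <= 0x7F) return 1;
    if (b0 >= 0xC2 && b0 <= 0xDF)
        return in(i + 1, 0x80, 0xBF) ? 2 : 0;
    if (b0 >= 0xE0 && b0 <= 0xEF) {
        const unsigned lo = (b0 == 0xE0) ? 0xA0 : 0x80;  // rejects overlong forms
        const unsigned hi = (b0 == 0xED) ? 0x9F : 0xBF;  // rejects surrogates
        return (in(i + 1, lo, hi) && in(i + 2, 0x80, 0xBF)) ? 3 : 0;
    }
    if (b0 >= 0xF0 && b0 <= 0xF4) {
        const unsigned lo = (b0 == 0xF0) ? 0x90 : 0x80;  // rejects overlong forms
        const unsigned hi = (b0 == 0xF4) ? 0x8F : 0xBF;  // rejects values above U+10FFFF
        return (in(i + 1, lo, hi) && in(i + 2, 0x80, 0xBF) &&
                in(i + 3, 0x80, 0xBF)) ? 4 : 0;
    }
    return 0;  // 0x80..0xC1 and 0xF5..0xFF never start a well-formed sequence
}

// Copies well-formed sequences unchanged and writes every other code unit
// as a hex escape. The "\x{hh}" spelling is an assumption of this sketch.
std::string escape_ill_formed(std::string_view s) {
    std::string out;
    for (std::size_t i = 0; i < s.size();) {
        if (const std::size_t n = well_formed_length(s, i)) {
            out.append(s.substr(i, n));
            i += n;
        } else {
            char buf[8];
            std::snprintf(buf, sizeof buf, "\\x{%02x}",
                          static_cast<unsigned>(static_cast<unsigned char>(s[i])));
            out += buf;
            ++i;  // resynchronize at the very next code unit
        }
    }
    return out;
}
```

With this sketch, the bytes ED A0 80 (a UTF-8-style encoding of the surrogate U+D800) come out as `\x{ed}\x{a0}\x{80}`. Because each code unit is escaped separately and a continuation byte can never start a well-formed sequence, resynchronizing one code unit at a time yields the same output as delimiting by maximal subparts; the choice of delimiting only becomes visible if an entire ill-formed sequence is replaced by a single escape or replacement character.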

For anyone following along, the SG16 approval of P2286 is for the
general design direction as that is the best we can do until wording is
provided. The above requests probe design decisions that will need to be
resolved ASAP if the proposal is to be ready for adoption in C++23. Some
of these can be addressed post feature freeze, but that carries risks
that are best avoided for numerous reasons.

Tom.

On 12/17/21 3:46 PM, Tom Honermann via SG16 wrote:
> The summary for the SG16 meeting held December 15th, 2021 is now
> available. For those that attended, please review and suggest
> corrections:
>
> * https://github.com/sg16-unicode/sg16-meetings#december-15th-2021
> <https://github.com/sg16-unicode/sg16-meetings#december-15th-2021>
>
> Three decisions were made at this meeting:
>
> 1. We agreed to forward P2361R4 (Unevaluated strings)
> <https://wg21.link/p2361r4> to EWG.
> 2. We agreed to forward P1854R2 (Conversion to literal encoding
> should not lead to loss of meaning) <https://wg21.link/p1854r2> to
> EWG.
> 3. We agreed to forward (a draft of) P2286R4 (Formatting Ranges)
> <https://wg21.link/p2286r4> to LEWG.
>
> Per our operating procedures
> <https://github.com/sg16-unicode/sg16/blob/master/OperatingProcedures.md>,
> these decisions will be deemed as having SG16 consensus subject to
> dissenting opinions raised during the next week.
>
> Tom.
>


Received on 2021-12-17 17:18:05