sg16: Re: [SG16] Agenda for the 2021-12-01 SG16 telecon

From: Victor Zverovich <victor.zverovich_at_[hidden]>
Date: Wed, 1 Dec 2021 11:21:21 -0800

> rust doesn't escape printable ascii, this is the intent here too, right?

Yes, except for \t, \r, \n, \, ' and ", which have to be escaped. This
applies to all printable code points, not just ASCII.
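
For illustration, the escaping rules quoted below can be sketched roughly as
follows. This is not the fmt implementation: escape_code_point and
escape_invalid_code_unit are hypothetical names, and the decision about which
code points are printable (which needs Unicode property data) is assumed to
have been made by the caller.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper, not part of fmt or the standard library.
// Returns the escape sequence for a code point that the caller has already
// classified as needing escaping (non-printable, or one of the six
// characters that are always escaped).
std::string escape_code_point(char32_t cp) {
    assert(cp <= 0x10FFFF);  // must be a valid Unicode code point value
    switch (cp) {
        case U'\t': return "\\t";
        case U'\r': return "\\r";
        case U'\n': return "\\n";
        case U'\\': return "\\\\";
        case U'\'': return "\\'";
        case U'"':  return "\\\"";
        default:    break;
    }
    char buf[11];  // longest form: backslash, 'U', eight hex digits, NUL
    const unsigned long v = static_cast<unsigned long>(cp);
    if (cp < 0x100) {
        std::snprintf(buf, sizeof buf, "\\x%02lx", v);   // two hex digits
    } else if (cp < 0x10000) {
        std::snprintf(buf, sizeof buf, "\\u%04lx", v);   // four hex digits
    } else {
        std::snprintf(buf, sizeof buf, "\\U%08lx", v);   // eight hex digits
    }
    return buf;
}

// Hypothetical helper: escapes one code unit of an invalid sequence so that
// the original bytes can be recovered from the output.
std::string escape_invalid_code_unit(unsigned char unit) {
    char buf[5];  // backslash, 'x', two hex digits, NUL
    std::snprintf(buf, sizeof buf, "\\x%02x", static_cast<unsigned>(unit));
    return buf;
}
```

With these helpers, escape_code_point(0x378) gives the six characters \u0378
and escape_code_point(0x2a6de) gives \U0002a6de, matching the examples in the
quoted list.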

On Wed, Dec 1, 2021 at 11:13 AM Corentin Jabot <corentinjabot_at_[hidden]>
wrote:

>
>
> On Wed, Dec 1, 2021 at 8:07 PM Tom Honermann via SG16 <
> sg16_at_[hidden]> wrote:
>
>> On 12/1/21 12:46 PM, Victor Zverovich wrote:
>>
>> Some details about the escaping mechanism that has been implemented
>> for P2286R3 (Formatting Ranges) in the fmt library: it basically follows
>> Rust's Unicode escaping (
>> https://doc.rust-lang.org/src/core/char/methods.rs.html#399-420) with
>> the following differences:
>>
>> * Grapheme extended code points are not escaped although I don't see any
>> problems with implementing this as well.
>> * The escape format uses C++ Unicode escape sequences rather than Rust's.
>> * Invalid code unit sequences are escaped for round trip (Rust doesn't
>> need to do this because it has validation).
>>
>> So the escaping logic for code points that need escaping is as follows:
>>
>> * \t, \r, \n, \, ' and " are escaped using their usual escape sequences,
>>   e.g. the literal text \t (not a tab character).
>> * Code points in the range [0, 0x100) are escaped as \xhh, e.g. \x7f.
>> * Code points in the range [0x100, 0x10000) are escaped as \uhhhh,
>>   e.g. \u0378.
>> * Code points in the range [0x10000, 0x110000) are escaped as \Uhhhhhhhh,
>>   e.g. \U0002a6de.
>> * For invalid code unit sequences, each individual code unit is escaped as
>>   \xhh (this doesn't happen in Rust or Python because those have validation,
>>   but using hex escapes is a pretty natural extension).
>>
>> What is the motivation for using hex escapes for code points 0 through
>> 0x100 instead of UCN notation? UCN notation would retain the semantic
>> intent through transcoding. I like the use of hex escapes for preserving
>> the code units in invalid code unit sequences.
>>
> Also, to clarify... rust doesn't escape printable ascii, this is the
> intent here too, right?
>
>
>> Tom.
>>
>>
>> This is obviously for the Unicode case. The non-Unicode case should be
>> specified separately using our usual implementation-defined wording.
>>
>> Cheers,
>> Victor
>>
>>
>> On Wed, Dec 1, 2021 at 7:26 AM Victor Zverovich <
>> victor.zverovich_at_[hidden]> wrote:
>>
>>> > Is there value in specifying it?
>>>
>>> Yes, I think there is value in handling this case consistently across
>>> implementations. Option 1 is problematic for width = 2 and not just with
>>> alignment to the end so it looks like 3 is the most sensible one (option 2
>>> is clearly wrong and just there for completeness).
>>>
>>> Cheers,
>>> Victor
>>>
>>>
>>>
>>> On Tue, Nov 30, 2021 at 10:13 AM Tom Honermann via SG16 <
>>> sg16_at_[hidden]> wrote:
>>>
>>>> This is your friendly reminder that this telecon will take place
>>>> tomorrow.
>>>>
>>>> If we manage to get through both LWG3639 and P2286R3, we'll resume
>>>> discussion of P2361R4 (Unevaluated strings) and potentially poll forwarding
>>>> it.
>>>>
>>>> Tom.
>>>>
>>>> On 11/27/21 7:30 PM, Tom Honermann via SG16 wrote:
>>>>
>>>> SG16 will hold a telecon on Wednesday, December 1st at 19:30 UTC (timezone
>>>> conversion
>>>> <https://www.timeanddate.com/worldclock/converter.html?iso=20211201T193000&p1=1440&p2=tz_pst&p3=tz_mst&p4=tz_cst&p5=tz_est&p6=tz_cet>
>>>> ).
>>>>
>>>> The agenda is:
>>>>
>>>> - LWG3639: Handling of fill character width is underspecified in
>>>> std::format <https://wg21.link/lwg3639>
>>>> - P2286R3: Formatting Ranges <https://wg21.link/p2286r3>
>>>>
>>>> For LWG3639 <https://wg21.link/lwg3639>, the issue discussion
>>>> enumerates three possible solutions, though others are possible. There is
>>>> also a related wording omission; table [tab.format.align]
>>>> <http://eel.is/c++draft/tab:format.align> doesn't actually specify how
>>>> alignment is achieved for the '<' and '>' options (the wording doesn't
>>>> state to insert fill characters as it does for the '^' option).
>>>> Additionally, further tweaking (and possibly new LWG issues) may be needed
>>>> to address these concerns:
>>>>
>>>> 1. If the width of the value exceeds the field width, then
>>>> alignment to the end of the available space is not possible.
>>>> [format.string.std]p8 <http://eel.is/c++draft/format.string.std#8>
>>>> should be updated to address this possibility, presumably by noting that
>>>> the content may overflow the available space resulting in misalignment.
>>>> std::format("{:X>1}", 9999);
>>>> 2. If the estimated width of the fill character is greater than 1,
>>>> then alignment to the end of the available space might not be possible. The
>>>> choice here is whether to under-fill or over-fill the available space. This
>>>> possibility is avoided if fill characters are restricted to characters with
>>>> an estimated width of exactly 1.
>>>> std::format("{:🤡>4}", 123);
>>>>
>>>> For P2286R3 <https://wg21.link/p2286r3>, LEWG requested
>>>> <https://lists.isocpp.org/sg16/2021/11/2845.php> that SG16 consider
>>>> the ramifications for support of user defined delimiters. We should also
>>>> discuss the "?" specifier proposed to explicitly opt in to quoted and
>>>> escaped formats for std::string, std::string_view, and arrays of char/
>>>> wchar_t.
>>>>
>>>> Tom.
>>>>
>>>>
>>>> --
>>>> SG16 mailing list
>>>> SG16_at_[hidden]
>>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>>>
>>>
>>
>

Received on 2021-12-01 13:21:37