sg16: Re: [SG16] Agenda for the 2021-12-01 SG16 telecon

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 1 Dec 2021 14:07:35 -0500

On 12/1/21 12:46 PM, Victor Zverovich wrote:
> Some details about the escaping mechanism that has been implemented
> for P2286R3: Formatting Ranges in the fmt library. It basically
> follows Rust's Unicode escaping
> (https://doc.rust-lang.org/src/core/char/methods.rs.html#399-420)
> with the following differences:
>
> * Grapheme extended code points are not escaped although I don't see
> any problems with implementing this as well.
> * The escape format uses C++ Unicode escape sequences rather than Rust's.
> * Invalid code unit sequences are escaped for round trip (Rust doesn't
> need to do this because it has validation).
>
> So the escaping logic for code points that need escaping is as follows:
>
> * \t, \r, \n, \, ' and " are escaped using their usual escape
> sequences, e.g. a tab becomes the literal text \t (not a tab character).
> * Code points in the range [0, 0x100) are escaped as \xhh, e.g. \x7f.
> * Code points in the range [0x100, 0x10000) are escaped as \uhhhh,
> e.g. \u0378.
> * Code points in the range [0x10000, 0x110000) are escaped as
> \Uhhhhhhhh, e.g. \U0002a6de.
> * For invalid code unit sequences, each individual code unit is
> escaped as \xhh (this doesn't happen in Rust or Python because those
> have validation, but using hex escapes is a pretty natural extension).
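
The escaping rules quoted above can be sketched roughly as follows. This is
only an illustration, not the fmt implementation: the function names
escape_code_point and escape_invalid_code_unit are hypothetical, and the
sketch assumes the caller has already decided that the code point needs
escaping (that decision is not covered here).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of fmt or P2286R3): formats one code point
// that the caller has already determined needs escaping.
std::string escape_code_point(char32_t cp) {
    // Characters with a conventional short escape sequence.
    switch (cp) {
    case U'\t': return "\\t";
    case U'\r': return "\\r";
    case U'\n': return "\\n";
    case U'\\': return "\\\\";
    case U'\'': return "\\'";
    case U'"':  return "\\\"";
    default: break;
    }
    char buf[16];
    const unsigned long v = static_cast<unsigned long>(cp);
    if (v < 0x100)
        std::snprintf(buf, sizeof buf, "\\x%02lx", v);  // two hex digits
    else if (v < 0x10000)
        std::snprintf(buf, sizeof buf, "\\u%04lx", v);  // four hex digits
    else
        std::snprintf(buf, sizeof buf, "\\U%08lx", v);  // eight hex digits
    return buf;
}

// Hypothetical helper: a code unit from an invalid sequence is always
// written as a two-digit hex escape so the original bytes can be recovered.
std::string escape_invalid_code_unit(unsigned char unit) {
    char buf[8];
    std::snprintf(buf, sizeof buf, "\\x%02x", static_cast<unsigned>(unit));
    return buf;
}
```

With these definitions, escape_code_point(0x378) yields the text \u0378 and
escape_code_point(0x2a6de) yields \U0002a6de, matching the examples in the
quoted list.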

What is the motivation for using hex escapes for code points in the range
[0, 0x100) instead of UCN notation? UCN notation would retain the semantic
intent through transcoding. I like the use of hex escapes for preserving
the code units in invalid code unit sequences.
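
As a small illustration of the distinction (a sketch only; the helper
code_units is hypothetical): in a C++ string literal, a hex escape names a
single code unit, while a UCN names a code point that is then encoded in
the literal's encoding, so the two can produce different code unit
sequences.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// A hex escape names exactly one code unit with value 0xE9.
constexpr std::size_t hex_units = code_units(u8"\xe9");

// A UCN names the code point U+00E9, which UTF-8 encodes as two code units.
constexpr std::size_t ucn_units = code_units(u8"\u00e9");

// The same UCN in a UTF-16 literal is a single code unit.
constexpr std::size_t ucn_units_utf16 = code_units(u"\u00e9");

static_assert(hex_units == 1, "hex escape is one code unit");
static_assert(ucn_units == 2, "U+00E9 takes two UTF-8 code units");
static_assert(ucn_units_utf16 == 1, "U+00E9 takes one UTF-16 code unit");
```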

Tom.

>
> This is obviously for the Unicode case. The non-Unicode case should be
> specified separately using our usual implementation-defined wording.
>
> Cheers,
> Victor
>
>
> On Wed, Dec 1, 2021 at 7:26 AM Victor Zverovich
> <victor.zverovich_at_[hidden]> wrote:
>
> > Is there value in specifying it?
>
> Yes, I think there is value in handling this case consistently
> across implementations. Option 1 is problematic for width = 2, and
> not just with alignment to the end, so option 3 looks like the most
> sensible one (option 2 is clearly wrong and is there only for
> completeness).
>
> Cheers,
> Victor
>
>
>
> On Tue, Nov 30, 2021 at 10:13 AM Tom Honermann via SG16
> <sg16_at_[hidden]> wrote:
>
> This is your friendly reminder that this telecon will take
> place tomorrow.
>
> If we manage to get through both LWG3639 and P2286R3, we'll
> resume discussion of P2361R4 (Unevaluated strings) and
> potentially poll forwarding it.
>
> Tom.
>
> On 11/27/21 7:30 PM, Tom Honermann via SG16 wrote:
>>
>> SG16 will hold a telecon on Wednesday, December 1st at 19:30
>> UTC (timezone conversion
>> <https://www.timeanddate.com/worldclock/converter.html?iso=20211201T193000&p1=1440&p2=tz_pst&p3=tz_mst&p4=tz_cst&p5=tz_est&p6=tz_cet>).
>>
>> The agenda is:
>>
>> * LWG3639: Handling of fill character width is
>> underspecified in std::format <https://wg21.link/lwg3639>
>> * P2286R3: Formatting Ranges <https://wg21.link/p2286r3>
>>
>> For LWG3639 <https://wg21.link/lwg3639>, the issue discussion
>> enumerates three possible solutions, though others are
>> possible. There is also a related wording omission; table
>> [tab.format.align] <http://eel.is/c++draft/tab:format.align>
>> doesn't actually specify how alignment is achieved for the
>> '<' and '>' options (the wording doesn't state to insert fill
>> characters as it does for the '^' option). Additionally,
>> further tweaking (and possibly new LWG issues) may be needed
>> to address these concerns:
>>
>> 1. If the width of the value exceeds the field width, then
>> alignment to the end of the available space is not
>> possible. [format.string.std]p8
>> <http://eel.is/c++draft/format.string.std#8> should be
>> updated to address this possibility, presumably by noting
>> that the content may overflow the available space
>> resulting in misalignment.
>> std::format("{:X>1}", 9999);
>> 2. If the estimated width of the fill character is greater
>> than 1, then alignment to the end of the available space
>> might not be possible. The choice here is whether to
>> under-fill or over-fill the available space. This
>> possibility is avoided if fill characters are restricted
>> to characters with an estimated width of exactly 1.
>> std::format("{:🤡>4}", 123);
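
The two concerns above can be made concrete with a little arithmetic. The
following is a sketch under stated assumptions, not wording from the issue
or behavior of any implementation: FillPolicy, fill_count, and total_width
are hypothetical names, widths are estimated display widths, and fill_width
is assumed to be at least 1.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical policy choice for fill characters wider than one column.
enum class FillPolicy { UnderFill, OverFill };

// Number of fill characters to insert for a given field width, content
// width, and fill character width (assumed >= 1).
constexpr std::size_t fill_count(std::size_t field_width,
                                 std::size_t content_width,
                                 std::size_t fill_width,
                                 FillPolicy policy) {
    return content_width >= field_width
               ? 0  // content overflows the field: nothing to fill
               : policy == FillPolicy::UnderFill
                     ? (field_width - content_width) / fill_width
                     : (field_width - content_width + fill_width - 1) / fill_width;
}

// Resulting display width of the formatted field.
constexpr std::size_t total_width(std::size_t field_width,
                                  std::size_t content_width,
                                  std::size_t fill_width,
                                  FillPolicy policy) {
    return content_width +
           fill_count(field_width, content_width, fill_width, policy) * fill_width;
}

// Case 1: "9999" (width 4) in a field of width 1 overflows the field.
static_assert(total_width(1, 4, 1, FillPolicy::UnderFill) == 4, "overflow");

// Case 2: "123" (width 3) in a field of width 4 with a width-2 fill
// character either stays at width 3 or grows to width 5; width 4 is
// not reachable.
static_assert(total_width(4, 3, 2, FillPolicy::UnderFill) == 3, "under-fill");
static_assert(total_width(4, 3, 2, FillPolicy::OverFill) == 5, "over-fill");
```

Restricting fill characters to an estimated width of exactly 1 makes the
two policies coincide, since the remaining space is then always a whole
number of fill characters.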
>>
>> For P2286R3 <https://wg21.link/p2286r3>, LEWG requested
>> <https://lists.isocpp.org/sg16/2021/11/2845.php> that SG16
>> consider the ramifications for support of user defined
>> delimiters. We should also discuss the "?" specifier proposed
>> to explicitly opt in to quoted and escaped formats for
>> std::string, std::string_view, and arrays of char/wchar_t.
>>
>> Tom.
>>
>>
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>

Received on 2021-12-01 13:07:38