Date: Wed, 1 Dec 2021 09:46:29 -0800
Some details about the escaping mechanism that has been implemented
for P2286R3: Formatting Ranges in the fmt library. It basically follows
Rust's Unicode escaping (
https://doc.rust-lang.org/src/core/char/methods.rs.html#399-420) with the
following differences:
* Grapheme extended code points are not escaped, although I don't see any
problems with implementing this as well.
* The escape format uses C++ Unicode escape sequences rather than Rust's.
* Invalid code unit sequences are escaped for round trip (Rust doesn't need
to do this because it has validation).
So the escaping logic for code points that need escaping is as follows:
* \t, \r, \n, \\, ' and " are escaped using their usual escape sequences,
e.g. a tab becomes the literal text \t (not a tab character).
* Code points in the range [0, 0x100) are escaped as \xhh, e.g. \x7f.
* Code points in the range [0x100, 0x10000) are escaped as \uhhhh, e.g.
\u0378.
* Code points in the range [0x10000, 0x110000) are escaped as \Uhhhhhhhh,
e.g. \U0002a6de.
* For invalid code unit sequences, each individual code unit is escaped as
\xhh (this doesn't happen in Rust or Python because those have validation,
but using hex escapes is a pretty natural extension).
This is obviously for the Unicode case. The non-Unicode case should be
specified separately using our usual implementation-defined wording.
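
To make the rules above concrete, here is a minimal sketch in C++. It is
not the code in fmt: the names escape_code_point and
escape_invalid_code_unit are made up for illustration, and it assumes the
caller has already decided that the code point needs escaping.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not fmt's API): returns the escaped form of a code
// point that has already been classified as needing escaping.
std::string escape_code_point(char32_t cp) {
  assert(cp < 0x110000);  // precondition: within the Unicode code space
  switch (cp) {
    case U'\t': return "\\t";
    case U'\r': return "\\r";
    case U'\n': return "\\n";
    case U'\\': return "\\\\";
    case U'\'': return "\\'";
    case U'"':  return "\\\"";
    default: break;
  }
  char buf[11];  // longest form is \Uhhhhhhhh: 10 characters plus NUL
  const unsigned long v = static_cast<unsigned long>(cp);
  if (cp < 0x100)
    std::snprintf(buf, sizeof buf, "\\x%02lx", v);  // [0, 0x100)
  else if (cp < 0x10000)
    std::snprintf(buf, sizeof buf, "\\u%04lx", v);  // [0x100, 0x10000)
  else
    std::snprintf(buf, sizeof buf, "\\U%08lx", v);  // [0x10000, 0x110000)
  return buf;
}

// Hypothetical helper: a code unit that belongs to an invalid sequence is
// escaped on its own as \xhh.
std::string escape_invalid_code_unit(unsigned char unit) {
  char buf[5];  // \xhh: 4 characters plus NUL
  std::snprintf(buf, sizeof buf, "\\x%02lx", static_cast<unsigned long>(unit));
  return buf;
}
```

With this sketch, escape_code_point(0x378) yields the six characters
\u0378 and escape_invalid_code_unit(0xff) yields \xff. Deciding which code
points need escaping in the first place is a separate step that is not
shown here.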
Cheers,
Victor
On Wed, Dec 1, 2021 at 7:26 AM Victor Zverovich <victor.zverovich_at_[hidden]>
wrote:
> > Is there value in specifying it?
>
> Yes, I think there is value in handling this case consistently across
> implementations. Option 1 is problematic for width = 2 and not just with
> alignment to the end so it looks like 3 is the most sensible one (option 2
> is clearly wrong and just there for completeness).
>
> Cheers,
> Victor
>
>
>
> On Tue, Nov 30, 2021 at 10:13 AM Tom Honermann via SG16 <
> sg16_at_[hidden]> wrote:
>
>> This is your friendly reminder that this telecon will take place tomorrow.
>>
>> If we manage to get through both LWG3639 and P2286R3, we'll resume
>> discussion of P2361R4 (Unevaluated strings) and potentially poll forwarding
>> it.
>>
>> Tom.
>>
>> On 11/27/21 7:30 PM, Tom Honermann via SG16 wrote:
>>
>> SG16 will hold a telecon on Wednesday, December 1st at 19:30 UTC (timezone
>> conversion
>> <https://www.timeanddate.com/worldclock/converter.html?iso=20211201T193000&p1=1440&p2=tz_pst&p3=tz_mst&p4=tz_cst&p5=tz_est&p6=tz_cet>
>> ).
>>
>> The agenda is:
>>
>> - LWG3639: Handling of fill character width is underspecified in
>> std::format <https://wg21.link/lwg3639>
>> - P2286R3: Formatting Ranges <https://wg21.link/p2286r3>
>>
>> For LWG3639 <https://wg21.link/lwg3639>, the issue discussion enumerates
>> three possible solutions, though others are possible. There is also a
>> related wording omission; table [tab.format.align]
>> <http://eel.is/c++draft/tab:format.align> doesn't actually specify how
>> alignment is achieved for the '<' and '>' options (the wording doesn't
>> state to insert fill characters as it does for the '^' option).
>> Additionally, further tweaking (and possibly new LWG issues) may be needed
>> to address these concerns:
>>
>> 1. If the width of the value exceeds the field width, then alignment
>> to the end of the available space is not possible.
>> [format.string.std]p8 <http://eel.is/c++draft/format.string.std#8>
>> should be updated to address this possibility, presumably by noting that
>> the content may overflow the available space resulting in misalignment.
>> std::format("{:X>1}", 9999);
>> 2. If the estimated width of the fill character is greater than 1,
>> then alignment to the end of the available space might not be possible. The
>> choice here is whether to under-fill or over-fill the available space. This
>> possibility is avoided if fill characters are restricted to characters with
>> an estimated width of exactly 1.
>> std::format("{:🤡>4}", 123);
>>
>> For P2286R3 <https://wg21.link/p2286r3>, LEWG requested
>> <https://lists.isocpp.org/sg16/2021/11/2845.php> that SG16 consider the
>> ramifications for support of user defined delimiters. We should also
>> discuss the "?" specifier proposed to explicitly opt in to quoted and
>> escaped formats for std::string, std::string_view, and arrays of char/
>> wchar_t.
>>
>> Tom.
>>
>>
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>
>
Received on 2021-12-01 11:46:43