Date: Wed, 1 Dec 2021 19:30:07 +0100
On Wed, Dec 1, 2021 at 6:46 PM Victor Zverovich via SG16 <
sg16_at_[hidden]> wrote:
> Some details about the escaping mechanism that has been implemented
> for P2286R3: Formatting Ranges in the fmt library. It basically follows
> Rust's Unicode escaping (
> https://doc.rust-lang.org/src/core/char/methods.rs.html#399-420) with the
> following differences:
>
> * Grapheme extended code points are not escaped although I don't see any
> problems with implementing this as well.
> * The escape format uses C++ Unicode escape sequences rather than Rust's.
>
Can you explain why? I think Rust's syntax is more readable, and it matches P2290.
> * Invalid code unit sequences are escaped for round trip (Rust doesn't
> need to do this because it has validation).
>
> So the escaping logic for code points that need escaping is as follows:
>
> \t, \r, \n, \, ' and " are escaped using their usual escape sequences such
> as literal text \t (not tab)
> Code points in the range [0, 0x100) are escaped as \xhh, e.g. \x7f
> Code points in the range [0x100, 0x10000) are escaped as \uhhhh,
> e.g. \u0378
> Code points in the range [0x10000, 0x110000) are escaped as \Uhhhhhhhh,
> e.g. \U0002a6de
> For invalid code unit sequences each individual code unit is escaped as
> \xhh (this doesn't happen in Rust or Python because those have validation
> but using hex escapes is a pretty natural extension)
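To make sure I read the rules above correctly, here is a rough sketch of them in C++. It is not the fmt implementation: escape_code_point is a hypothetical helper name, it assumes the caller has already decided that the code point needs escaping, and it treats the ranges as half-open.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not fmt's API): formats one code point that the
// caller has already determined needs escaping.
std::string escape_code_point(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);  // precondition: a valid Unicode code point value
    switch (cp) {
    case '\t': return "\\t";
    case '\r': return "\\r";
    case '\n': return "\\n";
    case '\\': return "\\\\";
    case '\'': return "\\'";
    case '"':  return "\\\"";
    }
    char buf[16];
    if (cp < 0x100)         // [0, 0x100)          -> \xhh
        std::snprintf(buf, sizeof buf, "\\x%02x", static_cast<unsigned>(cp));
    else if (cp < 0x10000)  // [0x100, 0x10000)    -> \uhhhh
        std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(cp));
    else                    // [0x10000, 0x110000) -> \Uhhhhhhhh
        std::snprintf(buf, sizeof buf, "\\U%08x", static_cast<unsigned>(cp));
    return buf;
}
```

Under this reading, each code unit of an invalid sequence would go through the same \xhh branch, since a single code unit of a char string is always below 0x100.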
>
> This is obviously for the Unicode case. The non-Unicode case should be
> specified separately using our usual implementation-defined wording.
>
> Cheers,
> Victor
>
>
> On Wed, Dec 1, 2021 at 7:26 AM Victor Zverovich <
> victor.zverovich_at_[hidden]> wrote:
>
>> > Is there value in specifying it?
>>
>> Yes, I think there is value in handling this case consistently across
>> implementations. Option 1 is problematic for width = 2 and not just with
>> alignment to the end so it looks like 3 is the most sensible one (option 2
>> is clearly wrong and just there for completeness).
>>
>> Cheers,
>> Victor
>>
>>
>>
>> On Tue, Nov 30, 2021 at 10:13 AM Tom Honermann via SG16 <
>> sg16_at_[hidden]> wrote:
>>
>>> This is your friendly reminder that this telecon will take place
>>> tomorrow.
>>>
>>> If we manage to get through both LWG3639 and P2286R3, we'll resume
>>> discussion of P2361R4 (Unevaluated strings) and potentially poll forwarding
>>> it.
>>>
>>> Tom.
>>>
>>> On 11/27/21 7:30 PM, Tom Honermann via SG16 wrote:
>>>
>>> SG16 will hold a telecon on Wednesday, December 1st at 19:30 UTC (timezone
>>> conversion
>>> <https://www.timeanddate.com/worldclock/converter.html?iso=20211201T193000&p1=1440&p2=tz_pst&p3=tz_mst&p4=tz_cst&p5=tz_est&p6=tz_cet>
>>> ).
>>>
>>> The agenda is:
>>>
>>> - LWG3639: Handling of fill character width is underspecified in
>>> std::format <https://wg21.link/lwg3639>
>>> - P2286R3: Formatting Ranges <https://wg21.link/p2286r3>
>>>
>>> For LWG3639 <https://wg21.link/lwg3639>, the issue discussion
>>> enumerates three possible solutions, though others are possible. There is
>>> also a related wording omission; table [tab.format.align]
>>> <http://eel.is/c++draft/tab:format.align> doesn't actually specify how
>>> alignment is achieved for the '<' and '>' options (the wording doesn't
>>> state to insert fill characters as it does for the '^' option).
>>> Additionally, further tweaking (and possibly new LWG issues) may be needed
>>> to address these concerns:
>>>
>>> 1. If the width of the value exceeds the field width, then alignment
>>> to the end of the available space is not possible.
>>> [format.string.std]p8 <http://eel.is/c++draft/format.string.std#8>
>>> should be updated to address this possibility, presumably by noting that
>>> the content may overflow the available space resulting in misalignment.
>>> std::format("{:X>1}", 9999);
>>> 2. If the estimated width of the fill character is greater than 1,
>>> then alignment to the end of the available space might not be possible. The
>>> choice here is whether to under-fill or over-fill the available space. This
>>> possibility is avoided if fill characters are restricted to characters with
>>> an estimated width of exactly 1.
>>> std::format("{:🤡>4}", 123);
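For the two concerns above, the arithmetic behind the under-fill versus over-fill choice can be sketched as follows. This is only an illustration, not wording or any implementation's behavior: fill_counts and count_fill are hypothetical names, and the widths are assumed to be estimated display widths already computed by the caller.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type: number of fill characters to emit under
// each policy.
struct fill_counts {
    std::size_t under;  // never exceeds the field width
    std::size_t over;   // always reaches the field width, may exceed it
};

// Hypothetical helper: all arguments are estimated display widths.
fill_counts count_fill(std::size_t field_width, std::size_t content_width,
                       std::size_t fill_width) {
    assert(fill_width > 0);
    if (content_width >= field_width)
        return {0, 0};  // content overflows the field; no fill is emitted
    std::size_t padding = field_width - content_width;
    return {padding / fill_width,                      // round down
            (padding + fill_width - 1) / fill_width};  // round up
}
```

With a fill character of estimated width 2, a field width of 4 and the three-column value 123, count_fill(4, 3, 2) gives 0 fill characters when under-filling and 1 when over-filling, so neither choice lands exactly on the field width. For the first example, count_fill(1, 4, 1) gives 0 for both because the content already overflows.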
>>>
>>> For P2286R3 <https://wg21.link/p2286r3>, LEWG requested
>>> <https://lists.isocpp.org/sg16/2021/11/2845.php> that SG16 consider the
>>> ramifications for support of user defined delimiters. We should also
>>> discuss the "?" specifier proposed to explicitly opt in to quoted and
>>> escaped formats for std::string, std::string_view, and arrays of char/
>>> wchar_t.
>>>
>>> Tom.
>>>
>>>
>>> --
>>> SG16 mailing list
>>> SG16_at_[hidden]
>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>>
>> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
Received on 2021-12-01 12:30:24