Some details about the escaping mechanism that has been implemented for P2286R3 (Formatting Ranges) in the fmt library. It basically follows Rust's Unicode escaping (https://doc.rust-lang.org/src/core/char/methods.rs.html#399-420), with the following differences:

* Grapheme extended code points are not escaped, although I don't see any problem with implementing this as well.

* The escape format uses C++ Unicode escape sequences rather than Rust's.

* Invalid code unit sequences are escaped for round trip (Rust doesn't need to do this because it has validation).

So the escaping logic for code points that need escaping is as follows:

* \t, \r, \n, \, ' and " are escaped using their usual escape sequences, e.g. a tab becomes the literal two-character text \t (not a tab character).

* Other code points in the range [0, 0x100) are escaped as \xhh, e.g. \x7f.

* Code points in the range [0x100, 0x10000) are escaped as \uhhhh, e.g. \u0378.

* Code points in the range [0x10000, 0x110000) are escaped as \Uhhhhhhhh, e.g. \U0002a6de.

* For invalid code unit sequences, each individual code unit is escaped as \xhh. (This doesn't happen in Rust or Python because those have validation, but using hex escapes is a pretty natural extension.)
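
The rules above can be sketched in a few lines of C++. This is only an illustration, not the actual fmt code: the names escape_code_point and escape_invalid_code_unit are hypothetical, and the sketch assumes the caller has already decoded the input and decided that the code point needs escaping.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not fmt's API): returns the escaped form of a code
// point that has already been classified as needing escaping.
std::string escape_code_point(std::uint32_t cp) {
    assert(cp < 0x110000);  // assumption: caller passes a decoded code point
    switch (cp) {
    case '\t': return "\\t";
    case '\r': return "\\r";
    case '\n': return "\\n";
    case '\\': return "\\\\";
    case '\'': return "\\'";
    case '"':  return "\\\"";
    }
    char buf[11];  // "\U" + 8 hex digits + terminating NUL
    if (cp < 0x100)
        std::snprintf(buf, sizeof buf, "\\x%02x", static_cast<unsigned>(cp));
    else if (cp < 0x10000)
        std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(cp));
    else
        std::snprintf(buf, sizeof buf, "\\U%08x", static_cast<unsigned>(cp));
    return std::string(buf);
}

// Hypothetical helper: escapes one code unit of an invalid sequence as \xhh
// so that the original bytes can be recovered (round trip).
std::string escape_invalid_code_unit(unsigned char unit) {
    char buf[5];  // "\x" + 2 hex digits + terminating NUL
    std::snprintf(buf, sizeof buf, "\\x%02x", static_cast<unsigned>(unit));
    return std::string(buf);
}
```

With these helpers, escape_code_point(0x378) yields the six characters \u0378, and a stray UTF-8 lead byte 0xc3 passed to escape_invalid_code_unit yields \xc3.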

This is obviously for the Unicode case. The non-Unicode case should be specified separately using our usual implementation-defined wording.

Cheers,

Victor

On Wed, Dec 1, 2021 at 7:26 AM Victor Zverovich <victor.zverovich@gmail.com> wrote:

> Is there value in specifying it?

Yes, I think there is value in handling this case consistently across implementations. Option 1 is problematic for width = 2, and not just with alignment to the end, so it looks like option 3 is the most sensible one (option 2 is clearly wrong and is there just for completeness).

Cheers,
Victor

On Tue, Nov 30, 2021 at 10:13 AM Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:

This is your friendly reminder that this telecon will take place tomorrow.

If we manage to get through both LWG3639 and P2286R3, we'll resume discussion of P2361R4 (Unevaluated strings) and potentially poll forwarding it.

Tom.

On 11/27/21 7:30 PM, Tom Honermann via SG16 wrote:

SG16 will hold a telecon on Wednesday, December 1st at 19:30 UTC (timezone conversion).

The agenda is:

LWG3639: Handling of fill character width is underspecified in std::format

P2286R3: Formatting Ranges

For LWG3639, the issue discussion enumerates three possible solutions, though others are possible. There is also a related wording omission; table [tab.format.align] doesn't actually specify how alignment is achieved for the '<' and '>' options (the wording doesn't state to insert fill characters as it does for the '^' option). Additionally, further tweaking (and possibly new LWG issues) may be needed to address these concerns:

If the width of the value exceeds the field width, then alignment to the end of the available space is not possible. [format.string.std]p8 should be updated to address this possibility, presumably by noting that the content may overflow the available space resulting in misalignment.
std::format("{:X>1}", 9999);

If the estimated width of the fill character is greater than 1, then alignment to the end of the available space might not be possible. The choice here is whether to under-fill or over-fill the available space. This possibility is avoided if fill characters are restricted to characters with an estimated width of exactly 1.
std::format("{:🤡>4}", 123);
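
The under-fill versus over-fill choice comes down to a small piece of arithmetic. Below is a rough sketch, assuming all widths are measured in estimated display columns. The names fill_counts and fill_choices are hypothetical and are not part of std::format or of any proposed wording.

```cpp
#include <cassert>
#include <cstddef>

// Number of fill characters to insert under each policy.
struct fill_counts {
    std::size_t under;  // never exceeds the field width
    std::size_t over;   // always covers the field width, may exceed it
};

// Hypothetical helper: computes both candidate fill counts for a field.
fill_counts fill_choices(std::size_t field_width, std::size_t content_width,
                         std::size_t fill_width) {
    assert(fill_width > 0);  // assumption: fill has a nonzero estimated width
    if (content_width >= field_width)
        return {0, 0};  // content overflows or exactly fits: no fill at all
    std::size_t pad = field_width - content_width;
    return {pad / fill_width, (pad + fill_width - 1) / fill_width};
}
```

For the second example above, assuming the fill character has an estimated width of 2, the content 123 occupies 3 columns of a 4-column field: under-filling inserts no fill character (total width 3), while over-filling inserts one (total width 5). Neither reaches exactly 4.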

For P2286R3, LEWG requested that SG16 consider the ramifications of supporting user-defined delimiters. We should also discuss the "?" specifier proposed to explicitly opt in to quoted and escaped formats for std::string, std::string_view, and arrays of char/wchar_t.

Tom.
