Just to clarify: I wouldn't say that P2216 introduces a new concept. If you look at P0645 and its discussions, locale-independent formatting has always been part of the design, and it covers both format string parsing and formatting. Moreover, the ability to do compile-time checks via constexpr formatter<T>::parse functions was added per LEWG request back in R1 of P0645, although admittedly it could be better specified. I agree that if there are any concerns, they can be addressed via LWG issues.

> I am not aware of implementation experience for this paper in environments where characters significant to the interpretation of the format string are not locale-invariant.

There is implementation experience with statically detecting UTF-8 in the std::format implementation in Microsoft/STL: https://github.com/microsoft/STL/issues/1820. P2216 is also fully implemented in the fmt library, and compile-time checks have been available there in a different form (because of the lack of consteval) for several years.
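
For readers less familiar with the idea, here is a minimal sketch of how a constexpr parse function enables compile-time checks without consteval. It is not code from P0645, P2216, or fmt: `point_spec_parser`, `checked_presentation`, and the 'c'/'p' flags are hypothetical names made up for illustration, and the real formatter<T>::parse takes a parse context rather than a string_view.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Hypothetical stand-in for the parse step of a user-defined formatter.
// Accepts an empty spec or a single flag: 'c' (cartesian) or 'p' (polar).
struct point_spec_parser {
    char presentation = 'c';

    // Returns the offset just past the parsed spec; throws on bad input.
    constexpr std::size_t parse(std::string_view spec) {
        std::size_t pos = 0;
        if (pos < spec.size() && (spec[pos] == 'c' || spec[pos] == 'p')) {
            presentation = spec[pos];
            ++pos;
        }
        if (pos != spec.size())
            throw std::invalid_argument("invalid point format spec");
        return pos;
    }
};

// Runs the parser and reports which presentation it selected.
constexpr char checked_presentation(std::string_view spec) {
    point_spec_parser parser{};
    parser.parse(spec);
    return parser.presentation;
}

// Evaluated by the compiler: a valid spec passes.
static_assert(checked_presentation("p") == 'p', "valid spec");

// Uncommenting the next line makes the program ill-formed, because the
// throw is reached during constant evaluation:
// static_assert(checked_presentation("x") == 'x', "never gets this far");
```

Because parse is constexpr, the same function serves both the compile-time check and the ordinary run-time path, where a bad spec surfaces as an exception instead.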

Cheers,
Victor

On Sun, Jun 6, 2021 at 10:13 PM Tom Honermann <tom@honermann.net> wrote:
On 6/6/21 8:15 PM, Hubert Tong wrote:
Just hoping to clarify my own understanding of this, and perhaps making an observation in case this is helpful to others... I know that some of this has been discussed in specific circles.

It is my understanding that this paper introduces a concept of compile-time conversion of strings associated with the `char` type (and also the same for `wchar_t`) into a "format string". I believe this necessitates attaching semantics to the "character" values (at compile time) to recognize characters such as the left brace (`{`). It is observed that the coded value of the left brace character is locale-variant in certain environments. Thus, the paper is establishing that format strings are not parsed based on locale. I am aware that the intent is to specify that the interpretation is based on the encoding used for literals; however, at this time, the wording does not indicate that intent.
Thank you for raising this issue, Hubert. I agree that there is a wording oversight here. This concern seems like it can be addressed via an LWG issue.
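
To make the point about attaching semantics to character values concrete, here is a small sketch of a compile-time scan for replacement fields. `count_replacement_fields` is a hypothetical helper, not something from P2216. The comparison against '{' uses whatever value the compiler assigns to that character literal, i.e. the literal encoding, and no run-time locale is consulted.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts '{' characters that open a replacement field,
// treating "{{" as an escaped literal brace. The value of '{' is fixed when
// this translation unit is compiled.
constexpr int count_replacement_fields(std::string_view fmt) {
    int count = 0;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] != '{')
            continue;
        if (i + 1 < fmt.size() && fmt[i + 1] == '{') {
            ++i;  // skip the second brace of the "{{" escape
            continue;
        }
        ++count;
    }
    return count;
}

static_assert(count_replacement_fields("{} and {}") == 2, "two fields");
static_assert(count_replacement_fields("{{}}") == 0, "escaped braces only");
```

If the bytes of the format string were produced under a different encoding than the one the compiler used for '{', this scan would look for the wrong code unit, which is the mismatch the wording needs to rule out or define.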

I am not aware of implementation experience for this paper in environments where characters significant to the interpretation of the format string are not locale-invariant. There is, however, reason to believe that an implementation can be realistically deployed to such environments while giving the user some ability to choose the text encoding under which format strings are parsed. As it is, the paper uses format-string and wformat-string as exposition-only types in the signature of the `format` functions. It is possible for an implementation to version these functions (across translation unit boundaries) by embedding the text encoding information into these types. A mechanism such as std::text_encoding::literal().mib() from P1885R5 (which has not yet advanced to plenary) could be used.

That mechanism does not appear to be an option for the vformat_to() or vformat() overloads since their signatures do not include format-string or wformat-string.  It doesn't look to me like that information can be smuggled through the types of basic_format_args or basic_format_context either, though perhaps they could be used to store a value that indicates the literal encoding.

[P1885 is currently scheduled for consideration by LEWG during its 2021-08-03 telecon]
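
The idea of embedding the text encoding into the format-string type could look roughly like the sketch below. Everything in it is an assumption for illustration: `tagged_format_string`, `encoding_of`, and `literal_encoding_mib` are hypothetical names, and the constant stands in for std::text_encoding::literal().mib(), which is hard-coded here to 106 (the IANA MIBenum value for UTF-8).

```cpp
#include <string_view>
#include <type_traits>

// Hypothetical stand-in for std::text_encoding::literal().mib() (P1885).
// Assumed to be UTF-8 (MIBenum 106) for this sketch.
constexpr int literal_encoding_mib = 106;

// Hypothetical format-string type that carries the literal encoding of the
// translation unit that created it as part of its type.
template <int Mib>
struct tagged_format_string {
    std::string_view str;
    constexpr tagged_format_string(std::string_view s) : str(s) {}
};

// What a translation unit would use for its own literals.
using format_string_here = tagged_format_string<literal_encoding_mib>;

// A function taking the tagged type is a distinct specialization per
// encoding, so calls from differently encoded translation units resolve to
// different functions.
template <int Mib>
constexpr int encoding_of(tagged_format_string<Mib>) {
    return Mib;
}

static_assert(encoding_of(format_string_here{"{}"}) == 106, "tag is recoverable");
static_assert(!std::is_same<tagged_format_string<106>,
                            tagged_format_string<0>>::value,
              "different encodings give different types");
```

As noted above, this only helps where the format-string type appears in the signature, so it does not cover vformat() or vformat_to().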


The above approach is perhaps more limiting than strictly necessary for extensions that allow the translation of string literals to be changed within a translation unit. It is noted that P1885R5 exposes the literal encoding as a consteval function, which is compatible with context-sensitive evaluation by the implementation. In case there is an appetite to allow for such context-sensitivity for format strings, it is probably the case that updating the text to allow for exposition-only extra parameters in the signature is purely a specification matter and does not affect implementations where such scenarios do not occur. It is also rather likely that implementations which do employ such extra parameters are conforming anyway (because the extra parameters are only observable when a user applies an extension). Nevertheless, the paper may be just the beginning of a number of changes that are candidates for being considered retroactive to C++20.

It looks to me like construction of a basic_format_context specialization is effectively unspecified due to the lack of constructors and the presence of exposition-only data members. Perhaps more of it can be specified as exposition-only.

Incidentally, I think the specification of basic_format_context may be missing an exposition-only std::locale data member corresponding to any locale passed to a formatting function.


TL;DR: The paper sets a direction (but does not actually spell out that it does) of using the encoding associated with literal translation for parsing format strings. The work around improving the management and handling of said encodings is still ongoing; therefore, where this paper leads us is not as clear as it could be given additional time. Nevertheless, it is probably the case that further incremental improvements can be made on top of this paper without compatibility breakage for implementations that choose to deploy earlier. In certain environments, quality-of-implementation around this paper may be dependent on additional improvements to the specification. Since this paper is being considered to be retroactive to C++20, it is reasonable to expect that improvements of the aforementioned kind would also be considered for retroactive inclusion as they are discovered.

I agree, though the words "probably the case" give me pause. Regardless, at least for me, this does not translate to a desire to delay adopting this paper.

Tom.