sg16: Re: [SG16] Comment re: Library Motion 16, P2216R3, std::format improvements

From: Victor Zverovich <victor.zverovich_at_[hidden]>
Date: Mon, 7 Jun 2021 08:21:17 -0700

Just to clarify: I wouldn't say that P2216 introduces a new concept. If you
look at P0645 as well as its discussions, locale-independent formatting has
always been a part of the design and it includes both format string parsing
and formatting. Moreover, the ability to do compile-time checks via
constexpr formatter<T>::parse functions was added per LEWG request back in
R1 of P0645, although admittedly it could be better specified. I agree that
if there are any concerns they can be addressed via LWG issues.
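
For readers unfamiliar with the mechanism, here is a minimal sketch of what a
compile-time format string check looks like. It is not the P2216 wording or
the fmt implementation: count_fields is a hypothetical helper, and it only
understands "{...}" fields and the "{{" / "}}" escapes. Note that the
comparisons against '{' and '}' use the ordinary literal encoding, not the
runtime locale.

```cpp
#include <cstddef>

// Hypothetical helper (not part of std::format or fmt): returns the number
// of replacement fields in a format string, or -1 if a brace is unmatched.
constexpr int count_fields(const char* s) {
    int fields = 0;
    for (std::size_t i = 0; s[i] != '\0'; ++i) {
        if (s[i] == '{') {
            if (s[i + 1] == '{') { ++i; continue; }  // "{{" is an escaped brace
            ++i;
            while (s[i] != '\0' && s[i] != '}') ++i;  // skip the field contents
            if (s[i] == '\0') return -1;              // '{' without closing '}'
            ++fields;
        } else if (s[i] == '}') {
            if (s[i + 1] == '}') { ++i; continue; }  // "}}" is an escaped brace
            return -1;                                // stray '}'
        }
    }
    return fields;
}

// Because the function is constexpr, malformed strings can be rejected
// during compilation instead of throwing at run time.
static_assert(count_fields("{} + {} = {}") == 3, "three replacement fields");
static_assert(count_fields("{{literal}}") == 0, "escaped braces only");
static_assert(count_fields("{:>8") == -1, "unterminated field is detected");
```

A real implementation would additionally call the constexpr
formatter<T>::parse function of each argument type on the field contents.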

> I am not aware of implementation experience for this paper in
> environments where characters significant to the interpretation of the
> format string are not locale-invariant.

There is implementation experience with statically detecting UTF-8 in
std::format implementation in Microsoft/STL:
https://github.com/microsoft/STL/issues/1820. P2216 is also fully
implemented in the fmt library and compile-time checks have been available
in a different form (because of lack of consteval) for several years.
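
As an illustration only, one way to detect a UTF-8 literal encoding
statically is to inspect the bytes the compiler produces for a non-ASCII
character in an ordinary string literal. The sketch below is not code taken
from Microsoft/STL or fmt; the function names are hypothetical, and it
assumes the compiler accepts U+00B5 in an ordinary literal.

```cpp
#include <cstddef>

// Hypothetical helper: true if `s` holds the UTF-8 encoding of U+00B5
// (bytes 0xC2 0xB5) followed by the terminating NUL.
constexpr bool is_utf8_micro(const char* s, std::size_t size_with_nul) {
    return size_with_nul == 3 &&
           static_cast<unsigned char>(s[0]) == 0xC2 &&
           static_cast<unsigned char>(s[1]) == 0xB5;
}

// Hypothetical helper: asks whether the ordinary literal encoding chosen by
// the compiler for this translation unit is UTF-8. No locale is consulted.
constexpr bool literal_encoding_is_utf8() {
    return is_utf8_micro("\u00B5", sizeof("\u00B5"));
}

// The answer is a constant expression, so it can drive compile-time
// decisions such as how to measure the width of fill characters.
constexpr bool kLiteralsAreUtf8 = literal_encoding_is_utf8();
```

With a single-byte literal encoding such as Latin-1 the same literal
occupies two bytes including the NUL, so the check yields false.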

Cheers,
Victor

On Sun, Jun 6, 2021 at 10:13 PM Tom Honermann <tom_at_[hidden]> wrote:

> On 6/6/21 8:15 PM, Hubert Tong wrote:
>
> Just hoping to clarify my own understanding of this, and perhaps making an
> observation in case this is helpful to others... I know that some of this
> has been discussed in specific circles.
>
> It is my understanding that this paper introduces a concept of
> compile-time conversion of strings associated with the `char` type (and
> also the same for `wchar_t`) into a "format string". I believe this
> necessitates attaching semantics to the "character" values (at compile
> time) to recognize characters such as the left brace (`{`). It is observed
> that the coded value of the left brace character is locale-variant in
> certain environments. Thus, the paper is establishing that format strings
> are not parsed based on locale. I am aware that the intent is to specify
> that the interpretation is based on the encoding used for literals;
> however, at this time, the wording does not indicate that intent.
>
> Thank you for raising this issue, Hubert. I agree that there is a wording
> oversight here. This concern seems like it can be addressed via an LWG
> issue.
>
>
> I am not aware of implementation experience for this paper in environments
> where characters significant to the interpretation of the format string are
> not locale-invariant. There is, however, reason to believe that an
> implementation can be realistically deployed to such environments while
> giving the user some ability to choose the text encoding under which
> format strings are parsed. As it is, the paper uses *format-string* and
> *wformat-string* as exposition-only types in the signature of the
> `format` functions. It is possible for an implementation to version these
> functions (across translation unit boundaries) through embedding the text
> encoding information into these types. A mechanism such as
> std::text_encoding::literal().mib() from P1885R5 (which has not yet
> advanced to plenary) could be used.
>
> That mechanism does not appear to be an option for the vformat_to() or
> vformat() overloads since their signatures do not include *format-string*
> or *wformat-string*. It doesn't look to me like that information can be
> smuggled through the types of basic_format_args or basic_format_context
> either, though perhaps they could be used to store a value that indicates
> the literal encoding.
>
> [P1885 is currently scheduled for consideration by LEWG during its
> 2021-08-03 telecon]
>
>
> The above approach is perhaps more limiting than strictly necessary upon
> extensions that allow the translation of string literals to be changed
> within a translation unit. It is noted that P1885R5 exposes the literal
> encoding as a consteval function, which is compatible with
> context-sensitive evaluation by the implementation. In case there is an
> appetite to allow for such context-sensitivity for format strings, it is
> probably the case that updating the text to allow for exposition-only extra
> parameters in the signature is purely a specification matter and does not
> affect implementations where such scenarios do not occur. It is also rather
> likely that implementations which do employ such extra parameters are
> conforming anyway (because the extra parameters are only observable when a
> user applies an extension). Nevertheless, the paper may be just the
> beginning of a number of changes that are candidates for being considered
> retroactive to C++20.
>
> It looks to me like construction of a basic_format_context specialization
> is effectively unspecified due to the lack of constructors and the presence
> of exposition-only data members. Perhaps more of it can be specified as
> exposition-only.
>
> Incidentally, I think the specification of basic_format_context may be
> missing an exposition-only std::locale data member corresponding to any
> locale passed to a formatting function.
>
>
> TL;DR: The paper sets a direction (but does not actually spell out that it
> does) of using the encoding associated with literal translation for parsing
> format strings. The work around improving the management and handling of
> said encodings is still ongoing; therefore, where this paper leads us is
> not as clear as it could be given additional time. Nevertheless, it is
> probably the case that further incremental improvements can be made on top
> of this paper without compatibility breakage for implementations that
> choose to deploy earlier. In certain environments,
> quality-of-implementation around this paper may be dependent on additional
> improvements to the specification. Since this paper is being considered to
> be retroactive to C++20, it is reasonable to expect that improvements of
> the aforementioned kind would also be considered for retroactive inclusion
> as they are discovered.
>
> I agree, though the words "probably the case" give me pause. Regardless,
> at least for me, this does not translate to a desire to delay adopting this
> paper.
>
> Tom.
>
>
>

Received on 2021-06-07 10:21:32