sg16: Re: [SG16] Comment re: Library Motion 16, P2216R3, std::format improvements

From: Tom Honermann <tom_at_[hidden]>
Date: Mon, 7 Jun 2021 01:13:24 -0400

On 6/6/21 8:15 PM, Hubert Tong wrote:
> Just hoping to clarify my own understanding of this, and perhaps
> making an observation in case this is helpful to others... I know that
> some of this has been discussed in specific circles.
>
> It is my understanding that this paper introduces a concept of
> compile-time conversion of strings associated with the `char` type
> (and also the same for `wchar_t`) into a "format string". I believe
> this necessitates attaching semantics to the "character" values (at
> compile time) to recognize characters such as the left brace (`{`). It
> is observed that the coded value of the left brace character is
> locale-variant in certain environments. Thus, the paper is
> establishing that format strings are not parsed based on locale. I am
> aware that the intent is to specify that the interpretation is based
> on the encoding used for literals; however, at this time, the wording
> does not indicate that intent.
Thank you for raising this issue, Hubert. I agree that there is a
wording oversight here. This concern seems like it can be addressed via
an LWG issue.
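
As a rough illustration of the point about attaching semantics to
character values at compile time, here is a minimal sketch. The helper
count_replacement_fields is hypothetical and is not part of P2216 or
<format>; it only shows that a comparison against '{' during constant
evaluation uses the code unit value from the literal encoding, with no
locale involved.

```cpp
#include <cstddef>

// Hypothetical helper (an assumption for illustration, not part of P2216
// or <format>): counts unescaped '{' characters in a narrow format string.
// The comparison against the literal '{' uses the code unit value that the
// literal encoding assigns to it; no locale is consulted.
constexpr std::size_t count_replacement_fields(const char* fmt) {
    std::size_t count = 0;
    for (std::size_t i = 0; fmt[i] != '\0'; ++i) {
        if (fmt[i] == '{') {
            if (fmt[i + 1] == '{') {
                ++i;  // "{{" is an escaped brace, not a replacement field
            } else {
                ++count;
            }
        }
    }
    return count;
}

// Evaluated entirely at compile time.
static_assert(count_replacement_fields("{} and {}") == 2, "two fields");
static_assert(count_replacement_fields("{{literal}}") == 0, "escaped braces only");
```

If the same string were instead produced at run time in an encoding
where '{' has a different coded value, a scan like this would not find
the braces, which is the situation the missing wording needs to cover.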
>
> I am not aware of implementation experience for this paper in
> environments where characters significant to the interpretation of the
> format string are not locale-invariant. There is, however, reason to
> believe that an implementation can be realistically deployed to such
> environments while giving some ability of the user to choose the text
> encoding under which format strings are parsed. As it is, the paper
> uses /format-string/ and /wformat-string/ as exposition-only types in
> the signature of the `format` functions. It is possible for an
> implementation to version these functions (across translation unit
> boundaries) through embedding the text encoding information into these
> types. A mechanism such as std::text_encoding::literal().mib() from
> P1885R5 (which has not yet advanced to plenary) could be used.

That mechanism does not appear to be an option for the vformat_to() or
vformat() overloads since their signatures do not include
/format-string/ or /wformat-string/. It doesn't look to me like that
information can be smuggled through the types of basic_format_args or
basic_format_context either, though perhaps they could be used to store
a value that indicates the literal encoding.
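
To make the contrast concrete, here is a minimal sketch under stated
assumptions. Every name in it (literal_encoding, tagged_format_string,
args_with_encoding, and the two functions) is hypothetical; none of it
is an existing or proposed library interface. It only shows that a
format-like signature can recover an encoding from its parameter type,
while a vformat-like signature would need the value stored elsewhere.

```cpp
// Hypothetical encoding tag. A real implementation might instead use a
// value such as the one proposed in P1885 (an assumption, not existing
// practice).
enum class literal_encoding { ascii_based, ebcdic_based };

// Hypothetical stand-in for the exposition-only format-string type, with
// the literal encoding carried as a template parameter.
template <literal_encoding Enc>
struct tagged_format_string {
    const char* str;
    constexpr explicit tagged_format_string(const char* s) : str(s) {}
};

// format-like entry point: the encoding is recoverable from the type.
template <literal_encoding Enc>
constexpr literal_encoding encoding_seen_by_format(tagged_format_string<Enc>) {
    return Enc;
}

// vformat-like entry point: the signature has only a plain string, so the
// encoding has to travel as a stored value, modeled here as a member of a
// hypothetical argument bundle.
struct args_with_encoding {
    literal_encoding enc;
};

constexpr literal_encoding encoding_seen_by_vformat(const char*,
                                                    args_with_encoding args) {
    return args.enc;
}
```

In the first function, two translation units using different literal
encodings would instantiate different specializations. In the second,
nothing in the signature distinguishes them, so the caller has to
supply the value.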

[P1885 is currently scheduled for consideration by LEWG during its
2021-08-03 telecon]

>
> The above approach is perhaps more limiting than strictly necessary
> upon extensions that allow the translation of string literals to be
> changed within a translation unit. It is noted that P1885R5 exposes
> the literal encoding as a consteval function, which is compatible with
> context-sensitive evaluation by the implementation. In case there is
> an appetite to allow for such context-sensitivity for format strings,
> it is probably the case that updating the text to allow for
> exposition-only extra parameters in the signature is purely a
> specification matter and does not affect implementations where such
> scenarios do not occur. It is also rather likely that implementations
> which do employ such extra parameters are conforming anyway (because
> the extra parameters are only observable when a user applies an
> extension). Nevertheless, the paper may be just the beginning of a
> number of changes that are candidates for being considered retroactive
> to C++20.

It looks to me like construction of a basic_format_context
specialization is effectively unspecified due to the lack of
constructors and the presence of exposition-only data members. Perhaps
more of it can be specified as exposition-only.

Incidentally, I think the specification of basic_format_context may be
missing an exposition-only std::locale data member corresponding to any
locale passed to a formatting function.

>
> TL;DR: The paper sets a direction (but does not actually spell out
> that it does) of using the encoding associated with literal
> translation for parsing format strings. The work around improving the
> management and handling of said encodings is still ongoing; therefore,
> where this paper leads us is not as clear as it could be given
> additional time. Nevertheless, it is probably the case that further
> incremental improvements can be made on top of this paper without
> compatibility breakage for implementations that choose to deploy
> earlier. In certain environments, quality-of-implementation around
> this paper may be dependent on additional improvements to the
> specification. Since this paper is being considered to be retroactive
> to C++20, it is reasonable to expect that improvements of the
> aforementioned kind would also be considered for retroactive inclusion
> as they are discovered.

I agree, though the words "probably the case" give me pause. Regardless,
at least for me, this does not translate to a desire to delay adopting
this paper.

Tom.

Received on 2021-06-07 00:13:29