sg16: [SG16] Comment re: Library Motion 16, P2216R3, std::format improvements

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Sun, 6 Jun 2021 20:15:32 -0400

Just hoping to clarify my own understanding of this, and perhaps making an
observation in case this is helpful to others... I know that some of this
has been discussed in specific circles.

It is my understanding that this paper introduces a concept of compile-time
conversion of strings associated with the `char` type (and also the same
for `wchar_t`) into a "format string". I believe this necessitates
attaching semantics to the "character" values (at compile time) to
recognize characters such as the left brace (`{`). The coded value of the
left brace character is locale-variant in certain environments. Because the
parse happens at compile time, it cannot consult the run-time locale; the
paper is thus establishing that format strings are not parsed based on
locale. I am aware that the intent is to specify that the interpretation is
based on the encoding used for literals; however, at this time, the wording
does not indicate that intent.

I am not aware of implementation experience for this paper in environments
where characters significant to the interpretation of the format string are
not locale-invariant. There is, however, reason to believe that an
implementation can realistically be deployed to such environments while
giving the user some ability to choose the text encoding under which
format strings are parsed. As it is, the paper uses *format-string* and
*wformat-string* as exposition-only types in the signature of the `format`
functions. It is possible for an implementation to version these functions
(across translation unit boundaries) by embedding the text encoding
information into these types. A mechanism such as
`std::text_encoding::literal().mib()` from P1885R5 (which has not yet
advanced to plenary) could be used.
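
A minimal sketch of that versioning idea follows, under loud assumptions:
`tagged_format_string`, `encoding_used`, and `placeholder_mib` are
hypothetical names, and a plain `int` template argument stands in for
whatever value the P1885R5 facility would report. It is not how any
implementation is known to do this.

```cpp
#include <string_view>
#include <type_traits>

// 106 is the IANA MIBenum for UTF-8. The second value is a placeholder for
// "some other literal encoding" and is NOT a real MIBenum.
inline constexpr int utf8_mib = 106;
inline constexpr int placeholder_mib = 9999;

// Hypothetical format-string type that carries the literal encoding of the
// translation unit that created it as part of the type itself.
template <int EncodingMib>
struct tagged_format_string {
    std::string_view text;
    static constexpr int encoding = EncodingMib;
};

// Hypothetical formatting entry point. Each encoding yields a distinct
// specialization, so translation units built with different literal
// encodings call (and link against) different functions.
template <int EncodingMib>
constexpr int encoding_used(tagged_format_string<EncodingMib>) {
    return EncodingMib;  // a real implementation would parse with this encoding
}

static_assert(!std::is_same_v<tagged_format_string<utf8_mib>,
                              tagged_format_string<placeholder_mib>>);
static_assert(encoding_used(tagged_format_string<utf8_mib>{"{}"}) == 106);
```

Because the encoding is part of the parameter type, the distinction survives
across translation unit boundaries without any run-time state.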

The above approach is perhaps more limiting than strictly necessary in the
presence of extensions that allow the translation of string literals to be
changed within a translation unit. It is noted that P1885R5 exposes the literal
encoding as a consteval function, which is compatible with
context-sensitive evaluation by the implementation. In case there is an
appetite to allow for such context-sensitivity for format strings, it is
probably the case that updating the text to allow for exposition-only extra
parameters in the signature is purely a specification matter and does not
affect implementations where such scenarios do not occur. It is also rather
likely that implementations which do employ such extra parameters are
conforming anyway (because the extra parameters are only observable when a
user applies an extension). Nevertheless, the paper may be just the
beginning of a number of changes that are candidates for being considered
retroactive to C++20.

TL;DR: The paper sets a direction (but does not actually spell out that it
does) of using the encoding associated with literal translation for parsing
format strings. The work on improving the management and handling of
said encodings is still ongoing; therefore, where this paper leads us is
not as clear as it could be given additional time. Nevertheless, it is
probably the case that further incremental improvements can be made on top
of this paper without compatibility breakage for implementations that
choose to deploy earlier. In certain environments,
quality of implementation for this paper may depend on additional
improvements to the specification. Since this paper is being considered to
be retroactive to C++20, it is reasonable to expect that improvements of
the aforementioned kind would also be considered for retroactive inclusion
as they are discovered.

Received on 2021-06-06 19:16:02