std-proposals

Re: [std-proposals] Supporting f-strings in C++: draft-R1

From: Bengt Gustafsson <bengt.gustafsson_at_[hidden]>
Date: Sun, 15 Oct 2023 23:58:15 +0200
I have some comments regarding the limitations that you put on f/x
literals. I see that these are made to reduce the implementation effort,
but I don't think this outweighs the drawbacks of having to remember the
restrictions. After all, the implementation is done by a few programmers,
while remembering restrictions and all kinds of special cases increases
the cognitive load for millions of programmers. In this case I'm
referring to:

1. The problem with the colon in the ternary ?: operator. I don't think
it is much more complicated to handle this than to count matching
parentheses. The extra implementation work to find a ? and then skip
over the matching : on the outer level should be rather easy.
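
To illustrate how little is needed, here is a minimal sketch of such a
scan. The helper name find_spec_colon is hypothetical and not taken from
the draft; it receives the text between { and } and returns the position
of the colon that starts the format specification. It assumes that ::
is always a scope operator and it ignores string and character literals,
which a real implementation would have to skip as well.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: index of the ':' that starts the format
// specification in `field`, or npos if the field has none.
std::size_t find_spec_colon(std::string_view field) {
    int depth = 0;            // nesting level of (), [] and {}
    int pending_ternary = 0;  // outer-level '?' still waiting for its ':'
    for (std::size_t i = 0; i < field.size(); ++i) {
        const char c = field[i];
        if (c == '(' || c == '[' || c == '{') {
            ++depth;
        } else if (c == ')' || c == ']' || c == '}') {
            --depth;
        } else if (c == '?' && depth == 0) {
            ++pending_ternary;
        } else if (c == ':') {
            if (i + 1 < field.size() && field[i + 1] == ':') {
                ++i;  // assumed to be the scope operator '::', skip both
            } else if (depth == 0) {
                if (pending_ternary > 0) {
                    --pending_ternary;  // this ':' belongs to a ternary
                } else {
                    return i;  // first free ':' on the outer level
                }
            }
        }
    }
    return std::string_view::npos;
}
```

With this, "a ? b : c" has no format specification, while in
"a ? b : c:>8" the second colon is reported as its start.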

2. Not allowing expressions in {} after the colon (what is called nested
replacement fields). I don't actually see much implementation effort
saved by disallowing this (it is allowed in P1819). The code to
interpret an expression inside a {} is already there, and the result,
moving the expression to the argument list after the format string, is
exactly the same.

3. Not allowing combination with R for raw literals. It seems like there
is no real reason not to allow this. The purpose of R literals is to be
able to have quotes and newlines in string literals. You can use an R
literal as the format string for std::format, so it seems odd not to
allow combining f/x and R, in which case the F or X would go between any
U prefix and the R, which must be last.

4. Not allowing macro expansion. This is of course a more natural limitation
for implementers considering the translation phase you are aiming at.
But it is not as obvious for programmers who will certainly picture this
as a purely textual transformation, which it basically is. Yes, it is
probably more work to do this in the earlier parts of the preprocessor,
but not overly so. Note that R literals did force preprocessor
implementations to be updated, so it's not as if we never touch the
preprocessor. Also, to evaluate the expressions in #if, a standalone
preprocessor must already contain most of the machinery needed to figure
out where an embedded expression ends, i.e. tokenization and parenthesis
handling.

5. Escaping of quotation marks in the expressions. I'm a bit unsure
what the best choice is, but if handling of f/x literals is moved to an
earlier stage of the preprocessor it would be _possible_ to handle
quotation marks without escaping them. I think it would be an advantage
to be able to write the expressions exactly the same as if they were
outside the f/x literal. There is also a problem with parentheses inside
such quotes (escaped or not) that needs to be discussed if "matching
parentheses" is employed. For instance:

F"{isLeft ? "(" : ")"}"

or, with the quotation marks escaped:

F"{isLeft ? \"(\" : \")\"}"

There is also the worst case scenario when the embedded string literal
contains an escaped quote, for instance:

F"{isSingle ? "'" : "\""}"

or with escaping, which I don't think can work:

F"{isSingle ? \"'\" : \"\"\"}"

The problem here is that the escaping incurred by being inside an f/x
literal clashes with the need to escape the quote in the expression
being inserted. Parsing becomes (close to?) impossible.

This suggests to me that not requiring escaping is the right choice, but
this obviously requires the feature to be implemented entirely in the
preprocessor.

The alternative is to further restrict the embedded expressions by not
allowing quoted strings in them, or at least not escaped quotation marks
in those strings.

6. Not allowing encodings seems strange, especially as std::format is
defined for char and wchar_t. The source code is still in the source
character set and only after the expressions have been lifted out is the
remaining literal converted to the specified character set depending on
the prefix. There should be no issue in allowing all of the prefixes.

7. Not allowing locale also seems strange for X-literals, as the literal
has nothing to do with the locale-specific formatting of inserted
values. Even if there is an interaction with the format specification
after the colon, string interpolation does not touch that part.

In short, only the first of your restrictions in chapter 4 is actually
logical to me. Numbered inserts are not very useful as long as the
first parameter to std::format is a literal, as their main use is to
allow the order of the inserted values to be shuffled when translating
the string to another (human) language. As translation is out of scope
for interpolated literals (in a compiled language), this restriction is
inconsequential given restriction #8, which is, as you write, obvious.

Bengt

Received on 2023-10-15 21:58:21