Re: [std-proposals] Supporting f-strings in C++: draft-R1

From: Hadriel Kaplan <hkaplan_at_[hidden]>
Date: Thu, 19 Oct 2023 18:28:35 +0000
> From: Std-Proposals <std-proposals-bounces@lists.isocpp.org> on behalf of Sebastian Wittmeier via Std-Proposals <std-proposals@lists.isocpp.org>
> Date: Thursday, October 19, 2023 at 12:54 PM

> the 2. expression not only combined a plain string-literal with a F string, but also was separated in the middle of the expression.
> Concatenating strings so far happens in phase 5, macro expansion in phase 4.

Yes, but I was assuming (at the time) that we wanted this to be supported:

    F"a={a}" "b={b}"

Or if not that, then at least this one:

    F"a={a}" F"b={b}"

If we do, then we would have to perform _some_ concatenation in the preprocessor, no matter what.

And the most straightforward way to do it, I think, is to not parse or extract anything from the quoted-string portions until concatenation is finished for any given sequence.

In other words, the preprocessor is expanding a preprocessing-token list (the output from the lexer). So when it gets to an f-string-literal preprocessing-token, it can peek ahead to see whether the next token is another literal, and if so, concatenate the two into one longer f-string-literal. Then it loops and checks the next token, and so on.

When the next token is not a literal, it stops looping and can then parse and convert the one long f-string-literal into the appropriate pieces.
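A rough C++ sketch of that peek-ahead loop, over a hypothetical and heavily simplified token model (`Kind`, `Token` and `merge_fstrings` are illustrative names, not proposed wording). It only folds literals that follow an f-string-literal; a plain literal that precedes one is left untouched here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily simplified preprocessing-token model; real tokens
// also carry encoding prefixes, raw-ness, source locations, and so on.
enum class Kind { FString, String, Other };

struct Token {
    Kind kind;
    std::string body;  // for literals: the text between the quotes
};

// On reaching an f-string-literal, peek at the following tokens and fold every
// directly adjacent literal (plain or f-string) into it. Nothing inside the
// quotes is parsed yet; that happens once, on the fully merged literal.
std::vector<Token> merge_fstrings(const std::vector<Token>& in) {
    std::vector<Token> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i].kind != Kind::FString) {
            out.push_back(in[i]);  // not an f-string: pass through unchanged
            continue;
        }
        Token merged = in[i];
        while (i + 1 < in.size() && in[i + 1].kind != Kind::Other) {
            merged.body += in[i + 1].body;  // next token is a literal: absorb it
            ++i;
        }
        out.push_back(merged);  // next token is not a literal: stop looping
    }
    return out;
}
```

With `F"a={a}" "b={b}"` as input, this yields one f-string-literal whose body is `a={a}b={b}`, which is then the only thing that needs parsing.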

But more details are below...


> If we want to have preprocessor support inside the expressions in the curly braces, the expressions in the curly braces in the F strings have to be separated out of the string in phase 3 or between current phase 3 and phase 4.
> So they would not be a string, but for example an internal macro __F and __X.
> Phase 3.5 would convert
> F"abs(a+b)={abs(a+b)}"
> into
> -> __F("abs(a+b)={}", abs(a+b))
> In Phase 4 abs is replaced by its definition, e.g.
> #define abs(x) (((x) < 0) ? -(x) : (x))
> -> __F("abs(a+b)={}", (((a+b) < 0) ? -(a+b) : (a+b)))
> and by std::format
> -> std::format("abs(a+b)={}", (((a+b) < 0) ? -(a+b) : (a+b)))
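A small illustration of why the expressions have to leave the literal before phase 4: the preprocessor never looks inside a string-literal, so a macro name there stays plain text, while the same tokens outside a literal are expanded. `ABS`, `STR` and `XSTR` are illustrative names; `ABS` stands in for the `abs` macro above, because #defining a name declared in a standard library header is not allowed once such a header is included.

```cpp
#include <string>

// Stand-in for the quoted abs() macro, renamed to avoid clashing with std::abs.
#define ABS(x) (((x) < 0) ? -(x) : (x))

// Two-level stringizing: XSTR's argument is macro-expanded before STR applies
// #, so the resulting string shows what the preprocessor produced.
#define STR(x) #x
#define XSTR(x) STR(x)

// Inside a string-literal, ABS is just characters and is never expanded...
const std::string inside_literal = "ABS(a+b)";
// ...but as separate preprocessing-tokens it is expanded in phase 4.
const std::string as_tokens = XSTR(ABS(a+b));
```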

I wasn't really planning to do it exactly that way. I was planning to follow the usual model of the preprocessor (one-pass preprocessing-token expansion), but with a builtin operator for the final parsing and conversion, very similar to the current `_Pragma()` one.

For example, the _Pragma() operator takes a string-literal, modifies it (strips the quotes and unescapes it), and runs the result through the lexer and preprocessing again.
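For reference, the standard calls that modification "destringizing": delete an `L` prefix if present, delete the leading and trailing double-quotes, replace each `\"` with `"` and each `\\` with `\`; the resulting characters are then processed through translation phase 3 into preprocessing-tokens. A minimal C++ sketch of just the destringizing part (`destringize` is an illustrative name; the re-lexing step is not modelled):

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Destringize a non-raw string-literal given as its source spelling:
// drop an L prefix, drop the enclosing quotes, turn \" into " and \\ into \.
// Every other character, including other escape sequences, is copied as-is.
std::string destringize(const std::string& lit) {
    const std::size_t start = (!lit.empty() && lit[0] == 'L') ? 1 : 0;
    if (lit.size() < start + 2 || lit[start] != '"' || lit.back() != '"')
        throw std::invalid_argument("not a simple string-literal");
    std::string out;
    // Walk the characters strictly between the opening and closing quotes.
    for (std::size_t i = start + 1; i + 1 < lit.size(); ++i) {
        if (lit[i] == '\\' && i + 2 < lit.size() &&
            (lit[i + 1] == '"' || lit[i + 1] == '\\')) {
            out += lit[i + 1];  // \" becomes ", \\ becomes a single backslash
            ++i;
        } else {
            out += lit[i];
        }
    }
    return out;
}
```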

So my thought is: we create a new builtin _Xstring() operator. It would accept a single string-literal, parse it, and convert it into separate pieces: namely the format-string, followed by zero or more comma-separated substrings. Those substrings are what we think of as the "expressions": they were inside the replacement fields, have been removed from there, and now come after the format-string.

After it munges the string, it feeds the result to the lexer and preprocessing, like _Pragma() does. That way the pieces are not only lexed correctly into new preprocessing-tokens, but any macros in those arguments are also expanded.
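A rough sketch of the splitting an `_Xstring()` builtin might do, written as an ordinary C++ function over the literal's body text. `Split`, `split_fstring` and `xstring_text` are hypothetical names. The sketch assumes `std::format`-style fields (`{expr}` or `{expr:spec}`, with `{{` and `}}` as escapes), treats `::` as scope resolution rather than a format-spec separator, and does not cope with string literals, `?:`, or nested replacement fields inside a field.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result of splitting one f-string body.
struct Split {
    std::string format;              // format-string with expressions removed
    std::vector<std::string> exprs;  // expressions, in order of appearance
};

Split split_fstring(const std::string& body) {
    Split r;
    for (std::size_t i = 0; i < body.size(); ++i) {
        const char c = body[i];
        // "{{" and "}}" are escapes; keep them for std::format to handle.
        if ((c == '{' || c == '}') && i + 1 < body.size() && body[i + 1] == c) {
            r.format += c;
            r.format += c;
            ++i;
            continue;
        }
        if (c == '}') throw std::invalid_argument("unmatched '}'");
        if (c != '{') { r.format += c; continue; }

        // Replacement field: the expression runs to the first top-level ':'
        // (start of a format-spec) or to the matching '}'.
        std::string expr;
        int depth = 0;
        std::size_t j = i + 1;
        for (; j < body.size(); ++j) {
            const char d = body[j];
            if (d == '(' || d == '[' || d == '{') {
                ++depth;
            } else if ((d == ')' || d == ']' || d == '}') && depth > 0) {
                --depth;
            } else if (d == '}') {
                break;                     // end of field, no format-spec
            } else if (d == ':' && depth == 0) {
                if (j + 1 < body.size() && body[j + 1] == ':') {
                    expr += "::";          // scope resolution, not a spec
                    ++j;
                    continue;
                }
                break;                     // start of format-spec
            }
            expr += d;
        }
        std::string spec;
        if (j < body.size() && body[j] == ':')
            for (++j; j < body.size() && body[j] != '}'; ++j) spec += body[j];
        if (j >= body.size()) throw std::invalid_argument("unterminated '{'");
        if (expr.empty()) throw std::invalid_argument("empty replacement field");

        r.exprs.push_back(expr);
        r.format += spec.empty() ? std::string("{}") : "{:" + spec + "}";
        i = j;  // j is at the closing '}'; the loop's ++i steps past it
    }
    return r;
}

// The munged text that would be handed back to the lexer: "fmt", e1, e2, ...
std::string xstring_text(const std::string& body) {
    const Split s = split_fstring(body);
    std::string out = "\"" + s.format + "\"";
    for (const std::string& e : s.exprs) out += ", " + e;
    return out;
}
```

For the body `abs(a+b)={abs(a+b)}` this produces `"abs(a+b)={}", abs(a+b)`; the expression survives as raw text, and it is the subsequent re-lexing and preprocessing that would let `abs` expand if it were a macro.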

If we need concatenation support, I was thinking it would happen _before_ the _Xstring() builtin described above is applied. The result of the concatenation would then be a single string-literal preprocessing-token, which is fed to _Xstring().

For the F"" f-string in particular, since it needs the "std::format()" portion too, you can think of it as a macro `_F()`, defined as:

    #define _F(s) ::std::format( _Xstring(s) )

Of course it's not _exactly_ that, but logically it acts like a defined macro.

It might be more accurate to say that F and X behave like this, logically speaking:

    #define _X(...) _Xstring( _Concat(__VA_ARGS__) )
    #define _F(...) ::std::format( _X(__VA_ARGS__) )

And the preprocessor builds the varargs when it sees an F"" or X"" literal: it peeks ahead as described earlier, then wraps the one or more literals as the arguments to _F() or _X(), as appropriate.

So there'd be a new builtin `_Concat(...)` as well, which is variadic and concatenates string-literal tokens of the same type(s), replacing them with a single string-literal (of the common type). It does no parsing whatsoever; it just concatenates string-literal tokens, for whatever types we want it to handle.
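A minimal sketch of what such a `_Concat()` might do with ordinary (non-raw) literals, assuming it keeps the existing C++ rule that an unprefixed literal takes on the encoding-prefix of the others and that mismatched prefixes are rejected. `concat_literals` is an illustrative name, and the proposed `F`/`X` prefixes are not modelled. Note that purely textual joining is not quite what phases 5 and 6 do today: escapes are interpreted per literal first, so `"\xA" "B"` is two characters, whereas naively joining the text gives `"\xAB"`, a single character. A real builtin would have to account for that.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Join string-literal tokens (given as source spellings such as u8"abc") into
// one literal with the common encoding-prefix. Raw literals are not handled,
// and the bodies are joined textually (see the escape-sequence caveat above).
std::string concat_literals(const std::vector<std::string>& lits) {
    std::string prefix;       // common prefix found so far ("" = none yet)
    bool have_prefix = false;
    std::string body;
    for (const std::string& lit : lits) {
        const std::size_t q = lit.find('"');
        if (q == std::string::npos || lit.size() < q + 2 || lit.back() != '"')
            throw std::invalid_argument("not a string-literal: " + lit);
        const std::string p = lit.substr(0, q);  // e.g. "", "L", "u8", "u", "U"
        if (!p.empty()) {
            if (have_prefix && p != prefix)
                throw std::invalid_argument("mismatched encoding prefixes");
            prefix = p;
            have_prefix = true;
        }
        body += lit.substr(q + 1, lit.size() - q - 2);  // text between quotes
    }
    return prefix + "\"" + body + "\"";
}
```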

And the nice thing about that - or the bad thing, depending on your point of view - is that programmers could use `_Concat()` for other purposes too.

I'm not sure if all the above makes sense in an email - I need to update the draft proposal. :)

-hadriel
