std-proposals

Re: [std-proposals] Supporting f-strings in C++: draft-R1

From: Marcin Jaczewski <marcinjaczewski86_at_[hidden]>
Date: Fri, 20 Oct 2023 02:28:38 +0200
On Thu, 19 Oct 2023 at 20:28, Hadriel Kaplan via Std-Proposals
<std-proposals_at_[hidden]> wrote:
>
>
> > From: Std-Proposals <std-proposals-bounces_at_[hidden]> on behalf of Sebastian Wittmeier via Std-Proposals <std-proposals_at_[hidden]>
> > Date: Thursday, October 19, 2023 at 12:54 PM
>
> > the 2. expression not only combined a plain string-literal with a F string, but also was separated in the middle of the expression.
> > Concatenating strings so far happens in phase 5, macro expansion in phase 4.
>
> Yes, but I was assuming (at the time) that we wanted this to be supported:
>
> F"a={a}" "b={b}"
>
> Or if not that, then at least this one:
>
> F"a={a}" F"b={b}"
>
> If we do, then we would have to perform _some_ concatenation in the preprocessor, no matter what.
>
> And the most straight-forward way to do it, I think, is to not parse or extract anything from the quoted-string portions until after concatenation is finished for any given sequence.
>
> In other words, the preprocessor is expanding a preprocessing-token list (output from the lexer). So when it gets to a f-string-literal preprocessing-token, it can peek ahead and see if the next token is another literal, and if so then concatenate the two into one longer f-string-literal. Then loop and check the next token, etc.
>
> When the next token is not a literal, it stops looping and can then parse+convert the one long f-string-literal into the appropriate pieces.
>
> But more details are below...
>
>
> > If we want to have preprocessor support inside the expressions in the curly braces, the expressions in the curly braces in the F strings have to be separated out of the string in phase 3 or between current phase 3 and phase 4.
> > So they would not be a string, but for example an internal macro __F and __X.
> > Phase 3.5 would convert
> > F"abs(a+b)={abs(a+b)}"
> > into
> > -> __F("abs(a+b)={}", abs(a+b))
> > In Phase 4 abs is replaced by its definition, e.g.
> > #define abs(x) (((x) < 0) ? -(x) : (x))
> > -> __F("abs(a+b)={}", (((a+b) < 0) ? -(a+b) : (a+b)))
> > and by std::format
> > -> std::format("abs(a+b)={}", (((a+b) < 0) ? -(a+b) : (a+b)))
>
> I wasn't really planning to do it that way precisely. I was planning to follow the usual model of the preprocessor (one-pass preprocessing-token expansion), but with a builtin operator for final parsing+conversion, very similar to the current `_Pragma()` one.
>
> For example the _Pragma() operator takes in a literal string, modifies it, and runs that string through the lexer and preprocessing again.
>
> So my thought is: we create a new builtin _Xstring() operator. It would accept a single string-literal. It parses that one literal and converts it into the separate pieces: namely the format-string and zero or more substrings separated by commas. Those substrings are what we think of as the "expressions", that were inside the replacement-fields, and were removed and are now after it.
>
> After it munges the string, it then feeds it to the lexer and preprocessing - like _Pragma() does. That way they're not only parsed correctly into new preprocessing-tokens, but also any macros in those args will be expanded.
>
> For the concatenation, if we need concatenation support, I was thinking that would happen _before_ the above _Xstring() builtin would be used. The result of the concatenation would then be a single preprocessing-token string-literal, that's fed to _Xstring().
>
> For the F"" f-string in particular, since it needs the "std::format()" portion too, you can just think of it like a macro `_F()`, defined as:
>
> #define _F(s) ::std::format( _Xstring(s) )
>
> Of course it's not _exactly_ that, but logically it acts like a defined macro.
>
> It might be more correct to say f and x are more like this, logically speaking:
>
> #define _X(...) _Xstring( _Concat(__VA_ARGS__) )
> #define _F(...) ::std::format( _X(__VA_ARGS__) )
>
> And the preprocessor builds the varargs when it sees an F"" or X"" and peeks ahead, as described earlier; and wraps the one-or-more literals as the args for _F() or _X() as appropriate.
>
> So there'd be a new builtin `_Concat(...)` as well, which is variadic and concatenates string-literal tokens of the same type(s), replacing them with a single string-literal (of the common type). It does no parsing whatsoever - just string-literal token concatenation for whatever types we want it to handle.
>
> And the nice thing about that - or the bad thing depending on your point of view - is that programmers could use `_Concat()` for other purposes too.
>
> I'm not sure if all the above makes sense in an email - I need to update the draft proposal. :)
>
> -hadriel
>

Do you mean that `_X` should accept code like:
```
_X("aaaaa", "bbbbbbb", "cccccc")
```
I do not think this is needed: normal strings get concatenated anyway in the
next stage, and if `_X` gets "one" argument that has multiple string parts,
it can merge them itself.
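
As a minimal sketch of that point in today's C++ (the macro name `ONE_ARG` is
made up for this example and is not part of any proposal): several adjacent
literals with no commas between them already form a single macro argument,
and the compiler merges them later during ordinary string-literal
concatenation.

```cpp
#include <string_view>

// Hypothetical helper macro: takes exactly one argument and passes it through.
#define ONE_ARG(s) s

// The three adjacent literals are ONE macro argument (no commas between
// them); they are merged after preprocessing, by string-literal concatenation.
constexpr std::string_view merged = ONE_ARG("aaaaa" "bbbbbbb" "cccccc");

static_assert(merged == "aaaaabbbbbbbcccccc");
static_assert(merged.size() == 18);
```

So a variadic `_X("aaaaa", "bbbbbbb", "cccccc")` form would only matter if the
parts had to be joined before the later concatenation stage.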


Now a side tangent: since this could be a preprocessor function, what benefits
would it bring to plain C, or even to using C APIs from C++?
C `printf` will not switch from `%` to `{}`.
This brings me to the conclusion that `_X` should not be a "basic" operation
but a composition of lower-level macro functions.
Let us declare a macro:
```
_PARSE_EXTRACT(STRING, FORMAT, ARG)
```
It would concatenate all the strings in the first argument, then parse the
result and split it into multiple parts, and then pass those parts to the two
macro functions given as the second and third arguments, e.g.
```
#define STR(A) #A
#define FORMAT(A, ...) "{" __VA_OPT__(":" STR(__VA_ARGS__)) "}"
#define ARG(A, ...) , A
_PARSE_EXTRACT(u8"{a:b}..." "...{c" "}"_sql, FORMAT, ARG)

//expand to
#define ARG_A a
#define ARG_AF b
#define ARG_C c
u8""_sql FORMAT(ARG_A, ARG_AF) u8"......"_sql FORMAT(ARG_C)
u8""_sql ARG(ARG_A, ARG_AF) ARG(ARG_C)

//expand to
u8""_sql "{" ":" "b" "}" u8"......"_sql "{" "}" u8""_sql , a , c
```
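
To make the splitting step concrete, here is a rough run-time model of what
the parsing half of `_PARSE_EXTRACT` would have to do. This is only a sketch:
the real thing would run inside the preprocessor, and the names
`parse_extract`, `Extracted`, and `Field` are invented for this illustration.
It assumes the simplest grammar (`{name}` or `{name:spec}`) and does not
handle escaped or nested braces.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// One replacement field: `{name}` or `{name:spec}`.
struct Field {
    std::string name;  // text before the first ':'
    std::string spec;  // text after the first ':' (empty if there is none)
};

// Result of the split; pieces.size() == fields.size() + 1 always holds.
struct Extracted {
    std::vector<std::string> pieces;  // literal text around the fields
    std::vector<Field> fields;        // fields in order of appearance
};

// Hypothetical model of the parsing step (assumed grammar, see lead-in).
Extracted parse_extract(std::string_view s) {
    Extracted out;
    std::string literal;
    std::size_t i = 0;
    while (i < s.size()) {
        if (s[i] != '{') {            // ordinary character: part of a piece
            literal += s[i++];
            continue;
        }
        std::size_t close = s.find('}', i);
        if (close == std::string_view::npos) {
            literal += s.substr(i);   // unterminated field: keep it as text
            break;
        }
        std::string_view body = s.substr(i + 1, close - i - 1);
        std::size_t colon = body.find(':');
        Field f;
        f.name = std::string(body.substr(0, colon));
        if (colon != std::string_view::npos)
            f.spec = std::string(body.substr(colon + 1));
        out.pieces.push_back(literal);  // close the piece before this field
        out.fields.push_back(f);
        literal.clear();
        i = close + 1;
    }
    out.pieces.push_back(literal);      // trailing piece, possibly empty
    return out;
}
```

For the already concatenated input `"{a:b}......{c}"` this gives the pieces
`""`, `"......"`, `""` and the fields (`a`, `b`) and (`c`, no spec), which is
the same split the macro expansion above relies on.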

This may look over the top, but it would allow creating a safe printf (at
least for simple cases):

```
#define TYPE_d int
#define TYPE_f float
#define FORMAT(A, B) "%" STR(B)
#define ARG(A, B) ,_Generic((A), TYPE_##B: (A))
#define BETTER_PRINTF(F) printf(_PARSE_EXTRACT(F, FORMAT, ARG))

//usage
int i;
float j;
BETTER_PRINTF("i={i:d} j={j:f}");

//expand to
printf("i=" "%" "d" " j=" "%" "f" "" ,_Generic((i), int: (i))
       ,_Generic((j), float: (j)));
```
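
Since `_PARSE_EXTRACT` does not exist, the idea can only be tried today by
writing its output by hand. The sketch below does that in C++, with two
assumptions of mine: `_Generic` is C-only, so a small template `checked<T>()`
stands in for the type check, and `demo_printf` is just a made-up wrapper
that formats into a buffer so the result can be inspected.

```cpp
#include <cstdio>
#include <string>
#include <type_traits>

// C++ stand-in for the `_Generic` check (hypothetical helper): it compiles
// only if the argument has exactly the type its format letter promises.
template <class Expected, class Actual>
Expected checked(Actual v) {
    static_assert(std::is_same_v<Expected, Actual>,
                  "argument type does not match its format specifier");
    return v;
}

#define STR(A) #A
#define TYPE_d int
#define TYPE_f float
#define FORMAT(A, B) "%" STR(B)
#define ARG(A, B) , checked<TYPE_##B>(A)

// What BETTER_PRINTF("i={i:d} j={j:f}") is meant to become, with the split
// that `_PARSE_EXTRACT` would produce written out by hand.
std::string demo_printf(int i, float j) {
    char buf[64];
    std::snprintf(buf, sizeof buf,
                  "i=" FORMAT(i, d) " j=" FORMAT(j, f) ""
                  ARG(i, d) ARG(j, f));
    return buf;
}
```

Writing `ARG(j, d)` instead of `ARG(j, f)` makes the `static_assert` fire at
compile time, which is the "safe" part of the idea.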

This could also be used with iostreams:

```
#define FORMAT(A) << (A) <<
#define ARG(A)
#define BETTER_OUT(F) _PARSE_EXTRACT(F, FORMAT, ARG)

//usage
int i;
float j;
std::cout << BETTER_OUT("{i}{j}");

//expand to
std::cout << "" << (i) << "" << (j) << "";
```
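
The same hand-expansion trick shows that the `FORMAT`/`ARG` pair above is
enough for streams. This is only a sketch: the interleaving that
`_PARSE_EXTRACT` would generate is typed in manually, and `demo_stream` is a
made-up wrapper writing to a `std::ostringstream` so the output can be
checked.

```cpp
#include <sstream>
#include <string>

#define FORMAT(A) << (A) <<
#define ARG(A)

// Hand-written stand-in for `out << BETTER_OUT("{i}{j}")`: string pieces
// interleaved with FORMAT(...), followed by the (empty) ARG(...) list.
std::string demo_stream(int i, float j) {
    std::ostringstream out;
    out << "" FORMAT(i) "" FORMAT(j) "" ARG(i) ARG(j);
    return out.str();
}
```

Here `ARG` expands to nothing because the stream operators already carry the
values; only the format side does any work.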





Received on 2023-10-20 00:28:50