std-proposals

Re: [std-proposals] Supporting f-strings in C++

From: Chris Gary <cgary512_at_[hidden]>
Date: Sat, 14 Oct 2023 17:33:37 -0600
Because "reply all" is not the default, and I needed to change a couple of
things.


> There should be no need for a MagicUtf8Prefix token. A literal operator
> declaration like the one Hadriel wrote above would be added to the
> user-defined literal overload set and resolved in the normal way. Support
> for UTF-8 f-string literals would then look like:
>
> template <class... Args>
> std::u8string operator ""_w(const char8_t*, size_t, Args&&...) { ... }
>
>
 That makes sense.

> It is somewhat unfortunate that the UDL would need to reparse the format
> string to identify the replacement fields and the format specifiers
> supplied with each. Perhaps another option is to package those details with
> each argument.
>
> The proposal introduces a parsing ambiguity that will need to be addressed
> in some way:
>
> F"{b?1:0}" // expression `b?1:0` with no format specifier?
> // or ill-formed expression `b?1` with a `:0` format specifier?
>
>
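One conceivable way to settle that ambiguity, sketched here purely as an assumption and not as anything the paper proposes: let a ':' close a pending '?' when there is one, and treat the first top-level ':' without a pending '?' as the start of the format specifier. Under this rule `{b?1:0}` is the whole expression with no specifier. The helper name `split_field` is hypothetical.

```cpp
#include <cstddef>
#include <string_view>
#include <utility>

// Hypothetical helper: splits the text between '{' and '}' into
// (expression, format-spec).
// Not handled in this sketch: "::", string/character literals, digraphs.
constexpr std::pair<std::string_view, std::string_view>
split_field(std::string_view field) {
    int depth = 0;    // nesting level of (), [] and {}
    int pending = 0;  // top-level '?' still waiting for their ':'
    for (std::size_t i = 0; i < field.size(); ++i) {
        const char c = field[i];
        if (c == '(' || c == '[' || c == '{') {
            ++depth;
        } else if (c == ')' || c == ']' || c == '}') {
            --depth;
        } else if (depth == 0 && c == '?') {
            ++pending;
        } else if (depth == 0 && c == ':') {
            if (pending > 0) {  // this ':' belongs to a conditional expression
                --pending;
                continue;
            }
            // First unpaired top-level ':' starts the format specifier.
            return {field.substr(0, i), field.substr(i + 1)};
        }
    }
    return {field, std::string_view{}};  // no format specifier found
}
```

With this rule a specifier can still follow a conditional, as in `{b?1:0:>4}`, at the cost of making the replacement-field grammar depend on expression structure.
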
The other challenge with the variadic signature is delivering the
formatting details with the arguments (more strings with your string?).

So, instead of delimiters, just plain string literals and variables or
formatting details adjacent to one another:

auto str = u8F"How would you like your floats? You can have " f(1.23456) " or " e(5.4321f) " in several different ways."_w;

Where "f" and "e" are just convenient formatting wrapper functions.

Which would expand to (roughly speaking):

FloatF arg0{ 1.23456 };
FloatE arg1{ 5.4321f };
auto str = operator ""_w( "How would you like your floats? You can have {0} or {1} in several different ways.", 82u, arg0, arg1 );
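
Since a literal operator with extra parameters is not legal today, the expansion can only be approximated with an ordinary function. The sketch below does that under several assumptions: `FloatF`, `FloatE`, `render`, `interpolate_w` and `example` are hypothetical names, the replacement fields are plain `{N}` with decimal digits only, and plain `char` is used instead of `char8_t` to keep it short. The length is computed from the array rather than hard-coded, because the length passed to a literal operator excludes the terminating null.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical wrappers standing in for what f(...) and e(...) would return.
struct FloatF { double value; };  // fixed notation
struct FloatE { float value; };   // scientific notation

inline std::string render(FloatF a) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%f", a.value);
    return buf;
}

inline std::string render(FloatE a) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%e", static_cast<double>(a.value));
    return buf;
}

// Ordinary function with the parameter list of the imagined operator ""_w.
template <class... Args>
std::string interpolate_w(const char* text, std::size_t size, const Args&... args) {
    // Each argument formats itself; no format specifier is reparsed here.
    const std::vector<std::string> rendered{ render(args)... };
    const std::string_view fmt(text, size);
    std::string out;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] != '{') { out += fmt[i]; continue; }
        const std::size_t close = fmt.find('}', i);
        if (close == std::string_view::npos) {  // unterminated field: copy as-is
            out.append(fmt.substr(i));
            break;
        }
        std::size_t index = 0;  // assumes only decimal digits between the braces
        for (std::size_t j = i + 1; j < close; ++j)
            index = index * 10 + static_cast<std::size_t>(fmt[j] - '0');
        if (index < rendered.size()) out += rendered[index];
        i = close;
    }
    return out;
}

// The expansion from the message, written as a plain call.
inline std::string example() {
    static constexpr char text[] =
        "How would you like your floats? You can have {0} or {1} in several different ways.";
    return interpolate_w(text, sizeof text - 1, FloatF{ 1.23456 }, FloatE{ 5.4321f });
}
```

Because the wrappers carry the formatting choice in their type, the string itself only needs positional fields, which is the point of packaging the details with each argument.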

The preprocessor just sees a stream of tokens, and this should be able to
exploit the same mechanism that allows "u8" to work with literal fragments
in the same way.
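
For reference, a small sketch of the existing behavior this leans on: adjacent string literals are concatenated during translation, and an encoding prefix on one fragment applies to the whole result. The variable name `joined` is only illustrative.

```cpp
#include <cstddef>

// Only the first fragment carries the u8 prefix; after concatenation the
// result is a single UTF-8 string literal.
constexpr auto& joined = u8"How would you like " "your floats?";

// 19 + 12 code units from the two fragments, plus the terminating null.
static_assert(sizeof(joined) / sizeof(joined[0]) == 32,
              "both fragments end up in one array");
```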

Basically, just concat valid string literal tokens and *anythings* until
finding a semicolon. If it ends in an *anything*, assume it is a UDL suffix.
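
To make that rule concrete, here is a rough sketch over an already-tokenized statement. The `Token` and `Assembled` types and the `assemble` function are hypothetical and say nothing about how a real front end would implement this; the sketch just numbers each interleaved *anything* as a replacement field and treats a trailing one as the suffix.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Token {
    bool is_string_literal;  // true for a "..." fragment (quotes already stripped)
    std::string text;        // fragment contents, or the raw text of the "anything"
};

struct Assembled {
    std::string format;             // fragments joined, with {N} per interleaved expression
    std::vector<std::string> args;  // source text of each interleaved expression
    std::string suffix;             // UDL suffix; empty if the statement ends in a fragment
};

// tokens: everything after the prefix up to, but not including, the ';'.
inline Assembled assemble(std::vector<Token> tokens) {
    Assembled result;
    // A trailing non-literal is assumed to be the UDL suffix.
    if (!tokens.empty() && !tokens.back().is_string_literal) {
        result.suffix = tokens.back().text;
        tokens.pop_back();
    }
    for (const Token& t : tokens) {
        if (t.is_string_literal) {
            result.format += t.text;
        } else {
            result.format += "{" + std::to_string(result.args.size()) + "}";
            result.args.push_back(t.text);
        }
    }
    return result;
}
```

Fed the fragments of the float example, this would yield a format string with `{0}` and `{1}`, the two wrapper expressions as arguments, and `_w` as the suffix.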

Today "one string"_w "then another"_w; only works because both fragments
are string literals carrying the same suffix; nothing else may appear
between them. So the syntax change would just allow non-suffixes to appear
anywhere in between fragments. Nothing is resolved until the literal is
assembled into a single object, which should be terminated by a ";", so
assuming otherwise valid syntax I'd imagine this would work.

The downside here is the proliferation of quotes.

On Sat, Oct 14, 2023 at 9:28 AM Tom Honermann <tom_at_[hidden]> wrote:

> On 10/13/23 4:14 PM, Chris Gary via Std-Proposals wrote:
>
> > Does the user need to provide a templated literal operator, such as:
>> >
>> > template <class... Args>
>> > std::string operator ""_w(const char*, size_t, Args&&...) { ... }
>
>
> This could probably generalize to "user-defined prefixes". In p1819
> <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1819r0.html>,
> no mention was made of encoding, so I'm guessing there would be a need for
> permutations of different "f"-prefixes or similar to specify the code unit
> type.
> I imagine just assuming UTF-8 is safe in general, but it's also a peculiar
> divergence from the behavior of other prefixes (not to mention making
> iteration non-trivial).
>
> Assuming UTF-8 is not safe. EBCDIC platforms remain relevant to the
> industry. IBM distributes a clang-based compiler for EBCDIC platforms and
> EBCDIC support is being contributed to LLVM/Clang. MSVC still requires an
> opt-in to UTF-8 and there are no plans to change its default encoding.
>
>
> Possibly something along the lines of:
>
> template< typename ...Args >
> YetAnotherString operator MagicUtf8Prefix "" ( const char8_t *, size_t,
> Args &&... )
>
> The paper already discusses this and, while it stops short of proposing
> it, I think it correctly acknowledges that the "f" prefix would combine
> with an encoding-prefix in the same way that an "R" prefix does (and in
> fact, it would be useful if both the "f" and "R" prefixes can be used
> together). There should be no need for a MagicUtf8Prefix token. A literal
> operator declaration like the one Hadriel wrote above would be added to the
> user-defined literal overload set and resolved in the normal way. Support
> for UTF-8 f-string literals would then look like:
>
> template <class... Args>
> std::u8string operator ""_w(const char8_t*, size_t, Args&&...) { ... }
>
> It is somewhat unfortunate that the UDL would need to reparse the format
> string to identify the replacement fields and the format specifiers
> supplied with each. Perhaps another option is to package those details with
> each argument.
>
> The proposal introduces a parsing ambiguity that will need to be addressed
> in some way:
>
> F"{b?1:0}" // expression `b?1:0` with no format specifier?
> // or ill-formed expression `b?1` with a `:0` format specifier?
>
> Tom.
>
>
> Then:
>
> auto arg = YetAnotherString{ "UTF-8 " };
> auto str = MagicUtf8Prefix"This is an interpolated {arg} string as a
> literal";
>
> Having some way to define a specific set of delimiters would also help to
> keep things as general as possible.
>
> On Fri, Oct 13, 2023 at 6:48 AM Marcin Jaczewski via Std-Proposals <
> std-proposals_at_[hidden]> wrote:
>
>> On Fri, 13 Oct 2023 at 14:24, Hadriel Kaplan <hkaplan_at_[hidden]> wrote:
>> >
>> > From: Marcin Jaczewski <marcinjaczewski86_at_[hidden]>
>> >
>> > > We already have a tool for handling custom strings, namely user-defined literals:
>> > > std::string operator ""_w(const char16_t*, size_t);
>> > > Then `F"{aa}"_w` will simply be transformed to:
>> > > operator ""_w("{0}", strlen("{0}"), aa); //always use (ptr, size) first to prevent confusion
>> >
>> > So if I understand you correctly, you're suggesting to have the user
>> write this:
>> >
>> > std::cout << F"{aa} foo {bb}"_w;
>> >
>> > And have the compiler transform it to this:
>> >
>> > std::cout << ""_w("{} foo {}", 9, aa, bb);
>> >
>> > ...and then what?
>> >
>> > Does the user need to provide a templated literal operator, such as:
>> >
>> > template <class... Args>
>> > std::string operator ""_w(const char*, size_t, Args&&...) { ... }
>> >
>> > The above isn't currently legal, and I'm not sure it can be made legal
>> given we already have the string literal operator non-type template from
>> C++20, but let's assume it can - would everyone need to implement such a
>> literal operator?
>> >
>>
>> Yes, this is the whole point of this. Indeed, this will need some
>> extension of the allowed operator arguments.
>> But this is in some way a logical extension of current behavior.
>>
>> This would allow a lot more than simply formatting arguments:
>> ```
>> F"select * from X where y = {x}"_sql; //no sql injection possible
>> F"add {x} {y}"_asm;
>> ```
>> This will enable a lot of interesting uses.
>>
>> > Or would the standard library come with some default one, that people
>> can use to just do std::format()?
>> >
>> > Like defining a reserved one such as `""f`, so that this code:
>> >
>> > std::cout << F"{aa} foo {bb}"f;
>> >
>> > would logically do this:
>> >
>> > std::cout << std::format("{} foo {}", aa, bb);
>> >
>> > And if it offers that, then why not just have the f-string
>> conversion add the `f` if there is no suffix already?
>>
>> The whole point is to avoid hardcoding anything in the language itself,
>> which would make it impossible to use in a freestanding environment,
>> and to make clear who is responsible for managing this literal.
>> Another consideration is the future evolution of the standard library:
>> at some point there could be a `std2`, and if the current `std` is
>> hardcoded, a new version of the standard library can't be used for this.
>> Or, alternatively, a `std::format2` might be added.
>>
>>
>> >
>> > Which brings us back to what the paper already proposes now. Except, of
>> course, that the customization would be done a very different way than the
>> paper suggested. Which is fine by me, fwiw.
>> >
>> > -hadriel
>> >
>> >
>> >
>> --
>> Std-Proposals mailing list
>> Std-Proposals_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>>
>
>

Received on 2023-10-14 23:33:49