std-proposals

Re: [std-proposals] Supporting f-strings in C++

From: Chris Gary <cgary512_at_[hidden]>
Date: Sat, 14 Oct 2023 17:35:40 -0600
>
> This is currently not legal: "one string"_w "then another"_w;
>

I correct myself. It is legal, and my idea is probably broken.
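
For reference, a minimal sketch of why that line is well-formed: adjacent user-defined string literals that carry the same suffix are concatenated first, and the suffix is then applied once to the combined literal. The `_w` operator and the `joined` helper below are hypothetical stand-ins (not from the paper); `_w` simply copies into a `std::string`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the `_w` suffix discussed in this thread.
std::string operator""_w(const char* s, std::size_t n) {
    return std::string(s, n);
}

// Both fragments carry the same ud-suffix, so they are concatenated in
// translation phase 6 and operator""_w is called once on the result.
std::string joined() {
    return "one string"_w " then another"_w;
}
```

So the "fragments with things in between" syntax cannot simply reuse the currently unused space between two suffixed literals, because that space already has a meaning.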

On Sat, Oct 14, 2023 at 5:33 PM Chris Gary <cgary512_at_[hidden]> wrote:

> Because "reply all" is not the default, and I needed to change a couple of
> things.
>
>
>> There should be no need for a MagicUtf8Prefix token. A literal operator
>> declaration like the one Hadriel wrote above would be added to the
>> user-defined literal overload set and resolved in the normal way. Support
>> for UTF-8 f-string literals would then look like:
>>
>> template <class... Args>
>> std::u8string operator ""_w(const char8_t*, size_t, Args&&...) { ... }
>>
>>
> That makes sense.
>
>> It is somewhat unfortunate that the UDL would need to reparse the format
>> string to identify the replacement fields and the format specifiers
>> supplied with each. Perhaps another option is to package those details with
>> each argument.
>>
>> The proposal introduces a parsing ambiguity that will need to be
>> addressed in some way:
>>
>> F"{b?1:0}" // expression `b?1:0` with no format specifier?
>> // or ill-formed expression `b?1` with a `:0` format specifier?
>>
>>
> The other challenge with the variadic signature is delivering the
> formatting details with the arguments (more strings with your string?).
>
> So, instead of delimiters, just plain string literals and variables or
> formatting details adjacent to one another:
>
> auto str = u8F"How would you like your floats? You can have " f(1.23456) "
> or " e(5.4321f) " in several different ways."_w;
>
> Where "f" and "e" are just convenient formatting wrapper functions.
>
> Which would expand to (roughly speaking):
>
> FloatF arg0{ 1.23456 };
> FloatE arg1{ 5.4321f };
> auto str = operator ""_w( "How would you like your floats? You can have
> {0} or {1} in several different ways.", 83u, arg0, arg1 );
>
> The preprocessor just sees a stream of tokens, and this should be able to
> exploit the same mechanism that allows "u8" to work with literal fragments
> in the same way.
>
> Basically, just concat valid string literal tokens and *anythings* until
> finding a semicolon. If it ends in an *anything*, assume it is a UDL suffix.
>
> This is currently not legal: "one string"_w "then another"_w; So the
> syntax change would just allow non-suffixes to appear anywhere in-between
> fragments. Nothing is resolved until the literal is assembled into a single
> object, which should be terminated by a ";", so assuming otherwise valid
> syntax I'd imagine this would work.
>
> The downside here is the proliferation of quotes.
>
> On Sat, Oct 14, 2023 at 9:28 AM Tom Honermann <tom_at_[hidden]> wrote:
>
>> On 10/13/23 4:14 PM, Chris Gary via Std-Proposals wrote:
>>
>> > Does the user need to provide a templated literal operator, such as:
>>> >
>>> > template <class... Args>
>>> > std::string operator ""_w(const char*, size_t, Args&&...) { ... }
>>
>>
>> This could probably generalize to "user-defined prefixes". In p1819
>> <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1819r0.html>,
>> no mention was made of encoding, so I'm guessing there would be a need for
>> permutations of different "f"-prefixes or similar to specify the code unit
>> type.
>> I imagine just assuming UTF-8 is safe in general, but it's also a peculiar
>> divergence from the behavior of other prefixes (not to mention making
>> iteration non-trivial).
>>
>> Assuming UTF-8 is not safe. EBCDIC platforms remain relevant to the
>> industry. IBM distributes a clang-based compiler for EBCDIC platforms and
>> EBCDIC support is being contributed to LLVM/Clang. MSVC still requires an
>> opt-in to UTF-8 and there are no plans to change its default encoding.
>>
>>
>> Possibly something along the lines of:
>>
>> template< typename ...Args >
>> YetAnotherString operator MagicUtf8Prefix "" ( const char8_t *, size_t,
>> Args &&... )
>>
>> The paper already discusses this and, while it stops short of proposing
>> it, I think it correctly acknowledges that the "f" prefix would combine
>> with an encoding-prefix in the same way that an "R" prefix does (and in
>> fact, it would be useful if both the "f" and "R" prefixes can be used
>> together). There should be no need for a MagicUtf8Prefix token. A
>> literal operator declaration like the one Hadriel wrote above would be
>> added to the user-defined literal overload set and resolved in the normal
>> way. Support for UTF-8 f-string literals would then look like:
>>
>> template <class... Args>
>> std::u8string operator ""_w(const char8_t*, size_t, Args&&...) { ... }
>>
>> It is somewhat unfortunate that the UDL would need to reparse the format
>> string to identify the replacement fields and the format specifiers
>> supplied with each. Perhaps another option is to package those details with
>> each argument.
>>
>> The proposal introduces a parsing ambiguity that will need to be
>> addressed in some way:
>>
>> F"{b?1:0}" // expression `b?1:0` with no format specifier?
>> // or ill-formed expression `b?1` with a `:0` format specifier?
>>
>> Tom.
>>
>>
>> Then:
>>
>> auto arg = YetAnotherString{ "UTF-8 " };
>> auto str = MagicUtf8Prefix"This is an interpolated {arg} string as a
>> literal";
>>
>> Having some way to define a specific set of delimiters would also help to
>> keep things as general as possible.
>>
>> On Fri, Oct 13, 2023 at 6:48 AM Marcin Jaczewski via Std-Proposals <
>> std-proposals_at_[hidden]> wrote:
>>
>>> pt., 13 paź 2023 o 14:24 Hadriel Kaplan <hkaplan_at_[hidden]>
>>> napisał(a):
>>> >
>>> > From: Marcin Jaczewski <marcinjaczewski86_at_[hidden]>
>>> >
>>> > > We already have a tool to handle custom strings, namely user-defined
>>> > > literals:
>>> > > std::string operator ""_w(const char16_t*, size_t);
>>> > > Then `F"{aa}"_w` will simply be transformed to:
>>> > > operator ""_w("{0}", strlen("{0}"), aa); // always use (ptr, size)
>>> > > first to prevent confusion
>>> >
>>> > So if I understand you correctly, you're suggesting to have the user
>>> write this:
>>> >
>>> > std::cout << F"{aa} foo {bb}"_w;
>>> >
>>> > And have the compiler transform it to this:
>>> >
>>> > std::cout << operator ""_w("{} foo {}", 9, aa, bb);
>>> >
>>> > ...and then what?
>>> >
>>> > Does the user need to provide a templated literal operator, such as:
>>> >
>>> > template <class... Args>
>>> > std::string operator ""_w(const char*, size_t, Args&&...) { ... }
>>> >
>>> > The above isn't currently legal, and I'm not sure it can be made legal
>>> given we already have the string literal operator non-type template from
>>> C++20, but let's assume it can - would everyone need to implement such a
>>> literal operator?
>>> >
>>>
>>> Yes, that is the whole point. Indeed, this will need some
>>> extension of the allowed operator arguments.
>>> But this is in some way a logical extension of current behavior.
>>>
>>> This would allow a lot more than simply formatting arguments:
>>> ```
>>> F"select * from X where y = {x}"_sql; //no sql injection possible
>>> F"add {x} {y}"_asm;
>>> ```
>>> This would enable a lot of interesting uses.
>>>
>>> > Or would the standard library come with some default one, that people
>>> can use to just do std::format()?
>>> >
>>> > Like defining a reserved one such as `""f`, so that this code:
>>> >
>>> > std::cout << F"{aa} foo {bb}"f;
>>> >
>>> > would logically do this:
>>> >
>>> > std::cout << std::format("{} foo {}", aa, bb);
>>> >
>>> > And if it offers that, then why not just have the f-string
>>> conversion add the `f` if there is no suffix already?
>>>
>>> The whole point is not to hardcode anything in the language itself, which
>>> would make it impossible to use in a freestanding environment,
>>> and to make clear who is responsible for managing this literal.
>>> Another thing is the future evolution of the standard library: at some
>>> point there could be a `std2`, and if the current `std` is hardcoded
>>> then a new version of the standard library can't be used for this. Or
>>> optionally `std::format2` will be added.
>>>
>>>
>>> >
>>> > Which brings us back to what the paper already proposes now. Except,
>>> of course, that the customization would be done a very different way than
>>> the paper suggested. Which is fine by me, fwiw.
>>> >
>>> > -hadriel
>>> >
>>> >
>>> >
>>>
>>
>>

Received on 2023-10-14 23:35:52