std-proposals

Re: [std-proposals] Supporting f-strings in C++

From: Tom Honermann <tom_at_[hidden]>
Date: Sat, 14 Oct 2023 11:28:38 -0400
On 10/13/23 4:14 PM, Chris Gary via Std-Proposals wrote:
>
> > Does the user need to provide a templated literal operator, such as:
> >
> > template <class... Args>
> > std::string operator ""_w(const char*, size_t, Args&&...) { ... }
>
>
> This could probably generalize to "user-defined prefixes". In p1819
> <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1819r0.html>,
> no mention was made of encoding, so I'm guessing there would be a need
> for permutations of different "f"-prefixes or similar to specify the
> code unit type.
> I imagine just assuming UTF-8 is safe in general, but it's also a
> peculiar divergence from the behavior of other prefixes (not to
> mention making iteration non-trivial).

Assuming UTF-8 is not safe. EBCDIC platforms remain relevant to the
industry. IBM distributes a clang-based compiler for EBCDIC platforms
and EBCDIC support is being contributed to LLVM/Clang. MSVC still
requires an opt-in to UTF-8 and there are no plans to change its default
encoding.

>
> Possibly something along the lines of:
>
> template< typename ...Args >
> YetAnotherString operator MagicUtf8Prefix "" ( const char8_t *, size_t, Args &&... )

The paper already discusses this and, while it stops short of proposing
it, I think it correctly acknowledges that the "f" prefix would combine
with an encoding-prefix in the same way that an "R" prefix does (and in
fact, it would be useful if both the "f" and "R" prefixes can be used
together). There should be no need for a MagicUtf8Prefix token. A
literal operator declaration like the one Hadriel wrote above would be
added to the user-defined literal overload set and resolved in the
normal way. Support for UTF-8 f-string literals would then look like:

    template <class... Args>
    std::u8string operator ""_w(const char8_t*, size_t, Args&&...) { ... }
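
As noted elsewhere in this thread, the variadic form above is not legal today, so it cannot be demonstrated directly. What can be shown with current C++ is the "resolved in the normal way" part: the encoding-prefix of the literal already selects among literal operator overloads. The following is only a sketch; the suffix `_w` is borrowed from the thread, and `char16_t` is used in place of `char8_t` so that it compiles before C++20 (a `const char8_t*` overload would be selected by a `u8` prefix in the same way).

```cpp
#include <cstddef>
#include <string>

// Overload chosen for ordinary string literals: "abc"_w
std::string operator""_w(const char* s, std::size_t n) {
    return std::string(s, n);
}

// Overload chosen for UTF-16 string literals: u"abc"_w
std::u16string operator""_w(const char16_t* s, std::size_t n) {
    return std::u16string(s, n);
}
```

With these two declarations, `"abc"_w` yields a `std::string` and `u"abc"_w` yields a `std::u16string`; no separate prefix token is involved in the selection.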

It is somewhat unfortunate that the UDL would need to reparse the format
string to identify the replacement fields and the format specifiers
supplied with each. Perhaps another option is to package those details
with each argument.
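
To illustrate what "packaging those details with each argument" might look like, here is a rough sketch in which each replacement field arrives already split into the literal text before it, the value, and its format specifier, so the callee never scans a format string. Everything here is hypothetical and not taken from the paper: the `field` type, the `render` function, and the toy rule that a specifier consisting of digits means a minimum field width.

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical per-field package a compiler could hand to the callee.
template <class T>
struct field {
    std::string_view text_before;  // literal text preceding this field
    T value;                       // the evaluated expression
    std::string_view spec;         // format specifier, empty if none
};

// Toy specifier language (an assumption): all digits => minimum width.
inline int width_from_spec(std::string_view spec) {
    int width = 0;
    for (char c : spec) {
        if (c < '0' || c > '9') return 0;  // anything else is ignored here
        width = width * 10 + (c - '0');
    }
    return width;
}

// Emits each field in order, then the trailing literal text.
template <class... Ts>
std::string render(std::string_view tail, const field<Ts>&... fields) {
    std::ostringstream out;
    ((out << fields.text_before
          << std::setw(width_from_spec(fields.spec)) << fields.value), ...);
    out << tail;
    return out.str();
}
```

Under these assumptions, something like `F"x = {x:5} name = {name}!"` could be lowered to `render("!", field<int>{"x = ", x, "5"}, field<std::string>{" name = ", name, ""})`.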

The proposal introduces a parsing ambiguity that will need to be
addressed in some way:

    F"{b?1:0}" // expression `b?1:0` with no format specifier?
               // or ill-formed expression `b?1` with a `:0` format specifier?
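
To make the ambiguity concrete, here is a small sketch of one naive rule, splitting the replacement field at its first `:`, the way a `std::format`-style field separates its argument from its specifier. The helper name `split_at_first_colon` is hypothetical, and the paper does not say that this is the rule it intends.

```cpp
#include <string_view>
#include <utility>

// Splits the contents of a replacement field at the first ':'.
// Returns {expression text, format specifier text}.
inline std::pair<std::string_view, std::string_view>
split_at_first_colon(std::string_view field) {
    const auto pos = field.find(':');
    if (pos == std::string_view::npos) {
        return {field, std::string_view{}};  // no specifier present
    }
    return {field.substr(0, pos), field.substr(pos + 1)};
}
```

Applied to `b?1:0`, this rule yields the expression text `b?1` and the specifier `0`, which is the ill-formed reading. A rule that avoids this would have to understand at least some of the expression grammar, for example by tracking `?` and `:` pairs.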

Tom.

>
> Then:
>
> auto arg = YetAnotherString{ "UTF-8 " };
> auto str = MagicUtf8Prefix"This is an interpolated {arg} string as a literal";
>
> Having some way to define a specific set of delimiters would also help
> to keep things as general as possible.
>
> On Fri, Oct 13, 2023 at 6:48 AM Marcin Jaczewski via Std-Proposals
> <std-proposals_at_[hidden]> wrote:
>
> On Fri, 13 Oct 2023 at 14:24, Hadriel Kaplan <hkaplan_at_[hidden]> wrote:
> >
> > From: Marcin Jaczewski <marcinjaczewski86_at_[hidden]>
> >
> > > We already have a tool to handle custom strings using
> > > user-defined literals:
> > > std::string operator ""_w(const char16_t*, size_t);
> > > Then `F"{aa}"_w` will simply be transformed to:
> > > operator ""_w("{0}", strlen("{0}"), aa); //always use (ptr, size) first to prevent confusion
> >
> > So if I understand you correctly, you're suggesting to have the
> > user write this:
> >
> > std::cout << F"{aa} foo {bb}"_w;
> >
> > And have the compiler transform it to this:
> >
> > std::cout << ""_w("{} foo {}", 9, aa, bb);
> >
> > ...and then what?
> >
> > Does the user need to provide a templated literal operator, such as:
> >
> > template <class... Args>
> > std::string operator ""_w(const char*, size_t, Args&&...) { ... }
> >
> > The above isn't currently legal, and I'm not sure it can be made
> > legal given we already have the string literal operator non-type
> > template from C++20, but let's assume it can - would everyone need
> > to implement such a literal operator?
> >
>
> Yes, this is the whole point of this. Indeed, this will need some
> extension to the allowed operator arguments.
> But this is in some way a logical extension of current behavior.
>
> This would allow a lot more than simply formatting arguments:
> ```
> F"select * from X where y = {x}"_sql; //no sql injection possible
> F"add {x} {y}"_asm;
> ```
> This would enable a lot of interesting uses.
>
> > Or would the standard library come with some default one, that
> > people can use to just do std::format()?
> >
> > Like defining a reserved one such as `""f`, so that this code:
> >
> > std::cout << F"{aa} foo {bb}"f;
> >
> > would logically do this:
> >
> > std::cout << std::format("{} foo {}", aa, bb);
> >
> > And if it offers that, then why not just have the f-string
> > conversion add the `f` if there is no suffix already?
>
> The whole point is not to hardcode anything in the language itself,
> since that would make it impossible to use in a freestanding environment.
> It also makes clear who is responsible for managing this literal.
> Another consideration is the future evolution of the standard library:
> at some point there could be a `std2`, and if the current `std` is
> hardcoded, then a new version of the standard library can't be used
> for this. Or, alternatively, a `std::format2` could be added.
>
>
> >
> > Which brings us back to what the paper already proposes now.
> > Except, of course, that the customization would be done a very
> > different way than the paper suggested. Which is fine by me, fwiw.
> >
> > -hadriel
> >
> >
> >

Received on 2023-10-14 15:28:40