Re: Escaping text for use in a regex

From: Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
Date: Tue, 10 Dec 2019 08:54:25 -0500
Btw, I should have mentioned Google RE2 among the state of the art. RE2
basically treats regexes as strings (rather than forcing every regex to be
"compiled" into a non-string object like std::regex). RE2 does provide
`RE2::QuoteMeta()`, which takes a string, returns a string, and does
exactly what you asked for.
https://github.com/google/re2/blob/master/re2/re2.h#L444-L450
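
For readers who cannot pull in RE2, the same idea can be sketched with the
standard library alone. This is a minimal sketch, not RE2's implementation:
`regex_escape` and `has_numeric_setting` are hypothetical names, and the set
of escaped characters assumes the default ECMAScript grammar outside a
bracket expression.

```cpp
#include <regex>
#include <string>
#include <string_view>

// Hypothetical helper (not part of the standard library or of RE2):
// put a backslash in front of every character that is special in the
// ECMAScript grammar, so that the result matches `text` literally.
inline std::string regex_escape(std::string_view text) {
    constexpr std::string_view special = R"re(.^$|()[]{}*+?\)re";
    std::string out;
    out.reserve(text.size() * 2);  // worst case: every character is escaped
    for (char c : text) {
        if (special.find(c) != std::string_view::npos) {
            out.push_back('\\');
        }
        out.push_back(c);
    }
    return out;
}

// Hypothetical usage: look for "<key>=<digits>" where `key` is untrusted
// literal text that must not be interpreted as a pattern.
inline bool has_numeric_setting(const std::string& line, std::string_view key) {
    const std::regex re(regex_escape(key) + "=[0-9]+");
    return std::regex_search(line, re);
}
```

Escaping character by character avoids building a std::regex just to do the
escaping, and it keeps the pattern as an ordinary string until the final
regex is constructed.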


On Mon, Dec 9, 2019 at 3:26 PM Stephan Reiter via Std-Proposals <
std-proposals_at_[hidden]> wrote:

> Thanks for the feedback, in particular also the "soft" feedback about
> the situation of regex compared to other parts of the standard.
>
> Considering the low complexity of the regex_escape code (once you have
> figured out which characters to escape), I'm absolutely fine with
> *not* polishing std::regex superficially with a function like this.
> I do hope that std::regex will get attention though. Regular
> expressions are so basic and it's annoying that I'll need a 3rd-party
> library to get good performance and support for regex-construction.
> :-(
>
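> inline std::string regex_escape(const std::string &text) {
>   // Characters with special meaning in the default ECMAScript grammar.
>   const std::regex chars_to_escape("[.^$|()\\[\\]{}*+?\\\\]");
>   // Prefix each match with a backslash; "$0" refers to the whole match.
>   return std::regex_replace(text, chars_to_escape, "\\$0");
> }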
>
> Stephan
>
> On Sat, Dec 7, 2019 at 3:31 PM Arthur O'Dwyer via
> Std-Proposals <std-proposals_at_[hidden]> wrote:
> >
> > On Fri, Dec 6, 2019 at 9:32 AM Stephan Reiter via Std-Proposals
> > <std-proposals_at_[hidden]> wrote:
> >>
> >> Hi!
> >>
> >> I encountered a situation where I needed to incorporate user input
> >> into a regular expression.
> >> To safely do this, I needed to escape regex-special characters in the
> >> user input.
> >
> >
> > At first I thought you were trying to sanitize a user-provided regex,
> > which is basically impossible:
> > https://stackoverflow.com/questions/12841970/how-can-i-recognize-an-evil-regex
> > (Relatedly, the std::regex library provides no safety controls; you
> > can't ask it to "try matching but bail out if no match was found after 2
> > seconds," or anything like that.)
> >
> > However, if you just want to take a literal string and substitute it
> > into the middle of a larger regex, then yeah, there's no security issue
> > with that AFAIK. The std::regex library still makes that difficult by
> > failing to support operations such as "concatenate two regexes" or even
> > "print this regex back out as a string so that I can concatenate strings."
> >
> > Also, std::regex is slooow: "2300% to 83000% slower than a quality
> > implementation for some inputs".[1] So even if you could use std::regex for
> > this task, you wouldn't want to (at least not in production software).
> >
> > Boost.Xpressive supports substituting user-provided strings into
> > "dynamic regexes" similarly to how you'd substitute parameters into a
> > parameterized SQL query; see
> > https://www.boost.org/doc/libs/1_71_0/doc/html/xpressive/user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks
> >
> > std::regex is to the standard library as `switch` is to the core
> > language: lots of people have cool ideas for how to improve it, including
> > some slam-dunks, but really it's so abysmally far from well-designed that
> > WG21 just doesn't want to touch it. See also: "polishing a turd."
> >
> >
> >> std::string regex_escape(std::string_view text,
> >> std::regex_constants::syntax_option_type f =
> >> std::regex_constants::ECMAScript);
> >
> >
> > I think what you'd really want is
> >     std::regex regex_escape(std::string_view text, ......);
> >     std::regex regex_concatenate(std::regex a, std::regex b);  // returns the regex corresponding to "ab"
> >     std::regex regex_alternate(std::regex a, std::regex b);  // returns the regex corresponding to "a|b"
> > etc. etc.
> > But probably better to just use a library that already has those
> > operations built in, than try to rescue std::regex.
> >
> > –Arthur
> > --
> > Std-Proposals mailing list
> > Std-Proposals_at_[hidden]
> > https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>
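
The "concatenate" and "alternate" operations mentioned in the quoted message
can be approximated today by keeping patterns as strings and only
constructing a std::regex at the end. The following is a minimal sketch
under that assumption; `pattern_concatenate` and `pattern_alternate` are
hypothetical names, not proposed or existing library functions, and both
inputs are assumed to be valid ECMAScript patterns.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: pattern matching `a` followed by `b`.
// Each operand is wrapped in a non-capturing group so that a top-level
// '|' inside it does not leak into the combined pattern.
inline std::string pattern_concatenate(const std::string& a, const std::string& b) {
    return "(?:" + a + ")(?:" + b + ")";
}

// Hypothetical helper: pattern matching either `a` or `b`.
inline std::string pattern_alternate(const std::string& a, const std::string& b) {
    return "(?:" + a + ")|(?:" + b + ")";
}
```

One known limitation of this approach: if the operands contain capture
groups or numbered backreferences, the group numbers in the second operand
shift once the two patterns are combined, so it is safest with operands that
use neither.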

Received on 2019-12-10 07:57:02