std-proposals

Re: Escaping text for use in a regex

From: Stephan Reiter <stephan.reiter_at_[hidden]>
Date: Mon, 9 Dec 2019 21:26:26 +0100
Thanks for the feedback, and in particular for the "soft" feedback about
where regex stands compared to other parts of the standard.

Given how simple the regex_escape code is (once you have figured out
which characters to escape), I'm absolutely fine with *not* polishing
std::regex superficially with a function like this.
I do hope that std::regex will get attention, though. Regular
expressions are so fundamental, and it's annoying that I need a
third-party library to get good performance and support for
constructing regexes.
:-(

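#include <regex>
#include <string>

// Prefix each ECMAScript regex metacharacter in `text` with a backslash.
inline std::string regex_escape(const std::string &text) {
  static const std::regex chars_to_escape("[.^$|()\\[\\]{}*+?\\\\]");
  return std::regex_replace(text, chars_to_escape, "\\$&");  // $& = whole match
}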

Stephan

On Sat, 7 Dec 2019 at 15:31, Arthur O'Dwyer via
Std-Proposals <std-proposals_at_[hidden]> wrote:
>
> On Fri, Dec 6, 2019 at 9:32 AM Stephan Reiter via Std-Proposals <std-proposals_at_[hidden]> wrote:
>>
>> Hi!
>>
>> I encountered a situation where I needed to incorporate user input
>> into a regular expression.
>> To safely do this, I needed to escape regex-special characters in the
>> user input.
>
>
> At first I thought you were trying to sanitize a user-provided regex, which is basically impossible:
> https://stackoverflow.com/questions/12841970/how-can-i-recognize-an-evil-regex
> (Relatedly, the std::regex library provides no safety controls; you can't ask it to "try matching but bail out if no match was found after 2 seconds," or anything like that.)
>
> However, if you just want to take a literal string and substitute it into the middle of a larger regex, then yeah, there's no security issue with that AFAIK. The std::regex library still makes that difficult by failing to support operations such as "concatenate two regexes" or even "print this regex back out as a string so that I can concatenate strings."
>
> Also, std::regex is slooow — "2300% to 83000% slower than a quality implementation for some inputs".[1] So even if you could use std::regex for this task, you wouldn't want to (at least not in production software).
>
> Boost.Xpressive supports substituting user-provided strings into "dynamic regexes" similarly to how you'd substitute parameters into a parameterized SQL query; see https://www.boost.org/doc/libs/1_71_0/doc/html/xpressive/user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks
>
> std::regex is to the standard library as `switch` is to the core language: lots of people have cool ideas for how to improve it, including some slam-dunks, but really it's so abysmally far from well-designed that WG21 just doesn't want to touch it. See also: "polishing a turd."
>
>
>> std::string regex_escape(std::string_view text,
>> std::regex_constants::syntax_option_type f =
>> std::regex_constants::ECMAScript);
>
>
> I think what you'd really want is
> std::regex regex_escape(std::string_view text, ......);
> std::regex regex_concatenate(std::regex a, std::regex b); // returns the regex corresponding to "ab"
> std::regex regex_alternate(std::regex a, std::regex b); // returns the regex corresponding to "a|b"
> etc. etc.
> But probably better to just use a library that already has those operations built in, than try to rescue std::regex.
>
> –Arthur
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals

Received on 2019-12-09 14:29:03