Re: Escaping text for use in a regex

From: Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
Date: Sat, 7 Dec 2019 09:31:02 -0500
On Fri, Dec 6, 2019 at 9:32 AM Stephan Reiter via Std-Proposals <
std-proposals_at_[hidden]> wrote:

> Hi!
> I encountered a situation where I needed to incorporate user input
> into a regular expression.
> To safely do this, I needed to escape regex-special characters in the
> user input.

At first I thought you were trying to sanitize a user-provided regex, which
is basically impossible:
(Relatedly, the std::regex library provides no safety controls; you can't
ask it to "try matching but bail out if no match was found after 2
seconds," or anything like that.)

However, if you just want to take a literal string and substitute it into
the middle of a larger regex, then yeah, there's no security issue with
that AFAIK. The std::regex library still makes that difficult by failing
to support operations such as "concatenate two regexes" or even "print this
regex back out as a string so that I can concatenate strings."
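
To make the "substitute a literal string" case concrete, here is a minimal sketch of an escaping helper. It assumes the default ECMAScript grammar only; `regex_escape` is a hypothetical free function, not something the standard library provides, and the set of characters it escapes is my own reading of that grammar's syntax characters.

```cpp
#include <regex>
#include <string>
#include <string_view>

// Hypothetical helper (NOT part of the standard library): put a backslash in
// front of every character that is special in the ECMAScript regex grammar,
// so the result matches `text` literally when spliced into a larger pattern.
std::string regex_escape(std::string_view text) {
    // Assumed set of ECMAScript syntax characters.
    static constexpr std::string_view special = R"re(\^$.*+?()[]{}|)re";
    std::string out;
    out.reserve(text.size() * 2);
    for (char c : text) {
        if (special.find(c) != std::string_view::npos) {
            out.push_back('\\');
        }
        out.push_back(c);
    }
    return out;
}
```

The escaped text can then be concatenated as a string before constructing the regex, e.g. `std::regex("^" + regex_escape(user_input) + "$")`, where `user_input` stands for whatever string was read from the user.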

Also, std::regex is slooow: "2300% to 83000% slower than a quality
implementation for some inputs".[1] So even if you could use std::regex for
this task, you wouldn't want to (at least not in production software).

Boost.Xpressive supports substituting user-provided strings into "dynamic
regexes" similarly to how you'd substitute parameters into a parameterized
SQL query; see

std::regex is to the standard library as `switch` is to the core language:
lots of people have cool ideas for how to improve it, including some
slam-dunks, but really it's so abysmally far from well-designed that WG21
just doesn't want to touch it. See also: "polishing a turd."

> std::string regex_escape(std::string_view text,
> std::regex_constants::syntax_option_type f =
> std::regex_constants::ECMAScript);

I think what you'd really want is
    std::regex regex_escape(std::string_view text, ......);
    // returns the regex corresponding to "ab"
    std::regex regex_concatenate(std::regex a, std::regex b);
    // returns the regex corresponding to "a|b"
    std::regex regex_alternate(std::regex a, std::regex b);
etc. etc.
But probably better to just use a library that already has those operations
built in, than try to rescue std::regex.
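
For what it's worth, the closest approximation I can sketch on top of std::regex today is a wrapper that keeps the source text around, since a std::regex can't be printed back out. Everything named here (`Pattern`, `pattern_concatenate`, `pattern_alternate`) is hypothetical, and the sketch assumes ECMAScript syntax and operands without backreferences (wrapping and joining would renumber any capture groups).

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper: std::regex cannot give its pattern back, so keep the
// source text alongside and compile only when a matcher is actually needed.
struct Pattern {
    std::string source;  // ECMAScript source text
    std::regex compile() const { return std::regex(source); }
};

// Pattern corresponding to "ab". Each operand is wrapped in a non-capturing
// group so that an alternation inside one operand cannot leak into the other.
Pattern pattern_concatenate(const Pattern& a, const Pattern& b) {
    return Pattern{"(?:" + a.source + ")(?:" + b.source + ")"};
}

// Pattern corresponding to "a|b", with the same defensive grouping.
Pattern pattern_alternate(const Pattern& a, const Pattern& b) {
    return Pattern{"(?:" + a.source + ")|(?:" + b.source + ")"};
}
```

This is string concatenation in disguise, which is rather the point: it works only because the wrapper never lets go of the string.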


Received on 2019-12-07 08:33:38