std-proposals

Re: Escaping text for use in a regex

From: Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
Date: Sat, 7 Dec 2019 09:31:02 -0500
On Fri, Dec 6, 2019 at 9:32 AM Stephan Reiter via Std-Proposals <
std-proposals_at_[hidden]> wrote:

> Hi!
>
> I encountered a situation where I needed to incorporate user input
> into a regular expression.
> To safely do this, I needed to escape regex-special characters in the
> user input.
>

At first I thought you were trying to sanitize a user-provided regex, which
is basically impossible:
https://stackoverflow.com/questions/12841970/how-can-i-recognize-an-evil-regex
(Relatedly, the std::regex library provides no safety controls; you can't
ask it to "try matching but bail out if no match was found after 2
seconds," or anything like that.)

However, if you just want to take a literal string and substitute it into
the middle of a larger regex, then yeah, there's no security issue with
that AFAIK. The std::regex library still makes that difficult by failing
to support operations such as "concatenate two regexes" or even "print this
regex back out as a string so that I can concatenate strings."

Also, std::regex is slooow: "2300% to 83000% slower than a quality
implementation for some inputs" [1]
<https://quuxplusone.github.io/blog/2019/10/09/why-no-networking/#2-we-have-implementation-experi>.
So even if you could use std::regex for this task, you wouldn't want to (at
least not in production software).

Boost.Xpressive supports substituting user-provided strings into "dynamic
regexes" similarly to how you'd substitute parameters into a parameterized
SQL query; see
https://www.boost.org/doc/libs/1_71_0/doc/html/xpressive/user_s_guide.html#boost_xpressive.user_s_guide.tips_n_tricks

std::regex is to the standard library as `switch` is to the core language:
lots of people have cool ideas for how to improve it, including some
slam-dunks, but really it's so abysmally far from well-designed that WG21
just doesn't want to touch it. See also: "polishing a turd."


> std::string regex_escape(std::string_view text,
>     std::regex_constants::syntax_option_type f =
>         std::regex_constants::ECMAScript);
>

I think what you'd really want is
    std::regex regex_escape(std::string_view text, ......);
    // returns the regex corresponding to "ab"
    std::regex regex_concatenate(std::regex a, std::regex b);
    // returns the regex corresponding to "a|b"
    std::regex regex_alternate(std::regex a, std::regex b);
etc. etc.
But it's probably better to just use a library that already has those
operations built in than to try to rescue std::regex.

–Arthur

Received on 2019-12-07 08:33:38