std-proposals

Re: [std-proposals] regex over istreams

From: Hans Åberg <haberg_1_at_[hidden]>
Date: Fri, 21 Feb 2025 10:08:26 +0100
> On 21 Feb 2025, at 00:51, Jonathan Wakely via Std-Proposals <std-proposals_at_[hidden]> wrote:
>
> On Thu, 20 Feb 2025, 21:39 Phil Bouchard, <boost_at_[hidden]> wrote:
>
> On 2/20/25 16:19, Jonathan Wakely wrote:
> >
> >
> > On Thu, 20 Feb 2025 at 20:41, Phil Bouchard via Std-Proposals
> > <std-proposals_at_[hidden] <mailto:std-proposals_at_[hidden]>>
> > wrote:
> >
> > regex_match would get 1 more character on the need basis using in.get()
> > quite simply. If it fails then it would rewind the read pointer to
> > where
> > it was.
> >
> >
> > How? iostream putback is extremely limited.
>
> Using seekg().
>
> That might work on an ifstream or istringstream but not on an arbitrary istream.

A buffer of arbitrarily large size is necessary for regexes in any case, as the underlying theory of regular expressions only tells whether a string is in the language or not, and cannot tell when to stop. An example is an expression like a|a*b: on a string a… (a run of a's) followed by something other than b, all but the first ‘a’ must be put back into the buffer. (Or some similar idea.)

So once one starts with parsers, the one-character putback rule is no longer useful.

Another example is reading UTF-32 characters from a UTF-8 stream: one cannot in general put back the UTF-32 character, as it will in general occupy more than one byte.

Received on 2025-02-21 09:08:44