Date: Fri, 21 Feb 2025 11:09:05 +0100
> On 21 Feb 2025, at 10:53, Jonathan Wakely <cxx_at_[hidden]> wrote:
>
> On Fri, 21 Feb 2025 at 09:08, Hans Åberg <haberg_1_at_[hidden]> wrote:
>
> > On 21 Feb 2025, at 00:51, Jonathan Wakely via Std-Proposals <std-proposals_at_[hidden]> wrote:
> >
> > On Thu, 20 Feb 2025, 21:39 Phil Bouchard, <boost_at_[hidden]> wrote:
> >
> > On 2/20/25 16:19, Jonathan Wakely wrote:
> > >
> > >
> > > On Thu, 20 Feb 2025 at 20:41, Phil Bouchard via Std-Proposals
> > > <std-proposals_at_[hidden] <mailto:std-proposals_at_[hidden]>>
> > > wrote:
> > >
> > > regex_match would get 1 more character as needed using in.get(),
> > > quite simply. If it fails then it would rewind the read pointer to
> > > where it was.
> > >
> > >
> > > How? iostream putback is extremely limited.
> >
> > Using seekg().
> >
> > That might work on an ifstream or istringstream but not on an arbitrary istream.
>
> It is necessary to have a buffer of arbitrarily large size for regexes, as the underlying theory of regular expressions only tells whether a string is in the language or not, and cannot tell when to stop. An example is an expression like a|a*b, where on a string a… followed by something other than b, all but the first ‘a’ must be put back into the buffer. (Or some similar idea.)
>
> So once one starts writing parsers, the one-character putback rule is no longer useful.
>
> One other example is reading UTF-32 characters from a UTF-8 stream: then one in general cannot put back the UTF-32 character, as it will in general occupy more than one byte.
>
>
> Which is why you don't want to build it on top of a single-pass range, like an istream.
One can have a single-pass buffered input stream, provided the buffer is larger than one character, which is what Flex does. Or it could work like the C++ formatted input already present for types such as ints, floats, and strings, where consumed characters cannot be put back either.
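
To make the idea concrete, here is a rough sketch of such a reader, using the a|a*b example from earlier in the thread. All names (LookaheadReader, mark, rewind, commit, match_a_or_astar_b) are hypothetical and made up for illustration; this is not how Flex or any std::regex implementation is actually written. It only assumes an arbitrary std::istream and never calls seekg() or putback().

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads an istream strictly once, but keeps its own
// replay buffer so a matcher can back up by more than one character.
class LookaheadReader {
public:
    explicit LookaheadReader(std::istream& in) : in_(in) {}

    // Next character as a non-negative int, or -1 at end of input.
    int get() {
        if (pos_ == buf_.size()) {
            int c = in_.get();
            if (c == std::istream::traits_type::eof()) return -1;
            buf_.push_back(static_cast<char>(c));
        }
        return static_cast<unsigned char>(buf_[pos_++]);
    }

    // Remember the current position inside the replay buffer.
    std::size_t mark() const { return pos_; }

    // Go back to a previously marked position.
    void rewind(std::size_t m) {
        assert(m <= pos_);
        pos_ = m;
    }

    // Discard everything already consumed; older marks become invalid.
    void commit() {
        buf_.erase(0, pos_);
        pos_ = 0;
    }

private:
    std::istream& in_;
    std::string buf_;      // characters read from the stream, not yet discarded
    std::size_t pos_ = 0;  // read position within buf_
};

// Hand-written longest match for a|a*b (a stand-in for a real regex engine).
// Returns the length of the match, 0 if there is none, and leaves the reader
// positioned just after the match.
std::size_t match_a_or_astar_b(LookaheadReader& r) {
    const std::size_t start = r.mark();
    std::size_t n = 0;
    int c;
    while ((c = r.get()) == 'a') ++n;
    if (c == 'b') {        // a*b matched
        r.commit();
        return n + 1;
    }
    r.rewind(start);       // a*b failed: give back everything read so far
    if (n >= 1) {          // fall back to the single 'a' alternative
        r.get();
        r.commit();
        return 1;
    }
    return 0;
}
```

On the input "aaac" the matcher reads four characters before it learns that a*b fails, then rewinds and consumes only the first 'a'; the remaining "aac" is still available to the next read. The buffer grows only as far as the longest failed attempt, and commit() releases it again.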
Received on 2025-02-21 10:09:23