Date: Fri, 21 Feb 2025 11:38:13 +0000
On Fri, 21 Feb 2025 at 11:25, Hans Åberg <haberg_1_at_[hidden]> wrote:
>
> > On 21 Feb 2025, at 11:34, Jonathan Wakely <cxx_at_[hidden]> wrote:
> >
> >> On Fri, 21 Feb 2025 at 10:09, Hans Åberg <haberg_1_at_[hidden]> wrote:
> >>
> >> > On 21 Feb 2025, at 10:53, Jonathan Wakely <cxx_at_[hidden]> wrote:
> >> >
> >> > On Fri, 21 Feb 2025 at 09:08, Hans Åberg <haberg_1_at_[hidden]> wrote:
> >> >
> >> > > On 21 Feb 2025, at 00:51, Jonathan Wakely via Std-Proposals <
> std-proposals_at_[hidden]> wrote:
> >> > >
> >> > > On Thu, 20 Feb 2025, 21:39 Phil Bouchard, <boost_at_[hidden]> wrote:
> >> > >
> >> > > On 2/20/25 16:19, Jonathan Wakely wrote:
> >> > > >
> >> > > >
> >> > > > On Thu, 20 Feb 2025 at 20:41, Phil Bouchard via Std-Proposals
> >> > > > <std-proposals_at_[hidden] <mailto:
> std-proposals_at_[hidden]>>
> >> > > > wrote:
> >> > > >
> >> > > > regex_match would get 1 more character on an as-needed basis
> using in.get()
> >> > > > quite simply. If it fails then it would rewind the read
> pointer to
> >> > > > where
> >> > > > it was.
> >> > > >
> >> > > >
> >> > > > How? iostream putback is extremely limited.
> >> > >
> >> > > Using seekg().
> >> > >
> >> > > That might work on an ifstream or istringstream but not on an
> arbitrary istream.
> >> >
> >> > It is as necessary to have a buffer of an arbitrarily large size for
> regexes, as the underlying theory for regular expressions just tells
> whether a string is in the language or not, and cannot tell when to stop.
> Examples are expressions like a|a*b, where on a string a… followed by
> something other than b, all but the first ‘a’ must be put back into the
> buffer. (Or some similar idea.)
> >> >
> >> > So when starting with parsers, the one character put back rule is no
> longer useful:
> >> >
> >> > One other example is when reading UTF-32 characters from a UTF-8
> stream: Then one in general cannot put back the UTF-32 character, as it
> will in general occupy more than one byte.
> >> >
> >> >
> >> > Which is why you don't want to build it on top of a single-pass
> range, like an istream.
> >>
> >> One can have a single-pass buffered input stream, only that the buffer
> is larger than one character, which is what Flex does. Or like the other
> C++ formatted input already present, for different types, ints, floats,
> and strings, where you can't put back characters.
> >
> > But if you require a specific kind of buffered input stream (or even
> just a specific kind of streambuf with an arbitrary size putback area) then
> it's not a generic std::istream and so you don't want an operator>> overload
> that works with arbitrary std::istream objects.
>
> The std::istream objects might be extended to have arbitrary size buffers
> with only the first character in the synchronized C stream, and if putting
> back more than one character, the synchronization is broken. It will
> simplify the implementation of regexes, and also formatted input can be put
> back.
>
> They are indeed in effect a new type of istream object on top of those
> that exist now.
So ... not istreams.
Surely what we want is a regex that can match an input_range, which could
then be used with your new "buffered input stream" thing, and other ranges
defined by input iterators. Rewinding/resetting the range on failed matches
should be a separate behaviour that is optionally supported by whatever
range you're using, not related to the regex matching.
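
To make that separation concrete, here is a minimal sketch, not a proposed
API. Every name in it (replay_buffer, make_stream_source,
match_a_or_a_star_b, match_prefix) is hypothetical. The buffering and
rewinding live in a small adaptor over any single-pass iterator pair, and a
hand-written stand-in for a regex engine matches a|a*b purely through that
adaptor's interface. It assumes a char source and an unbounded buffer, and
it does not return read-ahead characters to the underlying stream.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical adaptor: reads from a single-pass iterator pair and keeps
// every character it has read, so the read position can be moved back.
template <class It, class Sent>
class replay_buffer {
public:
    replay_buffer(It first, Sent last) : cur_(first), end_(last) {}

    // Look at the next character without consuming it.
    // Returns false when the underlying input is exhausted.
    bool peek(char& c) {
        if (pos_ == buf_.size()) {
            if (cur_ == end_) return false;
            buf_.push_back(*cur_);  // pull one more character on demand
            ++cur_;
        }
        c = buf_[pos_];
        return true;
    }

    // Consume the character last seen by peek().
    void advance() { assert(pos_ < buf_.size()); ++pos_; }

    // Remember / restore a read position inside the buffer.
    std::size_t mark() const { return pos_; }
    void rewind(std::size_t m) { assert(m <= pos_); pos_ = m; }

private:
    It cur_;
    Sent end_;
    std::string buf_;      // everything read so far (never trimmed here)
    std::size_t pos_ = 0;  // current read position within buf_
};

using stream_source = replay_buffer<std::istreambuf_iterator<char>,
                                    std::istreambuf_iterator<char>>;

// Wrap any std::istream; only the adaptor knows a stream is involved.
inline stream_source make_stream_source(std::istream& in) {
    return stream_source{std::istreambuf_iterator<char>(in),
                         std::istreambuf_iterator<char>()};
}

// Stand-in for a regex engine: longest match of a|a*b at the current
// position. Returns the match length (0 if none) and leaves the source
// positioned just after the match.
template <class Src>
std::size_t match_a_or_a_star_b(Src& src) {
    const std::size_t start = src.mark();
    std::size_t n = 0;
    char c;
    while (src.peek(c) && c == 'a') { src.advance(); ++n; }
    if (src.peek(c) && c == 'b') { src.advance(); return n + 1; }  // a*b
    src.rewind(start);                        // a*b failed: give it all back
    if (n >= 1) { src.advance(); return 1; }  // fall back to the single 'a'
    return 0;
}

// Convenience for trying the sketch on a string via an istringstream.
inline std::size_t match_prefix(const std::string& text) {
    std::istringstream in(text);
    auto src = make_stream_source(in);
    return match_a_or_a_star_b(src);
}
```

With input "aaac" the matcher reads ahead to the 'c', rewinds, and consumes
only the first 'a', so the remaining "aac" is still available from the
adaptor. The matcher never touches the istream directly; the same function
would work with any source type offering peek/advance/mark/rewind.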
Received on 2025-02-21 11:38:29