Date: Fri, 21 Feb 2025 12:24:56 +0100
> On 21 Feb 2025, at 11:34, Jonathan Wakely <cxx_at_[hidden]> wrote:
>
>> On Fri, 21 Feb 2025 at 10:09, Hans Åberg <haberg_1_at_[hidden]> wrote:
>>
>> > On 21 Feb 2025, at 10:53, Jonathan Wakely <cxx_at_[hidden]> wrote:
>> >
>> > On Fri, 21 Feb 2025 at 09:08, Hans Åberg <haberg_1_at_[hidden]> wrote:
>> >
>> > > On 21 Feb 2025, at 00:51, Jonathan Wakely via Std-Proposals <std-proposals_at_[hidden]> wrote:
>> > >
>> > > On Thu, 20 Feb 2025, 21:39 Phil Bouchard, <boost_at_[hidden]> wrote:
>> > >
>> > > On 2/20/25 16:19, Jonathan Wakely wrote:
>> > > >
>> > > >
>> > > > On Thu, 20 Feb 2025 at 20:41, Phil Bouchard via Std-Proposals
>> > > > <std-proposals_at_[hidden] <mailto:std-proposals_at_[hidden]>>
>> > > > wrote:
>> > > >
>> > > > regex_match would get 1 more character on the need basis using in.get()
>> > > > quite simply. If it fails then it would rewind the read pointer to
>> > > > where
>> > > > it was.
>> > > >
>> > > >
>> > > > How? iostream putback is extremely limited.
>> > >
>> > > Using seekg().
>> > >
>> > > That might work on an ifstream or istringstream but not on an arbitrary istream.
>> >
>> > It is as necessary to have a buffer of an arbitrarily large size for regexes, as the underlying theory for regular expressions just tells whether a string is in the language or not, and cannot tell when to stop. Examples are expressions like a|a*b, where on a string a… followed by something else than b, all but the first ‘a’ must be put back into the buffer. (Or some similar idea.)
>> >
>> > So when starting with parsers, the one character put back rule is no longer useful:
>> >
>> > One other example is when reading UTF-32 characters from a UTF-8 stream: Then one in general cannot put back the UTF-32 character, as it will in general occupy more than one byte.
>> >
>> >
>> > Which is why you don't want to build it on top of a single-pass range, like an istream.
>>
>> One can have a single-pass buffered input stream, only that the buffer is larger than one character, which is what Flex does. Or like the other C++ formatted input already present, for different types, int's, floats, and strings, where you can't put back characters.
>
> But if you require a specific kind of buffered input stream (or even just a specific kind of streambuf with an arbitrary size putback area) then it's not a generic std::istream and so you don't want an operator>> overload that works with arbitrary std::istream objects.
The std::istream objects might be extended to have putback buffers of arbitrary size, with only the first character kept in the synchronized C stream; putting back more than one character would then break the synchronization. This would simplify the implementation of regexes, and it would also allow formatted input to be put back.
They would indeed in effect be a new type of istream object, layered on top of those that exist now.
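
As a rough illustration of such a layered object, here is a minimal sketch of a stream buffer that wraps an existing one and remembers every character it has handed out, so that unget() can step back arbitrarily far. The names replay_streambuf and rest_after_rewind are hypothetical and not part of any library. The sketch does not try to model synchronization with the C stream, and it never discards its history, which a real design would have to bound.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical wrapper (not a standard class): it pulls characters one at a
// time from another streambuf and keeps all of them in its get area, so every
// character read so far can be put back.
class replay_streambuf : public std::streambuf {
public:
    explicit replay_streambuf(std::streambuf* source) : source_(source) {}

protected:
    int_type underflow() override {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());
        // Take exactly one character from the wrapped buffer.
        int_type c = source_->sbumpc();
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::eof();
        // The read position is at the end of the history at this point.
        std::size_t pos = history_.size();
        history_.push_back(traits_type::to_char_type(c));
        // The vector may have reallocated, so rebuild the get-area pointers.
        char* base = history_.data();
        setg(base, base + pos, base + history_.size());
        return c;
    }

private:
    std::streambuf* source_;     // wrapped buffer, not owned
    std::vector<char> history_;  // every character read so far
};

// Illustrative helper: read `consumed` characters, step back `rewind` of them
// with unget(), and return whatever the stream yields from there on.
std::string rest_after_rewind(const std::string& text, int consumed, int rewind) {
    std::istringstream source(text);
    replay_streambuf buf(source.rdbuf());
    std::istream in(&buf);
    for (int i = 0; i < consumed; ++i) in.get();
    for (int i = 0; i < rewind; ++i) in.unget();
    std::string rest;
    for (char ch; in.get(ch); ) rest.push_back(ch);
    return rest;
}
```

For the a|a*b example on the input "aaac", a matcher would read four characters, find that 'c' is not 'b', and put back all but the first 'a': rest_after_rewind("aaac", 4, 3) then leaves "aac" in the stream. Note that the wrapper is itself a specific kind of streambuf, so this only works for streams constructed on top of it.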
Received on 2025-02-21 11:25:16