Date: Wed, 23 Dec 2020 18:00:52 -0500
What you really want is to get a null-terminated char const * from a string_view, and not pay the extra cost when the null was already there. Copy, but only when necessary.
And we want this because there are lots of pre-existing functions that take null-terminated char const * strings.
void f(char const *);
string_view sv =...
f(sv.???);
First problem is that this is a tricky API. How do you keep it from being multiple steps?
- you need to return an object that owns the string if it was copied, but could also "borrow" the existing string.
Let's call it mstring_view. The m is for maybe owning. But it could be an implementation-defined type. Maybe for some implementations it could be std::string even.
f(sv.m_str().c_str());
Now it is implementation-defined how string_view turns into mstring_view, but hopefully a quality implementation can track the "possibly there's a null" state of the string_view, similar to Arthur's suggestions. For some implementations, it will always copy. Or it will always copy until they can do an ABI break. (If there are no implementations that think they can take advantage of this, then it may not be worth doing.)
Note that, in the worst case, it will never be worse than manually copying the string_view.
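To make the shape of this concrete, here is a minimal sketch of the idea, not a proposed wording. Everything in it is hypothetical: the class layout, the borrow/copy factories, and the free function m_str(), which stands in for the sv.m_str() member above. Since std::string_view does not track null-termination today, the sketch assumes the caller supplies that knowledge as a bool; a real implementation would get it from inside string_view.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <string_view>
#include <utility>
#include <variant>

// Hypothetical "maybe owning" string: it either borrows an existing
// null-terminated buffer or owns a null-terminated copy.
class mstring_view {
public:
    // Borrow: the caller promises p is null-terminated and outlives this object.
    static mstring_view borrow(char const* p) {
        return mstring_view(storage_t(std::in_place_index<0>, p));
    }
    // Copy: std::string always stores a terminating null after its contents.
    static mstring_view copy(std::string_view sv) {
        return mstring_view(storage_t(std::in_place_index<1>, sv));
    }

    // Computed on demand, so moving the object cannot leave a stale pointer.
    char const* c_str() const {
        if (auto const* p = std::get_if<0>(&storage_)) return *p;
        return std::get<1>(storage_).c_str();
    }
    bool owns() const { return storage_.index() == 1; }

private:
    using storage_t = std::variant<char const*, std::string>;
    explicit mstring_view(storage_t s) : storage_(std::move(s)) {}
    storage_t storage_;
};

// Stand-in for the proposed sv.m_str(). The bool is an assumption of this
// sketch: it represents whatever the implementation knows about the view.
inline mstring_view m_str(std::string_view sv, bool known_null_terminated) {
    if (known_null_terminated) return mstring_view::borrow(sv.data());
    return mstring_view::copy(sv);  // worst case: same cost as a manual copy
}
```

With this, the call site stays a single expression, f(m_str(sv, flag).c_str()), because the temporary mstring_view lives until the end of the full expression, which is long enough for f to read the string.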
This doesn't mean we couldn't also have zstring_view, etc. It only tackles the "I have a string_view and now need a null-terminated char const *" problem.
The other solution is to (slowly) convince OSes, etc., to natively support ptr + length APIs.
Sent from my BlackBerry portable Babbage Device
Original Message
From: Tom Mason via Std-Proposals
Sent: Wednesday, December 23, 2020 5:14 PM
To: Tom Mason via Std-Proposals
Reply To: std-proposals_at_[hidden]
Cc: Tom Mason
Subject: Re: [std-proposals] Avoid copies when using string_view with APIs expecting null terminated strings
I don't think what Arthur is proposing would involve the string_view modifying the const char*. It would just be informed that it can check the value at data() + size(), so it only does a read at that location to compare to zero. The byte at that position would have to be set externally. It does seem a little obscure though. Why would you have a string buffer with embedded nulls in it, make a string view that passes over the null, and then trim down to it? Maybe there is a use case there that I can't think of, but it seems better to me to just store a flag that indicates null termination directly.
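As a rough illustration of the "just store a flag" variant, the sketch below wraps a std::string_view together with a bool. The class name flagged_string_view and its members are made up for this example and are not taken from bev::string_view or from any proposal. It assumes the flag is set by the constructors that can see a terminator and is cleared whenever the end of the view moves.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical view that stores a "null-terminated" flag next to the view.
class flagged_string_view {
public:
    // From std::string: data()[size()] is guaranteed to be '\0'.
    flagged_string_view(std::string const& s) : sv_(s), terminated_(true) {}
    // From a C string: the terminator is what ended the length scan.
    flagged_string_view(char const* s) : sv_(s), terminated_(true) {}
    // From pointer + length: nothing is known about the byte past the end.
    flagged_string_view(char const* p, std::size_t n)
        : sv_(p, n), terminated_(false) {}

    // Dropping a prefix leaves the end of the view where it was.
    void remove_prefix(std::size_t n) { sv_.remove_prefix(n); }
    // Dropping a suffix moves the end, so the flag is cleared.
    void remove_suffix(std::size_t n) {
        sv_.remove_suffix(n);
        if (n > 0) terminated_ = false;
    }

    std::string_view view() const { return sv_; }
    bool null_terminated() const { return terminated_; }

private:
    std::string_view sv_;
    bool terminated_;
};
```

The flag records only what was true at construction. It stays correct only as long as nobody modifies the underlying buffer while the view is alive, which is the objection Thiago raises in the quoted text below.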
On Wed, Dec 23, 2020, at 10:00 PM, Nikolay Mihaylov via Std-Proposals wrote:
> Sorry for comment too late, but how exactly string_view will add null
> at the end, if it is const? The const char * could be in read only
> memory as well.
>
> The only way this to work is to copy the char array - into std::string
> or into char array (i usually use std::array<char>.
>
> Another way is to change your algorithm, so it does not rely on null terminator.
>
> Nikolay
>
> On Wednesday, December 23, 2020, Thiago Macieira via Std-Proposals
> <std-proposals_at_[hidden]> wrote:
> > On Wednesday, 23 December 2020 13:48:51 -03 Arthur O'Dwyer via Std-Proposals
> > wrote:
> > > Nobody's mentioned it yet for some reason, but this is far from the first
> > > time that the idea of "null-terminated string_view" has come up. The
> > > traditional name for it is "zstring_view", and you can find more
> > > information by googling that name.
> >
> > I've seen both zstring_view and cstring_view, the difference between them is
> > that the former usually carries an explicit size whereas the latter does not.
> > Yes, it's duplicated information, but the size is a very common information
> > that is needed. That also makes it layout-compatible with string_view, whereas
> > cstring_view is layout-compatible with a plain pointer.
> >
> > > An interesting runtime variation is bev::string_view —
> > > https://github.com/lava/string_view
> > > bev::string_view remembers which constructor was used to construct it — the
> > > pointer-length constructor, or the std::string/const char* constructor? —
> > > and therefore remembers whether it's safe to access the char at
> > > this.data()[this.size()]. This means it knows whether it's safe to test
> > > this.data()[this.size()] against '\0', which means it can frequently detect
> > > its own null-terminated-ness at runtime. (But if it was constructed with
> > > the pointer-length constructor, and never .remove_suffix'ed, then it would
> > > conservatively report that it didn't know itself to be null-terminated and
> > > couldn't find out without UB.)
> >
> > It can't report, period. Even if it could determine that the NUL is there
> > after the string data, it can't guarantee that it will *remain* there. Since
> > that byte may belong to something, it may therefore be asynchronously
> > modified.
> >
> > The only time when it's legal to access that NUL is when doing strlen() in the
> > constructor. And even then, this class as you described must have as part of
> > its contract that the size()+1 must remain unchanged while the object lives.
> >
> > > Microsoft GSL used to provide (or at least, used to document) a
> > > `zstring_span` with zstring_view semantics. However, these days, GSL
> > > provides `czstring` only as a type alias for `const char*`, and has gotten
> > > rid of its pre-C++17 `string_span` and `zstring_span` types.
> >
> > Ok, that's new to me. cstring, zstring, czstring... since it's MS, I suppose
> > they also had lpszwstring? :-)
> > --
> > Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> > Software Architect - Intel DPG Cloud Engineering
> >
> >
> >
> > --
> > Std-Proposals mailing list
> > Std-Proposals_at_[hidden]
> > https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>
Received on 2020-12-23 17:00:56