std-proposals

Re: Avoid copies when using string_view with APIs expecting null terminated strings

From: Tom Mason <wheybags_at_[hidden]>
Date: Wed, 23 Dec 2020 23:57:31 +0000
I was actually just writing a message to propose a halfway solution like that. I had a look at what I could find about the ABI policies of the big three C++ standard library implementations. For reference, they are here: https://releases.llvm.org/6.0.0/projects/libcxx/docs/DesignDocs/ABIVersioning.html and https://docs.microsoft.com/en-us/cpp/porting/binary-compat-2015-2017?view=msvc-160.

Of those, libc++ would be the easiest. The change could be added immediately and guarded by an ABI version macro. MSVC seems to have broken ABI a lot more frequently in the past, but for the last three versions they have kept it stable. If they go back to more breakage, it could be done then, or, if I understand correctly, it could be enabled when the /GL flag is used, as in that case ABI version compatibility is broken anyway. libstdc++ would be the hardest, as their ABI has been unbroken for a very long time, and I don't see any kind of version flags like libc++ has.
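
To make the idea concrete, here is a rough sketch of a view that stores the null-termination flag directly and only carries it under a newer ABI version. Everything in it is made up for illustration: SKETCH_ABI_VERSION stands in for whatever versioning macro a real library would use, and flagged_view is not a proposed name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical ABI switch; a real library would use its own versioning macro.
#ifndef SKETCH_ABI_VERSION
#define SKETCH_ABI_VERSION 2
#endif

class flagged_view {
public:
    // Pointer + length: nothing is known about the byte past the end.
    flagged_view(const char* p, std::size_t n) : data_(p), size_(n) {}
    // C string: terminated by definition.
    flagged_view(const char* s) : data_(s), size_(std::strlen(s)) { set_terminated(true); }
    // std::string: data()[size()] is always '\0'.
    flagged_view(const std::string& s) : data_(s.data()), size_(s.size()) { set_terminated(true); }

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

    // The end of the view does not move, so the flag is kept.
    void remove_prefix(std::size_t n) { assert(n <= size_); data_ += n; size_ -= n; }
    // The end moves, so conservatively drop the flag.
    void remove_suffix(std::size_t n) {
        assert(n <= size_);
        size_ -= n;
        if (n != 0) set_terminated(false);
    }

    // Under the old ABI there is no room for the flag, so the answer is always "unknown".
    bool known_null_terminated() const {
#if SKETCH_ABI_VERSION >= 2
        return terminated_;
#else
        return false;
#endif
    }

private:
    void set_terminated(bool t) {
#if SKETCH_ABI_VERSION >= 2
        terminated_ = t;
#else
        (void)t;
#endif
    }

    const char* data_;
    std::size_t size_;
#if SKETCH_ABI_VERSION >= 2
    bool terminated_ = false;  // extra member: this is the layout change that breaks ABI
#endif
};
```

With the old ABI selected the class keeps the two-word layout and simply never reports termination, so callers fall back to copying.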

On Wed, Dec 23, 2020, at 11:00 PM, Tony V E wrote:
> What you really want is to get a null-terminated char const * from a
> string_view, and not pay the extra cost when the null was already
> there. Copy, but only when necessary.
>
> And we want this because there are lots of pre-existing functions that
> take null-terminated char const * strings.
>
> void f(char const *);
>
> string_view sv =...
>
> f(sv.???);
>
> First problem is that this is a tricky API. How do you keep it from
> being multiple steps?
>
> - you need to return an object that owns the string if it was copied,
> but could also "borrow" the existing string.
>
> Let's call it mstring_view. The m is for maybe owning. But it could be
> an implementation-defined type. Maybe for some implementations it could
> be std::string even.
>
> f(sv.m_str().c_str());
>
> Now it is implementation defined how string_view turns into
> mstring_view, but hopefully a quality implementation can track the
> "possibly there's a null" of the string_view, similar to Arthur's
> suggestions. For some implementations, it will always copy. Or will
> always copy until they can do an ABI break. (If there are no
> implementations that think they can take advantage of this, then it may
> not be worth doing.)
>
> Note that, worst case, it will never be worse than manually copying the
> string_view.
>
> This doesn't mean we couldn't also have zstring_view, etc. It only
> tackles the "I have a string_view and now need a null-terminated char
> const *" problem.
>
> The other solution is to (slowly) convince OSes, etc., to natively
> support ptr + length APIs.
>
> Sent from my BlackBerry portable Babbage Device
> Original Message
> From: Tom Mason via Std-Proposals
> Sent: Wednesday, December 23, 2020 5:14 PM
> To: Tom Mason via Std-Proposals
> Reply To: std-proposals_at_[hidden]
> Cc: Tom Mason
> Subject: Re: [std-proposals] Avoid copies when using string_view with
> APIs expecting null terminated strings
>
> I don't think what Arthur is proposing would involve the string_view
> modifying the const char*. It would just be informed that it can check
> the value at data() + size(), so it only does a read at that location
> to compare to zero. The byte at that position would have to be set
> externally. It does seem a little obscure though. Why would you have a
> string buffer with embedded nulls in it, make a string view that passes
> over the null, and then trim down to it? Maybe there is a use case
> there that I can't think of, but it seems better to me to just store a
> flag that indicates null termination directly.
>
> On Wed, Dec 23, 2020, at 10:00 PM, Nikolay Mihaylov via Std-Proposals wrote:
> > Sorry for commenting so late, but how exactly will string_view add a null
> > at the end if it is const? The const char * could be in read-only
> > memory as well.
> >
> > The only way for this to work is to copy the char array, either into a std::string
> > or into a char array (I usually use std::array<char>).
> >
> > Another way is to change your algorithm, so it does not rely on null terminator.
> >
> > Nikolay
> >
> > On Wednesday, December 23, 2020, Thiago Macieira via Std-Proposals
> > <std-proposals_at_[hidden]> wrote:
> > > On Wednesday, 23 December 2020 13:48:51 -03 Arthur O'Dwyer via Std-Proposals
> > > wrote:
> > > > Nobody's mentioned it yet for some reason, but this is far from the first
> > > > time that the idea of "null-terminated string_view" has come up. The
> > > > traditional name for it is "zstring_view", and you can find more
> > > > information by googling that name.
> > >
> > > I've seen both zstring_view and cstring_view; the difference between them is
> > > that the former usually carries an explicit size whereas the latter does not.
> > > Yes, it's duplicated information, but the size is very commonly
> > > needed. That also makes it layout-compatible with string_view, whereas
> > > cstring_view is layout-compatible with a plain pointer.
> > >
> > > > An interesting runtime variation is bev::string_view —
> > > > https://github.com/lava/string_view
> > > > bev::string_view remembers which constructor was used to construct it — the
> > > > pointer-length constructor, or the std::string/const char* constructor? —
> > > > and therefore remembers whether it's safe to access the char at
> > > > this.data()[this.size()]. This means it knows whether it's safe to test
> > > > this.data()[this.size()] against '\0', which means it can frequently detect
> > > > its own null-terminated-ness at runtime. (But if it was constructed with
> > > > the pointer-length constructor, and never .remove_suffix'ed, then it would
> > > > conservatively report that it didn't know itself to be null-terminated and
> > > > couldn't find out without UB.)
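
The behaviour described above can be sketched roughly as follows. This is not the code from the linked repository; probing_view and its members are made-up names, and only the two constructors and remove_suffix are modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical view that remembers whether data()[size()] may be read at all.
class probing_view {
public:
    // Pointer + length: the byte past the end may not belong to us.
    probing_view(const char* p, std::size_t n) : data_(p), size_(n), end_readable_(false) {}
    // C string: the terminator is part of the array, so it may be read.
    probing_view(const char* s) : data_(s), size_(std::strlen(s)), end_readable_(true) {}

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

    void remove_suffix(std::size_t n) {
        assert(n <= size_);
        size_ -= n;
        // After shrinking, data_[size_] lies inside the original range.
        if (n != 0) end_readable_ = true;
    }

    // Reads the byte past the end only when that read is known to be in bounds.
    bool is_null_terminated() const { return end_readable_ && data_[size_] == '\0'; }

private:
    const char* data_;
    std::size_t size_;
    bool end_readable_;
};
```

The answer is a snapshot taken at the time of the call; nothing in the class prevents the byte from being changed afterwards by whoever owns the buffer.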
> > >
> > > It can't report, period. Even if it could determine that the NUL is there
> > > after the string data, it can't guarantee that it will *remain* there. Since
> > > that byte may belong to something, it may therefore be asynchronously
> > > modified.
> > >
> > > The only time when it's legal to access that NUL is when doing strlen() in the
> > > constructor. And even then, this class as you described must have as part of
> > > its contract that the size()+1 must remain unchanged while the object lives.
> > >
> > > > Microsoft GSL used to provide (or at least, used to document) a
> > > > `zstring_span` with zstring_view semantics. However, these days, GSL
> > > > provides `czstring` only as a type alias for `const char*`, and has gotten
> > > > rid of its pre-C++17 `string_span` and `zstring_span` types.
> > >
> > > Ok, that's new to me. cstring, zstring, czstring... since it's MS, I suppose
> > > they also had lpszwstring? :-)
> > > --
> > > Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> > > Software Architect - Intel DPG Cloud Engineering
> > >
> > >
> > >
> > > --
> > > Std-Proposals mailing list
> > > Std-Proposals_at_[hidden]
> > > https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals

Received on 2020-12-23 17:59:00