std-proposals

Re: Avoid copies when using string_view with APIs expecting null terminated strings

From: Tom Mason <wheybags_at_[hidden]>
Date: Sat, 26 Dec 2020 01:25:06 +0000
On further reading of the GCC docs (yes, on Christmas Day, but everyone else has gone to bed already :p), it seems libstdc++ does have a mechanism similar to that of libc++. The -fabi-version flag (docs here https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Dialect-Options.html) can be used to specify an ABI version, and modern versions of GCC default to 0 (latest). So the new ABI version could implement the termination flag, and the old ones could just default to not terminated. I'm sure there are GCC contributors on here, so sorry if this was all obvious; I just thought it was worth mentioning for clarity.
I will probably do up a new, more proper writeup and example implementation in the next few days. As I'm new here, can anyone with experience comment on the likelihood of acceptance of this proposal? From my (optimistic) viewpoint, the main blocker would be ABI issues, as opposed to objections to the feature itself, and I think the workarounds are acceptable.
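
A minimal sketch of the termination flag idea, not a proposed implementation. The names `flagged_view`, `known_terminated` and the `SKETCH_ABI_VERSION` macro are hypothetical placeholders (no real libstdc++ or libc++ macro is implied). Under the old ABI the flag is not stored and the view always reports "not terminated":

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical ABI switch for this sketch only; not a real library macro.
#ifndef SKETCH_ABI_VERSION
#define SKETCH_ABI_VERSION 2
#endif

class flagged_view {
public:
    // Pointer + length: nothing is known about the byte past the end.
    flagged_view(const char* p, std::size_t n) : data_(p), size_(n) {}

    // C string: terminated by construction.
    flagged_view(const char* p) : data_(p), size_(std::strlen(p)) {
        set_terminated(true);
    }

    // std::string: data()[size()] is guaranteed to be '\0'.
    flagged_view(const std::string& s) : data_(s.data()), size_(s.size()) {
        set_terminated(true);
    }

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

    // Old ABI: no storage for the flag, so always answer "unknown".
    bool known_terminated() const {
#if SKETCH_ABI_VERSION >= 2
        return terminated_;
#else
        return false;
#endif
    }

    // Dropping a prefix keeps the end in place, so the flag survives.
    void remove_prefix(std::size_t n) { data_ += n; size_ -= n; }

    // Dropping a suffix moves the end away from the terminator.
    void remove_suffix(std::size_t n) {
        size_ -= n;
        if (n != 0) set_terminated(false);
    }

private:
    void set_terminated(bool t) {
#if SKETCH_ABI_VERSION >= 2
        terminated_ = t;
#else
        (void)t;
#endif
    }

    const char* data_;
    std::size_t size_;
#if SKETCH_ABI_VERSION >= 2
    bool terminated_ = false;  // extra member: this is the ABI-affecting part
#endif
};
```

The extra `bool` changes the layout of the type, which is why the sketch hides it behind the version switch.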

On Wed, Dec 23, 2020, at 11:57 PM, Tom Mason via Std-Proposals wrote:
> I was actually just writing a message to propose a halfway solution
> like that. I had a look at what I could find about the ABI policies of
> the big three C++ stdlib implementations. For reference, they are here:
> https://releases.llvm.org/6.0.0/projects/libcxx/docs/DesignDocs/ABIVersioning.html https://releases.llvm.org/6.0.0/projects/libcxx/docs/DesignDocs/ABIVersioning.html and https://docs.microsoft.com/en-us/cpp/porting/binary-compat-2015-2017?view=msvc-160.
>
> Of those, libc++ would be the easiest. It can be added immediately, and
> protected behind an ABI version macro. With MSVC, it seems they used to
> break ABI a lot more frequently, but for the last 3 versions they have
> kept it stable. If they go back to more breakage, it could be done
> then, or, if I understand correctly, it could be enabled when the /GL
> flag is used, as in that case ABI version compatibility is broken
> anyway. libstdc++ would be the hardest, as they have been unbroken for
> a long long time, and I don't see any kind of version flags like libc++
> has.
>
> On Wed, Dec 23, 2020, at 11:00 PM, Tony V E wrote:
> > What you really want is to get a null-terminated char const * from a
> > string_view, and not pay the extra cost when the null was already
> > there. Copy, but only when necessary.
> >
> > And we want this because there are lots of pre-existing functions that
> > take null-terminated char const * strings.
> >
> > void f(char const *);
> >
> > string_view sv =...
> >
> > f(sv.???);
> >
> > First problem is that this is a tricky API. How do you keep it from
> > being multiple steps?
> >
> > - you need to return an object that owns the string if it was copied,
> > but could also "borrow" the existing string.
> >
> > Let's call it mstring_view. The m is for maybe owning. But it could be
> > an implementation-defined type. Maybe for some implementations it could
> > be std::string even.
> >
> > f(sv.m_str().c_str());
> >
> > Now it is implementation defined how string_view turns into
> > mstring_view, but hopefully a quality implementation can track the
> > "possibly there's a null" of the string_view, similar to Arthur's
> > suggestions. For some implementations, it will always copy. Or will
> > always copy until they can do an ABI break. (If there are no
> > implementations that think they can take advantage of this, then it may
> > not be worth doing.)
> >
> > Note that, worst case, it will never be worse than manually copying the
> > string_view.
> >
> > This doesn't mean we couldn't also have zstring_view, etc. It only
> > tackles the "I have a string_view and now need a null-terminated char
> > const *" problem.
> >
> > The other solution is to (slowly) convince OSes, etc., to natively
> > support ptr + length APIs.
> >
> > Sent from my BlackBerry portable Babbage Device
> > Original Message
> > From: Tom Mason via Std-Proposals
> > Sent: Wednesday, December 23, 2020 5:14 PM
> > To: Tom Mason via Std-Proposals
> > Reply To: std-proposals_at_[hidden]
> > Cc: Tom Mason
> > Subject: Re: [std-proposals] Avoid copies when using string_view with
> > APIs expecting null terminated strings
> >
> > I don't think what Arthur is proposing would involve the string_view
> > modifying the const char*. It would just be informed that it can check
> > the value at data() + size(), so it only does a read at that location
> > to compare to zero. The byte at that position would have to be set
> > externally. It does seem a little obscure though. Why would you have a
> > string buffer with embedded nulls in it, make a string view that passes
> > over the null, and then trim down to it? Maybe there is a use case
> > there that I can't think of, but it seems better to me to just store a
> > flag that indicates null termination directly.
> >
> > On Wed, Dec 23, 2020, at 10:00 PM, Nikolay Mihaylov via Std-Proposals wrote:
> > > Sorry for commenting so late, but how exactly will string_view add a null
> > > at the end if it is const? The const char * could be in read-only
> > > memory as well.
> > >
> > > The only way for this to work is to copy the char array into a std::string
> > > or into a char array (I usually use std::array<char, N>).
> > >
> > > Another way is to change your algorithm so it does not rely on a null terminator.
> > >
> > > Nikolay
> > >
> > > On Wednesday, December 23, 2020, Thiago Macieira via Std-Proposals
> > > <std-proposals_at_[hidden]> wrote:
> > > > On Wednesday, 23 December 2020 13:48:51 -03 Arthur O'Dwyer via Std-Proposals
> > > > wrote:
> > > > > Nobody's mentioned it yet for some reason, but this is far from the first
> > > > > time that the idea of "null-terminated string_view" has come up. The
> > > > > traditional name for it is "zstring_view", and you can find more
> > > > > information by googling that name.
> > > >
> > > > I've seen both zstring_view and cstring_view. The difference between them is
> > > > that the former usually carries an explicit size whereas the latter does not.
> > > > Yes, it's duplicated information, but the size is very commonly
> > > > needed. That also makes it layout-compatible with string_view, whereas
> > > > cstring_view is layout-compatible with a plain pointer.
> > > >
> > > > > An interesting runtime variation is bev::string_view —
> > > > > https://github.com/lava/string_view
> > > > > bev::string_view remembers which constructor was used to construct it — the
> > > > > pointer-length constructor, or the std::string/const char* constructor? —
> > > > > and therefore remembers whether it's safe to access the char at
> > > > > this.data()[this.size()]. This means it knows whether it's safe to test
> > > > > this.data()[this.size()] against '\0', which means it can frequently detect
> > > > > its own null-terminated-ness at runtime. (But if it was constructed with
> > > > > the pointer-length constructor, and never .remove_suffix'ed, then it would
> > > > > conservatively report that it didn't know itself to be null-terminated and
> > > > > couldn't find out without UB.)
> > > >
> > > > It can't report, period. Even if it could determine that the NUL is there
> > > > after the string data, it can't guarantee that it will *remain* there. Since
> > > > that byte may belong to something, it may therefore be asynchronously
> > > > modified.
> > > >
> > > > The only time when it's legal to access that NUL is when doing strlen() in the
> > > > constructor. And even then, this class as you described must have as part of
> > > > its contract that the size()+1 must remain unchanged while the object lives.
> > > >
> > > > > Microsoft GSL used to provide (or at least, used to document) a
> > > > > `zstring_span` with zstring_view semantics. However, these days, GSL
> > > > > provides `czstring` only as a type alias for `const char*`, and has gotten
> > > > > rid of its pre-C++17 `string_span` and `zstring_span` types.
> > > >
> > > > Ok, that's new to me. cstring, zstring, czstring... since it's MS, I suppose
> > > > they also had lpszwstring? :-)
> > > > --
> > > > Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> > > > Software Architect - Intel DPG Cloud Engineering
> > > >
> > > >
> > > >
> > > > --
> > > > Std-Proposals mailing list
> > > > Std-Proposals_at_[hidden]
> > > > https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
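
The maybe-owning conversion Tony V E describes earlier in the thread (`f(sv.m_str().c_str())`) could look roughly like the sketch below. It is only an illustration under stated assumptions: `mstring_view` is the thread's placeholder name, `owns_copy` is a hypothetical helper, and because std::string_view has no termination flag today, the caller passes what it knows as a `known_terminated` argument.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical maybe-owning type: borrows when the terminator is known to
// be there, copies otherwise. Not part of any standard library.
class mstring_view {
public:
    mstring_view(std::string_view sv, bool known_terminated) {
        if (known_terminated && sv.data() != nullptr) {
            // Borrow: the caller vouches that sv.data()[sv.size()] == '\0'.
            borrowed_ = sv.data();
        } else {
            // Copy: std::string appends the terminator for us.
            owned_.assign(sv.data(), sv.size());
        }
    }

    // Always returns a null-terminated pointer, valid while *this is alive
    // (and, in the borrowed case, while the original buffer is alive).
    const char* c_str() const {
        return borrowed_ != nullptr ? borrowed_ : owned_.c_str();
    }

    bool owns_copy() const { return borrowed_ == nullptr; }

private:
    const char* borrowed_ = nullptr;
    std::string owned_;
};

// Stand-in for a pre-existing API that expects a null-terminated string.
inline std::size_t legacy_length(const char* s) { return std::strlen(s); }
```

A call such as `legacy_length(mstring_view(sv, flag).c_str())` stays a single expression, because the temporary lives until the end of the full expression. In the worst case it costs the same as copying the string_view by hand.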

Received on 2020-12-25 19:25:35