Re: Avoid copies when using string_view with APIs expecting null terminated strings

From: Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
Date: Wed, 23 Dec 2020 16:55:26 -0500
On Wed, Dec 23, 2020, 2:27 PM Thiago Macieira via Std-Proposals <
std-proposals_at_[hidden]> wrote:

> On Wednesday, 23 December 2020 13:48:51 -03 Arthur O'Dwyer via
> Std-Proposals wrote:
> > An interesting runtime variation is bev::string_view:
> > https://github.com/lava/string_view
> > bev::string_view remembers which constructor was used to construct it
> > (the pointer-length constructor, or the std::string/const char*
> > constructor?) and therefore remembers whether it's safe to access the
> > char at this.data()[this.size()]. This means it knows whether it's safe
> > to test this.data()[this.size()] against '\0', which means it can
> > frequently detect its own null-terminated-ness at runtime. (But if it
> > was constructed with the pointer-length constructor, and never
> > .remove_suffix'ed, then it would conservatively report that it didn't
> > know itself to be null-terminated and couldn't find out without UB.)
> It can't report, period. Even if it could determine that the NUL is there
> after the string data, it can't guarantee that it will *remain* there.

Yes, formally speaking, bev::string_view adds one more byte's worth of
preconditions onto the preconditions of std::string_view.
There's a subtlety here: It can't be assured that the NUL byte remains
there, but it *can* be assured that the memory remains there, so it can
check for null-termination later. Sure you could fool it (formally
speaking) via something like

    std::vector<char> trick = {'a', 'b', 0};
    bev::string_view sv = trick.data();
    if (sv.is_cstring()) ... // ha ha, accessed into a destroyed 'char' object!

but in practice this is exactly as easy to guard against as

    std::vector<char> trick = {'a', 'b'};
    bev::string_view sv(trick.data(), trick.size());
    std::cout << sv; // ha ha, accessed into a deallocated memory buffer!

and it's much *less* likely to arise by accident.
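
For concreteness, here is a rough sketch of the bookkeeping described above.
It is not bev::string_view's actual code: the class name tracked_view and the
member can_peek_ are made up for illustration, and only the few operations
relevant to this thread are shown. The idea is just one extra bool recording
whether reading data()[size()] is known to be within the caller's promises.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical sketch only; NOT the real bev::string_view implementation.
class tracked_view {
    const char* data_;
    std::size_t size_;
    bool can_peek_;  // true if data_[size_] is known to be readable

public:
    // Pointer-length constructor: nothing is known about the byte at p[n].
    tracked_view(const char* p, std::size_t n)
        : data_(p), size_(n), can_peek_(false) {}

    // C-string constructor: p[strlen(p)] is the terminator, so it is readable.
    tracked_view(const char* p)
        : data_(p), size_(std::strlen(p)), can_peek_(true) {}

    // std::string constructor: s.data()[s.size()] is readable since C++11.
    tracked_view(const std::string& s)
        : data_(s.data()), size_(s.size()), can_peek_(true) {}

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

    // Precondition: n <= size(). After dropping at least one char, the byte
    // at data_[size_] lies inside the original view, so it is readable.
    void remove_suffix(std::size_t n) {
        size_ -= n;
        if (n > 0) can_peek_ = true;
    }

    // Reads data_[size_] only when that read is known to be permitted;
    // otherwise conservatively answers "not known to be null-terminated".
    bool is_cstring() const { return can_peek_ && data_[size_] == '\0'; }
};
```

A caller holding such a view could pass data() straight to a C API when
is_cstring() is true, and fall back to copying into a std::string only when it
is false. Note that the check happens at call time, so it observes whatever is
in that byte at that moment.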

> Since that byte may belong to something, it may therefore be asynchronously
> modified.

If you're talking about data races due to modification from some other
thread: I daresay that never happens in practice. People don't use
string_view that way. But sure, formally, you'd also want to document that
sv.is_cstring() internally does a "read" that could race with "writes" to
the same location on other threads.


Received on 2020-12-23 15:55:40