C++ Logo

std-proposals

Advanced search

Re: [std-proposals] Appending a string to itself

From: Jonathan Wakely <cxx_at_[hidden]>
Date: Fri, 21 Jul 2023 21:02:24 +0100
On Fri, 21 Jul 2023, 20:41 Ville Voutilainen, <ville.voutilainen_at_[hidden]>
wrote:

> On Fri, 21 Jul 2023 at 22:36, Jonathan Wakely via Std-Proposals
> <std-proposals_at_[hidden]> wrote:
> >
> >
> >
> > On Fri, 21 Jul 2023, 18:16 Arthur O'Dwyer via Std-Proposals, <
> std-proposals_at_[hidden]> wrote:
> >>
> >> On Fri, Jul 21, 2023 at 12:59 PM Jerry Coffin via Std-Proposals <
> std-proposals_at_[hidden]> wrote:
> >>>
> >>> There's a corner case for which the current specification of
> `std::string::append` will frequently lead to undefined behavior.
> >>>
> >>> Consider something like:
> >>>
> >>> ```cpp
> >>> std::strings = "A long enough string that two copies of it probably
> won't fit into the currently allocated storage";
> >>> s.append(s);
> >>> ```
> >>
> >>
> >> Even more natural:
> >> s += s;
> >> which is equivalent to
> >> s.append(s)
> >> which is equivalent to
> >> s.append(s.data(), s.size())
> >> which has the following effects ([string.append]/8):
> >> > Appends a copy of the range [s, s + n) to the string.
> >>
> >> However — it is not clear to me that this wording gives the
> implementation permission to append anything other than [s, s+n) to the
> string in the case that the appending operation itself causes [s, s+n) to
> become invalidated.
> >
> >
> > This is absolutely required to work. The string can detect that the
> argument aliases its current content and deal with it.
> >
> > It also has to work for s.insert(pos1, s, pos2, n) and s.replace(pos1,
> n1, s, pos2, n2) which is trickier, but has to work.
> >
> > The only time the implementation doesn't have to deal with this is if
> the string parameters is an rvalue reference, in which case it can assume
> no aliasing.
>
> Doesn't the proper handling of this case almost fall out from the
> exception safety requirements? You'll have to allocate the new buffer
> before invalidating
> the old one anyway in order to achieve a strong exception safety
> guarantee, and then it would be rather hostile to invalidate the old
> buffer before
> copying the incoming range to the new buffer, especially because you
> still need the old buffer to copy *that* to the new buffer too.. :D
>

You *could* allocate, copy the old string into the new storage, deallocate
old storage, then copy the argument into the new storage (which can't throw
because you're just copying trivial types). That wouldn't work, and is no
easier than just doing it so it works properly, so there's no reason to do
it.

As Andrey said, the reallocation case is easier to get right then to get
wrong, but the non-reallocating case needs care. And replacing part of the
string with itself here really tricky.


> So, the spec could perhaps use some clarity improvements,


The standard is not a tutorial ;-)

Like Arthur said, the spec says it does a copy of the range. There's no
allowance for "unless that range overlaps the string", it just has to do it.



but it seems
> to me that the correct handling of this case almost just.. ..falls out
> from
> the other requirements?
>

Received on 2023-07-21 20:02:40