std-proposals: Re: std::take(obj), aka std::exchange(obj, {})

From: Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
Date: Thu, 24 Sep 2020 23:01:44 -0400

On Thu, Sep 24, 2020 at 7:56 PM Giuseppe D'Angelo <giuseppe.dangelo_at_[hidden]>
wrote:

> On 25/09/20 00:15, Arthur O'Dwyer wrote:
> > The big costs of `std::exchange(obj, {})` as far as I'm concerned are
> > that you have to default-construct a whole T object, pass std::exchange
> > a /reference/ to that object, call the assignment operator which has to
> > conditionally free the old resource, and finally destroy the temporary T
> > object (which has to conditionally free the resource). I can see how it
> > would be beneficial to coalesce some of these operations into a tighter
> > package like std::take. But I don't see how your suggested
> > implementation of std::take actually helps. You're still constructing a
> > temporary T, calling the assignment operator, and calling the destructor
> > of the temporary.
> >
> > // std::take of a string
> > {
> > std::string temp{};
> > newobj = std::move(obj);
> > obj = std::move(temp);
> > } // temp.~string();
> >
> > // what you want instead
> > newobj = std::move(obj);
> > obj.clear();
>
>
> I don't think these two are equivalent.
>

They are for std::string, std::vector<T>, all other containers... You're
right that the latter is not a generic solution (e.g. that code won't
compile for smart pointers like std::unique_ptr<T>).
That's exactly why I claim this is a problem for your current proposal. You
propose that std::take should have the same semantic effect as "move-out
and then clear," but the physical implementation strategy you propose for
it intentionally produces suboptimal codegen for many types. You propose a
new library function to do a thing *less efficiently* than the programmer
could do it by hand. I think you should look for an implementation
strategy that lets you produce the best possible codegen.
The cheap way out would be to make `std::take` a customization point; but I
think C++ already suffers from a vast overpopulation of customization
points.
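For readers following along, here is a minimal sketch of the two strategies being compared. The names `take` and `move_and_clear` are hypothetical stand-ins (neither is a standard function), and the sketch assumes `take` is a plain wrapper around `std::exchange(obj, {})` as discussed above; it is not the proposal's actual wording.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the proposed std::take: value-initializes a
// temporary T, move-assigns it into obj, and returns obj's old value.
template <class T>
T take(T& obj) {
    return std::exchange(obj, {});
}

// Hand-written "move out, then clear". Only compiles for types that
// provide clear(), so it is not generic (e.g. not for std::unique_ptr).
template <class T>
T move_and_clear(T& obj) {
    T result = std::move(obj);
    obj.clear();
    return result;
}
```

Both leave `obj` empty for containers and strings; the difference under discussion is only how much work the first form does to get there.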

> With something like
> new_string = std::move(old_string);
> old_string.clear();
> then old_string's state is not guaranteed to be in the std::string's
> default constructed state. In all likelihood, old_string will have
> leftover capacity:

Correct.

> namely, *new_string*'s capacity before the move
> assignment (assuming that it's done via pure swap, not via move and
> swap).

Huh, that's interesting: libstdc++ does indeed swap the heap-allocations of
x and y when you do `x = std::move(y)`, basically as if you'd done
`swap(x,y); y.resize(0);`.
libc++ does what I had initially expected, and just deallocates x's old
allocation.

I also observe that when you do `x = std::string{};` on libstdc++, x keeps
its old capacity! Whereas on libc++, that line resets x's capacity as I'd
have expected.
https://godbolt.org/z/WYx1va
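A small probe along these lines can be used to reproduce the observation locally. This is only a sketch: `CapacityReport` and `probe_capacities` are hypothetical names, and the values it reports depend on the standard library implementation and version, so no expected output is given.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper type: records capacities at the points discussed above.
struct CapacityReport {
    std::size_t after_move_assign;   // x.capacity() after x = std::move(y)
    std::size_t source_after_move;   // y.capacity() after y was moved from
    std::size_t after_assign_empty;  // x.capacity() after x = std::string{}
};

inline CapacityReport probe_capacities() {
    // Lengths chosen to exceed the small-string buffers of mainstream
    // implementations (an assumption, not a guarantee of the standard).
    std::string x(100, 'x');
    std::string y(200, 'y');

    CapacityReport r{};
    x = std::move(y);
    r.after_move_assign = x.capacity();
    // y is valid but unspecified here; capacity() has no preconditions.
    r.source_after_move = y.capacity();

    x = std::string{};
    r.after_assign_empty = x.capacity();
    return r;
}
```

Printing the three fields under libstdc++ and libc++ should show whether the moved-from string inherits the old allocation and whether assigning an empty temporary releases it.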

However, I think this is all of questionable relevance, because capacity is
not a *salient* feature of std::string. It's not part of the string's
value. It's not preserved by copy or move (as we've just seen), and it's
not part of the state compared by operator==. (John Lakos has a talk from a
few years ago on how to think about *salient* state.) State can be
observable without being salient — for example, std::addressof(s[0]) is
observable, but not salient. s.capacity() is similarly observable but not
salient.
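A minimal sketch of that distinction, using a hypothetical helper name: two strings with the same value compare equal even when one has been given a much larger capacity.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns true if two strings with identical contents
// still compare equal after one of them has had its capacity enlarged.
inline bool equal_despite_reserve() {
    std::string a = "abc";
    std::string b = "abc";
    b.reserve(1000);  // changes b.capacity(), not b's value
    return a == b && b.capacity() >= 1000;
}
```

operator== looks only at the characters, which is the sense in which capacity is observable but not part of the value.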

> Also, in all likelihood, old_string's size is 0 right after the
> move, so clear() here is a no op.
>

Correct; see
https://stackoverflow.com/questions/52696413/unnecessary-emptying-of-moved-from-stdstring
.
All vendors do the small-buffer optimization, and all vendors will empty
out the source string even when it's small and doesn't need to be emptied.
On the other hand, std::function also does the small-buffer optimization,
and there libstdc++ empties out the source std::function but *libc++ does
not:* https://godbolt.org/z/ja7nf7

> > Here's the difference in codegen: https://godbolt.org/z/n54arb
> > So the $64,000 question is: How do you get the good codegen?
>
> I can't make apples and oranges' codegen be equal...
>

Okay, how about for std::list<int>, then? Is that apples versus apples?
https://godbolt.org/z/Th5roG

> And if it's used for move construction rather than for move assignment,
> then it's actually just like the proposed move (+clear()):
> https://godbolt.org/z/nxrvo8

Hm, that's interesting. I wonder why move-assigning the result of `take`
should cause a visible suboptimality, when constructing directly from it
doesn't.

> > If you can't get the good codegen, then I don't see the point of making
> > a wrapper around std::exchange(obj, {}). It's already short and clear
> > enough — certainly clearer than std::take(obj), /this year/, since
> > people have already had 20 years to learn what std::exchange means.
>
> Sorry, I don't buy the "20 years" claim. Surely the idea is at least
> that old and exchange-like asm instructions have existed for a very long
> time.
>
> But the usable tool for the C++ programmer only came in C++14 (and in
> Boost / ABSL even later than that!). Hence I'm not really surprised that
> it's somehow "rarely" used, not regularly taught, not fully discovered,
> etc.
>

Ah, you're right, std::*swap* was C++98 but std::exchange was C++14. I
didn't realize/remember that std::exchange was that new.

> > You'll have to teach them what std::take means, and (more importantly)
> > why to use it. If you can't explain why to use it, then it's not a
> > good idea. "It produces better codegen" would be a great explanation...
> > but how do we get from here to there?
>
> I am addressing some of the "why to use it" in the proposal itself. Also
> if SG20 (a target audience) has inputs I'll be glad to discuss them.
>

FWIW, I'm moderately involved with SG20. :)

–Arthur

Received on 2020-09-24 22:01:58