Date: Mon, 18 Aug 2025 11:28:44 -0400
On Mon, Aug 18, 2025 at 6:33 AM Ben Crowhurst via Std-Proposals
<std-proposals_at_[hidden]> wrote:
>
>
> There is a history of STL implementations introducing string optimisations, for example, copy-on-write in GCC (<=4), adoption of small string optimisation (SSO) - with varying details across GCC, Clang, and MSVC.
>
> Focusing on SSO, using std::string, along with std::string_view and move semantics, introduces a frustrating folly.
>
> The name std::basic_string implies a basic implementation, and many assume it is nothing more than a wrapper around a C-style character pointer with bounds checking.
I question this statement. Who exactly thinks this? Is it really a
common misconception, and if so, among whom?
Novice C++ users shouldn't be thinking about whether `basic_string` is
a "wrapper around a C-style character pointer"; they'll be thinking of
it as a string, especially if they come from a language that is much
stricter about treating strings as first-class objects.
It should also be noted that most code that needs to rely on the
behavior of a pointer into a moved-from `basic_string` is already in
dubious territory.
Consider the code in question. You have code X which has a
`string_view` or pointer into a string. Code Y has a `std::string`
that actually holds that string. And code Y decides to move it into
another `string` object.
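A minimal sketch of that scenario, assuming an implementation that
uses SSO. The helper name `buffer_survives_move` is hypothetical; it
only compares addresses and never reads through the old pointer.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: reports whether the character buffer is still at
// the same address after `source` has been moved into a new std::string.
bool buffer_survives_move(std::string source) {
    const char* before = source.data();      // what "code X" would be holding
    std::string target = std::move(source);  // what "code Y" decides to do
    // Address comparison only; the old pointer is never dereferenced.
    return target.data() == before;
}
```

On the SSO implementations mentioned in the quoted text, a short
argument such as "hi" would typically yield false (the characters live
inside the string object itself) and a long one true (the heap buffer
changes hands), though the standard promises neither result.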
If code Y could move the string... couldn't it also have *destroyed*
the string? That is effectively what it did by moving from it.
Code X needs the string object it points into to continue to exist.
That's called "ownership". Functionally, this is equivalent to code Y
having a `unique_ptr<T>` and code X having a `T*` obtained from that
unique pointer. Yes, the `T*` would not be invalidated just by moving
from the unique pointer, but you still shouldn't write this code at
all. Why? Because code X has an implicit ownership relationship, not
one explicitly spelled out by the code.
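The `unique_ptr` analogy can be sketched as follows. `Widget` and
`read_through_borrowed_pointer` are hypothetical names used only for
illustration.

```cpp
#include <memory>
#include <utility>

struct Widget { int value = 42; };  // hypothetical payload type

// "Code Y" owns the Widget; "code X" only borrows a raw pointer to it.
int read_through_borrowed_pointer() {
    auto owner = std::make_unique<Widget>();
    Widget* borrowed = owner.get();     // code X: nothing here expresses ownership

    auto new_owner = std::move(owner);  // code Y moves; the Widget itself stays put

    // `borrowed` is still valid, but only because `new_owner` is alive.
    // Had code Y called owner.reset() instead, `borrowed` would dangle.
    return borrowed->value;
}
```

The read succeeds here, but nothing in the code ties the lifetime of
`borrowed` to whichever `unique_ptr` currently owns the object, which
is the implicit relationship described above.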
That's a code smell. Storing `string_view`s in such a way that it's
even *possible* for the `string` they point into to be moved from (or
destroyed) is the problem with the code. If code X is going to store
the string *at all*, it should store a `std::string`, not a
`string_view`.
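One way this might look in practice, as a sketch: accept a
`string_view` at the interface, but store an owning `std::string`. The
`Label` class and `label_after_source_moved` are hypothetical.

```cpp
#include <string>
#include <string_view>
#include <utility>

// Hypothetical "code X": takes a view at the boundary, stores an owning copy.
class Label {
public:
    explicit Label(std::string_view text) : text_(text) {}  // copies the characters
    const std::string& text() const { return text_; }
private:
    std::string text_;  // owns its data, independent of the caller's string
};

std::string label_after_source_moved() {
    std::string source = "short";
    Label label(source);                        // label now holds its own copy
    std::string elsewhere = std::move(source);  // "code Y" moves the original away
    return label.text();                        // unaffected by the move
}
```

The cost is one copy at construction; in exchange, nothing code Y does
to its own string can affect what `Label` holds.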
Received on 2025-08-18 15:28:56