Re: [std-proposals] Forbid optimisations on std::basic_string implementations.

From: Tiago Freire <tmiguelf_at_[hidden]>
Date: Mon, 18 Aug 2025 20:17:01 +0000
I think the only proposals for new string types that have any chance are "fixed-size" strings and strings without null termination.

I don't find micro-optimizations of string implementation to be all that useful, because:
1. I would expect the implementation of std::string to be efficient, or at least well balanced. If it isn't, it's a matter of petitioning the library implementer to adopt a better one, without any need for the standard to mandate it.
2. Proposals like this have been made many times before and failed because of point 1. I don't expect this one to be any different.
3. It's premature pessimization. I've done an analysis of this on a sizeable commercial application, and the highest-cost item in the entire application was the std::string constructor, usually associated with string copies and manipulation that require the heap to be touched. This is not an isolated incident either, as I have seen the same thing in many applications across different companies.

The problem stems from the fact that strings are way too easy to use, so people tend to write interfaces with std::string for anything that might need a string. Even if the parameter is a const reference, you still pay the price: it is a const reference to a std::string, so one must be created somewhere, even if that was totally unnecessary. string_view is an underappreciated technical achievement that has helped mitigate many of these problems.
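
To make the cost concrete, here is a minimal sketch of the two interface styles. The function names are made up for illustration and do not come from any real code base. Calling the first with a string literal forces a temporary std::string to be constructed (which may allocate if the text does not fit any small-string buffer); the second only wraps a pointer and a length.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical interface written against std::string: a caller holding a
// literal or a char buffer must first materialise a std::string.
std::size_t count_spaces_str(const std::string& s) {
    std::size_t n = 0;
    for (char c : s) {
        if (c == ' ') ++n;
    }
    return n;
}

// Same logic against std::string_view: no std::string is constructed,
// whatever kind of contiguous character data the caller has.
std::size_t count_spaces_sv(std::string_view s) {
    std::size_t n = 0;
    for (char c : s) {
        if (c == ' ') ++n;
    }
    return n;
}

// Both versions agree on the result; only the cost at the call site differs.
void self_check() {
    assert(count_spaces_str("a b c") == count_spaces_sv("a b c"));
}
```

Both accept an existing std::string as well, so switching a read-only parameter to string_view usually costs callers nothing.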

Hence the problem with variations on std::string. Each one is a different string type, meaning the more interfaces you have written with it, the more conversions need to take place to use it.
Sure, it might be more "efficient" in some usage scenarios, but those benefits are quickly wiped out by the cost of creating the strings in the first place, and the more variety there is, the worse the problem gets.

No thanks, if you need to dynamically allocate your string you ain't building a "super race horse" of an application, your micro-optimization probably won't be significant.


________________________________
From: Std-Proposals <std-proposals-bounces_at_[hidden]> on behalf of Jason McKesson via Std-Proposals <std-proposals_at_[hidden]>
Sent: Monday, August 18, 2025 5:29:03 PM
To: std-proposals_at_[hidden] <std-proposals_at_[hidden]>
Cc: Jason McKesson <jmckesson_at_[hidden]>
Subject: Re: [std-proposals] Forbid optimisations on std::basic_string implementations.

On Mon, Aug 18, 2025 at 6:33 AM Ben Crowhurst via Std-Proposals
<std-proposals_at_[hidden]socpp.org> wrote:
>
>
> There is a history of STL implementations introducing string optimisations, for example, copy-on-write in GCC (<=4), adoption of small string optimisation (SSO) - with varying details across GCC, Clang, and MSVC.
>
> Focusing on SSO, using std::string, along with std::string_view and move semantics, introduces a frustrating folly.
>
> The name std::basic_string implies a basic implementation, and many assume it is nothing more than a wrapper around a C-style character pointer with bounds checking.

I question this statement. Who exactly is it that thinks this? Is this
really a common misconception, and if it is, who is having this
misconception?

Novice C++ users shouldn't be thinking about whether `basic_string` is
a "wrapper around a C-style character pointer"; they'll be thinking of
it as a string. Especially if they come from a language that's a lot
more strict about treating strings as a first-class object.

It should also be noted that most code that needs to rely on the
behavior of a pointer into a moved-from `basic_string` is already in
dubious territory.

Consider the code in question. You have code X which has a
`string_view` or pointer into a string. Code Y has a `std::string`
that actually holds that string. And code Y decides to move it into
another `string` object.
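
A rough sketch of that scenario follows; the helper name is hypothetical. It only compares pointer values and never reads through the old pointer. The result is not specified by the standard: on implementations with a small string optimisation a short string typically reports `false` (the characters are copied into the new object's internal buffer), while a heap-allocated string typically reports `true` (the buffer is handed over).

```cpp
#include <cassert>
#include <string>
#include <utility>

// Reports whether the character buffer kept its address when `source`
// was moved into a new string. Only the pointer values are compared;
// nothing is read through the old pointer.
bool buffer_address_survives_move(std::string source) {
    const std::string expected = source;      // copy for the content check below
    const char* before = source.data();       // what "code X" would be holding
    std::string target = std::move(source);   // what "code Y" does
    assert(target == expected);               // the contents always arrive intact
    return target.data() == before;
}
```

Code X cannot tell from the outside which of the two cases it is in, which is the point of what follows.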

If code Y could move the string... couldn't it also have *destroyed*
the string? That is kind of what it did when it moved from it.

Code X needs the string object it points into to continue to exist.
That's called "ownership". Functionally, this is equivalent to code Y
having a `unique_ptr<T>` and code X having a `T*` that comes from that
unique pointer. Yes, it wouldn't fail just by moving from the unique
pointer, but you still shouldn't write this code at all. Why? Because
code X has an implicit ownership relationship, not one explicitly
spelled out by the code.
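
As an illustrative sketch of the analogy (`Widget` and the function name are placeholders, not from the original post): the raw pointer stays valid across the move because the pointee itself does not relocate, but nothing in the code ties the observer's lifetime to whoever owns the object now.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Widget {
    int value = 42;
};

// Returns true if a raw pointer taken before the move still refers to the
// same live object afterwards.
bool observer_survives_move() {
    auto owner = std::make_unique<Widget>();       // "code Y" owns the object
    Widget* observer = owner.get();                // "code X" merely observes it

    std::unique_ptr<Widget> new_owner = std::move(owner);
    assert(owner == nullptr);                      // a moved-from unique_ptr is null

    // The Widget did not move, so the observer still points at it. It stays
    // valid only for as long as new_owner keeps the object alive.
    return observer == new_owner.get() && observer->value == 42;
}
```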

That's a code smell. Storing `string_view`s in a way such that it's
even *possible* for the `string` they point into to be moved from (or
destroyed) is the problem with the code. If code X is going to store
the string *at all*, it should be a `std::string`, not a
`string_view`.
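
A minimal sketch of that last recommendation, assuming C++17; the `Label` class is hypothetical. It accepts a `string_view` at the interface but stores its own `std::string`, so nothing the caller later does to the source can affect it.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical "code X": takes a view as a parameter but keeps its own copy.
class Label {
public:
    explicit Label(std::string_view text) : text_(text) {}
    const std::string& text() const { return text_; }

private:
    std::string text_;  // owned storage, independent of the caller's string
};

// Returns true if the label is unaffected by the source being moved from.
bool label_outlives_source() {
    std::string source = "short";
    Label label(source);                       // copies the characters

    std::string moved_to = std::move(source);  // "code Y" moves the string away
    source.clear();                            // put the moved-from string in a known state

    assert(moved_to == "short");
    return label.text() == "short";
}
```

The copy has a cost, but it makes the ownership explicit instead of implied.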
--
Std-Proposals mailing list
Std-Proposals_at_[hidden]rg
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals

Received on 2025-08-18 20:17:05