Date: Thu, 02 Apr 2026 09:12:45 -0700
On Thursday, 2 April 2026 02:42:01 Pacific Daylight Time Jan Schultke via
Std-Proposals wrote:
> > > Perform reference counting (and copy on write) for the dynamic
> > > representation. This has key benefits:
> >
> >
> > wasn't this a flaw in GCC old `std::string`? When they move to C++11
> > string it improves performance in some cases?
>
> I'm not sure what you mean by "flaw" specifically here. Reference counting
> as a technique for std::big_int makes sense even in C++11. Move semantics
> allow you to also "steal" the handle to a reference counter from another
> std::big_int, so they interop nicely with reference counting.
>
> > Another thing, would this have the performance of `std::shared_ptr` as
> > it needs to be thread aware?
>
> Yes and no. Reference counting needs to take place atomically for the copy
> constructor to be thread-safe. However, as long as your integers are small,
> you never actually perform reference counting because of SOO.

I believe I can give some Qt experience here, as we are still big fans of
reference counting. We don't have a direct equivalent, that is, a
variable-sized type that is both reference counted and has small object
optimisation (SOO). QString and QList have variable data size, but no SOO;
QVarLengthArray has SOO but is not reference counted.

Const operations accessing the data are cheap and hardly any different from
those of non-refcounted types, because they don't look at or modify the
reference count in the first place.

The reference-counted types are very good at move semantics. A move is a
simple copy of the object's members plus resetting the moved-from object to a
known state, with no atomic operations required. Once we get relocation
semantics, these types will be well suited to those too.

Copies are always O(1), but they incur an atomic operation. Most modern
architectures have optimised this, but it is not, and can never be, as cheap
as a plain increment. That is, instead of a cost of a quarter cycle, we're
looking at about a dozen CPU cycles. And unfortunately, this applies even when
the object is not shared with another CPU.
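
To make the three costs above concrete (const access, move, copy), here is a
minimal sketch. It assumes a hypothetical `SharedDigits` handle and `Payload`
block invented for illustration; it is not Qt's implementation nor the layout
of any proposed std::big_int, and it leaves out SOO and assignment entirely.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical shared block: a reference count plus the data it guards.
struct Payload {
    std::atomic<int> refs{1};
    std::vector<unsigned> limbs;
};

// Hypothetical handle type, for illustration only.
class SharedDigits {
    Payload *d;

public:
    explicit SharedDigits(std::vector<unsigned> v) : d(new Payload) {
        d->limbs = std::move(v);
    }

    // Copy: O(1), but pays for one atomic read-modify-write,
    // even if no other thread ever sees this object.
    SharedDigits(const SharedDigits &other) noexcept : d(other.d) {
        if (d)
            d->refs.fetch_add(1, std::memory_order_relaxed);
    }

    // Move: steals the pointer and resets the source; no atomic operation.
    SharedDigits(SharedDigits &&other) noexcept
        : d(std::exchange(other.d, nullptr)) {}

    // Assignment is left out to keep the sketch short.
    SharedDigits &operator=(const SharedDigits &) = delete;
    SharedDigits &operator=(SharedDigits &&) = delete;

    // Destruction: atomic decrement; the last owner frees the block.
    ~SharedDigits() {
        if (d && d->refs.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete d;
    }

    // Const access: reads the data and never touches the reference count.
    std::size_t size() const { return d ? d->limbs.size() : 0; }
    unsigned at(std::size_t i) const { return d->limbs[i]; }

    // Diagnostic helper for the sketch; 0 means moved-from.
    int useCount() const {
        return d ? d->refs.load(std::memory_order_relaxed) : 0;
    }
};
```

Copying a `SharedDigits` raises `useCount()` by one through `fetch_add`, while
moving from it leaves the count unchanged and the source empty. That
difference is the whole gap between the "dozen cycles" and the "simple copy"
paths described above.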

The problem is when you attempt to mutate the data. This requires inserting
the actual "copy on write" code, and it tends to appear everywhere. Compilers
are not yet good at realising that once the code has copied the data and holds
exclusive ownership of it, it won't need to copy again. I suppose that if the
Standard Library had more reference-counted types, compiler support in this
area would improve too.
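
The following sketch shows why the copy-on-write check "appears everywhere".
`CowList`, `Payload`, `fillNaive` and `fillHoisted` are hypothetical names
made up for this example, not Qt API; the point is only that every mutating
accessor must call `detach()`, and that today the check usually has to be
hoisted out of a loop by hand.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shared block: a reference count plus the data it guards.
struct Payload {
    std::atomic<int> refs{1};
    std::vector<int> items;
};

// Hypothetical copy-on-write list, for illustration only.
class CowList {
    Payload *d;

    // Deep-copy the payload if anyone else still shares it.
    void detach() {
        if (d->refs.load(std::memory_order_acquire) != 1) {
            Payload *copy = new Payload;
            copy->items = d->items;
            if (d->refs.fetch_sub(1, std::memory_order_acq_rel) == 1)
                delete d; // the other owner went away in the meantime
            d = copy;
        }
    }

public:
    explicit CowList(std::size_t n) : d(new Payload) { d->items.assign(n, 0); }
    CowList(const CowList &o) noexcept : d(o.d) {
        d->refs.fetch_add(1, std::memory_order_relaxed);
    }
    CowList &operator=(const CowList &) = delete; // omitted for brevity
    ~CowList() {
        if (d->refs.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete d;
    }

    std::size_t size() const { return d->items.size(); }
    int at(std::size_t i) const { return d->items[i]; }
    bool sharesWith(const CowList &o) const { return d == o.d; }

    // Every mutating accessor has to check for sharing first.
    int &operator[](std::size_t i) { detach(); return d->items[i]; }

    // Detach once and hand out a raw pointer to the now-exclusive data.
    int *data() { detach(); return d->items.data(); }
};

// Naive loop: the sharing check in operator[] runs on every iteration,
// although it can only trigger a copy on the first one.
void fillNaive(CowList &l, int v) {
    for (std::size_t i = 0; i < l.size(); ++i)
        l[i] = v;
}

// Hand-hoisted loop: one check up front, then plain stores.
void fillHoisted(CowList &l, int v) {
    int *p = l.data();
    for (std::size_t i = 0, n = l.size(); i < n; ++i)
        p[i] = v;
}
```

Both functions leave the list in the same state; the second simply does by
hand what one would like the compiler to deduce, namely that after the first
`detach()` the object is exclusively owned for the rest of the loop.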

Therefore, the question is one of trade-off: what is the break-even point for
deep-copying an object versus the atomic operation? There's also a question of
trend here: when we began using atomically-reference-counted types in Qt in
2005, most application processors were either dual core or, even better for
us, two threads of a single core. Today, consumer devices have a dozen cores
or two, with some higher-end ones going upwards of 50 cores. *Applications*
haven't kept up with this and don't use that many threads yet, but the writing
is on the wall. And this is the Qt experience; for the larger C++ world, we
need to take into account servers with hundreds if not thousands of cores and
applications that do have hundreds of threads.

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Principal Engineer - Intel Data Center - Platform & Sys. Eng.
Received on 2026-04-02 16:12:56
