Date: Mon, 2 Jun 2025 12:31:13 +0200
Hi Thomas,
On Sun, Jun 1, 2025 at 10:32 AM Thomas Krogh Lohse via Std-Proposals <
std-proposals_at_[hidden]> wrote:
> Dear all,
>
> I’ve just submitted my master’s thesis in Software Engineering from
> Aalborg University (defending it on June 6), which focuses on memory safety
> in C++, and I’d like to briefly share the core idea of my project.
>
> The project defines a conservative safe subset of C++, and applies two
> static dataflow analyses:
> * A lifetime analysis to detect use-after-free, use-after-move, and
> similar issues.
> * A borrow checker-style analysis to ensure mutually exclusive access
> to resources.
>
> The safe subset is inspired by Rust and restricts some inherently unsafe
> constructs:
> * Pointer dereferencing
> * `new` / `delete`
> * `reinterpret_cast`, `const_cast`, and C-style casts
> * Union field access
> * Labels and `goto`
>
This sounds mostly like the same basic premise as the paper P3700 that I
wrote with the intent of seeing it in WG21 next June in Sofia. It proposes
a grid approach to looking at how to make C++ safer, because the whole
problem seems too large to be tractable by any one paper (and if such a
paper were produced with sufficient detail, no human would ever be able to,
or want to read it). It proposes to split the safety problem into rows (the
tools that we need to tackle safety), columns (the areas of safety that we
want to tackle) and cells (specific things we should do). Two rows that I
have in there are "removing features from the language that we know are
bad" and "adding annotations to enable separate static analysis". But
please do refer to the full paper at http://wg21.link/p3700 .
I've tried to work out the subsetting into P3716 (http://wg21.link/p3716)
with the intent of having a way to exclude specific constructs in a
standard and portable way.
The proposal you outlined needs subsetting of some sort, plus added
lifetime annotations, with an addition of specific checks to be implemented
within compilers that check lifetimes and access to resources.
The specific subsets you are proposing seem like an odd set. Disallowing
reinterpret_cast, const_cast and C-style casts seems eminently possible.
Removing goto and labels makes the language cleaner but isn't necessarily
directly related to safety. Removing union field access is restricting one
kind of UB. Removing new and delete forces people to use containers or
make_unique/make_shared, which is a clear win, with the downside of making
it impossible for users to implement make_shared, make_unique etc.
themselves. Then you also disallow pointer dereferencing, which seems like
a huge change to the language, with consequences beyond what I can foresee.
I'd love to see a paper working out in detail why we want to subset out
these specific bits.
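To make the discussion concrete, here is a minimal sketch of what code
inside such a subset might look like. It is only an illustration: `Widget`,
`make_widget`, `bump` and `read_int_or` are hypothetical names, and I am
assuming that dereferencing a smart pointer to form a reference stays
allowed while raw pointer dereferencing does not.

```cpp
#include <memory>
#include <string>
#include <variant>

// Hypothetical example type; not taken from the thesis or from P3716.
struct Widget {
    int value = 0;
};

// Ownership through std::make_unique instead of `new` / `delete`.
std::unique_ptr<Widget> make_widget(int v) {
    return std::make_unique<Widget>(Widget{v});
}

// Borrowing through an lvalue reference instead of a raw pointer.
void bump(Widget& w) {
    ++w.value;
}

// std::variant instead of a union: the active alternative is checked
// before it is read, so there is no access to an inactive member.
int read_int_or(const std::variant<int, std::string>& v, int fallback) {
    return std::holds_alternative<int>(v) ? std::get<int>(v) : fallback;
}
```

A caller would write `auto w = make_widget(41); bump(*w);` and never touch
`new`, `delete`, or a raw pointer. Note that `make_widget` itself could not
be implemented from scratch inside the subset, which is the downside
mentioned above.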
> Instead, developers are encouraged (by the language) to use smart pointers
> for ownership and lvalue-references for borrowing, promoting RAII by
> default.
>
> The analyses are implemented in a proof-of-concept Clang plugin. Users can
> annotate types and functions with attributes (e.g., to define smart pointer
> behavior or skip analysis — similar to Rust's unsafe). It’s still a
> prototype and has some scalability and precision limitations, but it
> successfully enforces the subset and detects key violations. The
> implementation uses Andersen’s pointer analysis.
>
> Currently, the analysis does not handle polymorphism, exceptions, or
> lambdas, though I outline ideas for addressing these in future work.
>
> If refined with a more precise pointer analysis, some over-approximation
> fixes, and extended support for more of C++, I believe this approach could
> provide safety guarantees similar to Rust — but within standard, modern
> C++, without requiring a new frontend or language changes.
>
> I’d love to hear your thoughts:
> * Do you see value in defining a "safe-by-default" C++ subset with
> opt-in unsafe features?
> * Could something like this analysis model help enforce safety in
> future directions for the language?
>
I see a future for this direction but it needs more rationale, and more
details on how it works. In particular, we need to understand how pointer
and object lifetimes are passed along, without only tackling the easy side
of the problem.
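A minimal sketch of the kind of case I mean, where a lifetime is passed
through a function boundary. The names `longer` and `safe_use` are
hypothetical and only serve the illustration; the point is that the checker
has to know the result borrows from the arguments.

```cpp
#include <cstddef>
#include <string>

// The returned reference borrows from `a` or `b`, so the result must not
// outlive either argument. Nothing in the signature says so.
const std::string& longer(const std::string& a, const std::string& b) {
    return a.size() >= b.size() ? a : b;
}

// Fine: both arguments are named objects that outlive the reference `r`.
std::size_t safe_use() {
    std::string x = "lifetime";
    std::string y = "borrow";
    const std::string& r = longer(x, y);
    return r.size();
}

// Not compiled, shown for contrast: the temporary is destroyed at the end
// of the full expression, so `r` dangles on the next line.
//
//   const std::string& r = longer(std::string("temporary"), y);
//   r.size();  // use of a destroyed object
```

Catching the commented-out case requires either looking into the body of
`longer` or annotating its signature, which is where the lifetime
annotations come in.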
The #1 result I think/hope we should get from this is a compiler's blessing
of much code, allowing developers to know that 90% of their code is safe
and checked, while allowing the remaining 10% to be either fixed or
reduced. It should at the very least make the 90%-C++ be recognized as a
safe language in the same way that Rust, Java etc. are, as at least that
much code is verifiably at no risk whatsoever of going out of bounds.
Excluding polymorphism, exceptions or lambdas makes this drop below 90% for
nearly all code bases.
Received on 2025-06-02 10:31:30