ISOCPP std-proposals List: [std-proposals] Towards a library-based borrow checker (instead of safecpp.org)

From: Simon Schröder <dr.simon.schroeder_at_[hidden]>
Date: Sat, 4 Jan 2025 16:11:28 +0100

Hi,

It’s my first time on this mailing list and I also don’t have any experience with the standardization process. Any help is welcome.

Because this email is a little longer, let’s start with a TL;DR:

TL;DR
I personally don’t see the point in fully replicating Rust (or any language) in C++. The switch proposed at safecpp.org is quite dramatic. I’ll elaborate a little below. Based on compile-time evaluation and (maybe) reflection, I have some ideas for extending the C++ language in a way that might allow a library-based borrow checker (opt-in, zero-overhead principle, etc.). I see 3 things to discuss/add to the language that could move us towards this (at the same time, these proposed features would be universal and allow more than just a borrow checker). 1) We need to know the last use of a variable, i.e. not just the lifetime until the end of the scope. 2) Constant-evaluated member variables to track the borrow state of an object. 3) Syntax or rules for evaluating a statement at both compile time and runtime at the "same time"/for the same line of code (we need to track the borrow state based on assignments).
~TL;DR (You’ll find the numbers 1)-3) down below with more explanation.)

0) Motivation
As mentioned in the TL;DR I am not too happy with the safecpp.org proposal. If people really want to write Rust, they should just do that. Currently, safecpp proposes safe code sections that use a separate STL making it somewhat incompatible with existing C++ code. It is basically a new language within the language, just hidden by the fact that it looks the same. I personally think it would be better to have a different language syntax (i.e. Rust) altogether to make it clear there are different rules. You can just use Rust, e.g. with the cxx crate (https://cxx.rs) to access existing C++ code. There is even already a cxx bridge to Qt: https://kdab.github.io/cxx-qt/book/index.html.

Sometimes it just takes the right ideas to avoid changing the language entirely and have a library feature instead. I hope to provide some new ideas that could help in this context. Herb Sutter has written down some guidelines for language evolution (https://isocpp.org/files/papers/P3466R1.pdf). He explicitly states that "we should not bifurcate the standard library" (3.2 “No gratuitous incompatibilities with C” [or previous C++], page 3). My understanding is that safecpp would like to bifurcate the standard library. I do understand that it is much harder to make the existing STL compatible with a borrow checker. But it would be really nice to have a borrow-checked std::vector which is ABI compatible with the existing std::vector so it can be handed as an argument to some legacy functions. He also states "4.7 Prefer consteval libraries when they can be of equivalent usability, expressiveness, and performance as baked-in language features". I think this is possible for a borrow checker with some other extensions of the language.

Unfortunately, a library-based approach would always be opt-in instead of opt-out (see my PS at the bottom for how profiles might change that). This makes it impossible to follow "4.1 Make features safe by default, with full performance and control always available via opt-out": a borrow checker is a safety feature, and it cannot be the default when it is implemented as a library. In a way, a library-based borrow checker would make the use of the borrow checker explicit, which would make it a "heavy annotation" (4.5 Adoptability: Avoid heavy annotation). With ABI compatibility between a borrowed vector and a plain vector, it would not necessarily be viral (4.4 Adoptability: Avoid viral annotation). Adoption could be gradual, but making your whole source safe (though maybe not 3rd-party libraries) is in a way viral. Only consistent use of this feature avoids safety mistakes (e.g. a call to a function that does not (yet) use the borrow checker in its parameters would need manual borrow management at the call site).

I hope this is not seen as too much of a rant against safecpp.org.

In short, a library based solution could look something like this:
std::checked<int> i = 42; // declare variable for which we want borrow checking
std::borrow<int&> ref = i; // create a borrow for ‘i’
Because of automatic template type deduction we could also just write
std::borrow ref = i;
Basically, std::borrow would just replace our use of ‘auto’ (for those who like to use it) everywhere. We might want to look into different words than ‘checked’ and ‘borrow’. These are just (somewhat) short names that I could come up with.
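
To make the intended usage a bit more concrete, here is a minimal sketch in today's C++17. It is only a stand-in under several assumptions: the names checked, borrow, active_borrows(), value(), get() and demo_borrow are hypothetical, the borrow state is counted at runtime (the ideas in this email are about moving exactly this bookkeeping to compile time), and the borrow ends at the end of its scope rather than at its last use.

```cpp
#include <cassert>

template <typename T>
class borrow;  // only the borrow<T&> specialization below is defined

// Hypothetical stand-in for std::checked<T>: owns a value and counts borrows.
template <typename T>
class checked {
public:
    checked(T value) : value_(value) {}
    int active_borrows() const { return borrows_; }
    const T& value() const { return value_; }

private:
    friend class borrow<T&>;
    T value_;
    int borrows_ = 0;  // runtime counter; the proposal wants this at compile time
};

// Hypothetical stand-in for std::borrow<T&>: an exclusive (mutable) borrow.
template <typename T>
class borrow<T&> {
public:
    borrow(checked<T>& owner) : owner_(owner) {
        assert(owner_.borrows_ == 0 && "value is already borrowed");
        ++owner_.borrows_;
    }
    ~borrow() { --owner_.borrows_; }
    borrow(const borrow&) = delete;
    borrow& operator=(const borrow&) = delete;
    T& get() { return owner_.value_; }

private:
    checked<T>& owner_;
};

// Deduction guide so that 'borrow ref = i;' deduces borrow<int&>.
template <typename T>
borrow(checked<T>&) -> borrow<T&>;

int demo_borrow() {
    checked<int> i = 42;  // variable for which borrows are tracked
    {
        borrow ref = i;   // deduced as borrow<int&>
        ref.get() += 1;
        assert(i.active_borrows() == 1);
    }                     // the borrow ends here (end of scope, not last use)
    assert(i.active_borrows() == 0);
    return i.value();
}
```

A second borrow of ‘i’ inside the inner block would trip the assert at runtime; the goal of the features discussed here is to turn that into a compile-time error.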

This is not a proposal for a library-based borrow checker. Instead, here are 3 possible proposals which would (hopefully) make it possible to implement a library-based borrow checker. A borrow checker proposal would come later, once we have the other features.

1) Variable lifetimes/last use
Mojo (a new language trying to be compatible with Python syntax) decided to destroy objects after their last use (i.e. the last use of the variable) and not at the end of the scope, as C++ does (https://docs.modular.com/mojo/manual/lifecycle/death). They claim this gives superior performance. This alone should be reason enough to investigate it further. A borrow checker needs to know when a variable is last used: the last use of a variable (if it is a borrow) ends the borrow. For a library-based borrow checker we would need to know at compile time when this happens. A trivial solution would be to just change object lifetimes from end of scope to last use. However, this is the wrong solution because it would break existing code like this (really short Qt example):
int main(int argc, char *argv[])
{
    QApplication app(argc,argv);
    QWidget w;
    w.show();
    return app.exec();
}
The last use of ‘w’ is at ‘w.show()’. But the QWidget needs to live until the app is quit (i.e. until the end of the scope). Similar problems arise if we call a function that keeps a reference to one of its arguments and a different function call uses this (most likely these functions would be member functions). The last use of the variable (locally visible to the compiler) could be the first function call. So, we need to stick to scope-based lifetimes of objects to not break existing code.

Let’s look at other languages and derive some ideas.
a) Zig uses ‘defer’ for scope-based clean-up (instead of a destructor as in C++). We could also introduce a keyword like ‘defer’. There could be an ‘eager defer’ (executed at the last use of a variable) and a ‘lazy defer’ (executed at the end of the scope). I’d prefer the eager defer by default (better for performance according to Mojo). So, we could introduce keywords ‘defer’ and ‘lazy_defer’/‘defer_lazy’. In C++ I currently know this feature as a scope guard: a class that takes a function as an argument in its constructor and executes it in its destructor. At least the lazy variant could already be implemented as a library instead of a keyword (to be used like this: auto _ = defer([]() { std::println("end of scope"); });). If we go for one of the other solutions, the eager variant could be implemented as a library as well. A slight variation would be to introduce an attribute, like [[eagerly_destroy]] (definitely needs a shorter name), to annotate variables that should be destroyed at their last use instead of at the end of the scope. This attribute could also go on a class declaration so that every instance of that class is eagerly destroyed (by default?). This would allow annotating std::borrow with [[eagerly_destroy]] so that a borrow always ends its lifetime at its last use instead of at the end of the scope.
b) A little less elegant would be the solution from Rust using ‘drop’. ‘drop’ explicitly ends the lifetime of an object (running its destructor) at a specific line of source code. It is less elegant because we would have to be explicit about when the lifetime of a borrow should end. I personally don’t like this solution. (I always thought ‘drop’ was a keyword, but in Rust it is an ordinary library function, and the destructor logic itself is provided through the Drop trait (https://doc.rust-lang.org/rust-by-example/trait/drop.html). So, we could actually do something similar in C++ right now. However, we would need to wrap variables in a std::droppable<T> such that the destructor of the contained object is only called once (either through calling drop() or through the destructor of the std::droppable).)
c) I have another idea not inspired by other languages. We could have two kinds of destructors: one called when the object goes out of scope (just like now) and a second one called when the variable is last used. My initial idea is to write the new destructor as ~~MyClass() instead of just ~MyClass(). (There is probably a better idea than this.) I guess there can be other use cases for a set of two separate destructors. I believe this feature would be nice to have independent of a library-based borrow checker. Reason: Performance (allegedly)!
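
The library-only parts of a) and b) already work in C++17. Below is a minimal sketch, not a proposed interface: defer is a classic scope guard (the lazy variant only), and droppable<T> wraps a std::optional<T> so that the contained destructor runs exactly once, either in drop() or at the end of the scope. The names defer, droppable, drop(), alive(), get(), Tracer and the two demo functions are hypothetical. The eager variant of a) and solution c) cannot be written as a library today, because nothing tells the library where the last use of a variable is.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// a) Lazy defer: run a callable when the guard goes out of scope.
template <typename F>
class defer {
public:
    explicit defer(F f) : f_(std::move(f)) {}
    ~defer() { f_(); }
    defer(const defer&) = delete;
    defer& operator=(const defer&) = delete;

private:
    F f_;
};

// b) Rust-like drop: end the lifetime of the contained object early.
template <typename T>
class droppable {
public:
    template <typename... Args>
    explicit droppable(Args&&... args)
        : value_(std::in_place, std::forward<Args>(args)...) {}
    droppable(const droppable&) = delete;
    droppable& operator=(const droppable&) = delete;

    void drop() { value_.reset(); }  // destroys the object if still alive
    bool alive() const { return value_.has_value(); }
    T& get() { return *value_; }     // precondition: alive()

private:
    std::optional<T> value_;  // empty after drop(), so no second destructor call
};

// Helper type that counts how often its destructor runs.
struct Tracer {
    explicit Tracer(int& counter) : destroyed(counter) {}
    ~Tracer() { ++destroyed; }
    int& destroyed;
};

std::string demo_defer() {
    std::string log;
    {
        auto _ = defer([&log]() { log += "end of scope;"; });
        log += "work;";
    }  // the guard's destructor runs the lambda here
    return log;
}

int demo_drop() {
    int destroyed = 0;
    {
        droppable<Tracer> t(destroyed);
        t.drop();  // destructor of Tracer runs here
        assert(!t.alive());
        assert(destroyed == 1);
    }  // nothing left to destroy at the end of the scope
    return destroyed;
}
```

demo_defer() shows that the deferred lambda runs after the rest of the block, and demo_drop() shows that the Tracer destructor runs once even though both drop() and the end of the scope are reached.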

2) consteval member variables
This feature would extend compile-time programming. My idea is that besides regular (runtime) member variables we could add variables to a class that are accessible at compile time only. As such, they would not increase the size of a class at runtime; at runtime they are simply not available and thus zero-overhead (both in memory and in execution time). These member variables would only be accessible in constant-evaluated contexts (I guess this is what it is called in C++). Since we mark constant-evaluated functions with the keyword ‘consteval’, I propose to use this keyword for these member variables as well in order not to introduce a new keyword (even though it sounds nonsensical that a variable would be constant evaluated; it would rather mean that the variable is only available in a consteval context). Again, I assume that this feature could be used for many other things and not just a library-based borrow checker.

Why would we need this for the borrow checker? From my cursory reading (https://rust-book.cs.brown.edu/ch04-02-references-and-borrowing.html) of how borrow checking in Rust works, it keeps track of how many references have read, write, and ownership permissions. These could be consteval member variables of a class std::checked<T> which count up/down with each borrow: assigning to a borrow counts up and when a borrow variable is last used (see 1)) it counts down. It is basically like a smart (shared) pointer at compile time. We can use static_assert to throw an error when the borrow check fails.
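
Today's constexpr can already express this bookkeeping as long as the state lives inside a single constant evaluation. The sketch below only illustrates the counting rules (any number of readers, or exactly one writer); borrow_state and all function names are hypothetical, and the rules are a simplification of what Rust does. What is missing, and what consteval member variables would add, is a way to attach this state to an object so that it persists from one statement to the next.

```cpp
#include <cassert>

// Hypothetical compile-time ledger: many shared borrows or one mutable borrow.
struct borrow_state {
    int readers = 0;
    bool writer = false;

    constexpr bool borrow_shared() {
        if (writer) return false;
        ++readers;
        return true;
    }
    constexpr bool borrow_mut() {
        if (writer || readers > 0) return false;
        writer = true;
        return true;
    }
    constexpr void release_shared() { --readers; }  // last use of a shared borrow
    constexpr void release_mut() { writer = false; }
};

constexpr bool two_readers_allowed() {
    borrow_state s{};
    return s.borrow_shared() && s.borrow_shared();
}

constexpr bool writer_during_read_allowed() {
    borrow_state s{};
    s.borrow_shared();
    return s.borrow_mut();
}

constexpr bool writer_after_release_allowed() {
    borrow_state s{};
    s.borrow_shared();
    s.release_shared();
    return s.borrow_mut();
}

// The borrow rules are checked by the compiler; a violation fails the build.
static_assert(two_readers_allowed(), "shared borrows may coexist");
static_assert(!writer_during_read_allowed(), "no mutable borrow while shared borrows are alive");
static_assert(writer_after_release_allowed(), "mutable borrow is fine once the last reader is gone");

// constexpr functions can also run at runtime, e.g. in a unit test.
bool demo_ledger() {
    assert(two_readers_allowed());
    return writer_after_release_allowed() && !writer_during_read_allowed();
}
```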

3) Hybrid compile-time and runtime evaluation
To make a library-based borrow checker practical we would need one more feature. In the beginning I proposed two new types: std::checked<T> for variables that can be borrowed from and std::borrow<T> for the borrow itself. When assigning a checked<T> to a borrow<T> we would like the compiler to automatically call a compile-time function (so we can update the consteval member variables), but also to emit instructions that execute at runtime. To keep this a general C++ feature, again independent of a library-based borrow checker, I would like a general mechanism that allows any function to be executed both at runtime and at compile time.

Again, I have a few different solutions to this problem:
a) A function can have a consteval and regular version. If a function has a consteval version at the same time as a runtime version, it is executed at compile time and at the same time the code of the regular runtime function is also produced. This could look something like this:

T& std::borrow<T>::operator=(std::checked<T> &other) { this->data = other.data; } // simplified regular assignment operator
consteval T& std::borrow<T>::operator=(std::checked<T> &other) { /* manipulate the consteval members of std::checked<T> */ }

Whenever operator= occurs in the source code we generate the code for the runtime operator (just as we do now), but at the same time we check if there is a consteval version as well and also execute that code at compile time. (It just occurred to me that we rather need this for the constructor of std::borrow<T> because it behaves like a reference (and thus cannot be reassigned to a different std::checked<T>).) This is most likely the easiest way for a user of this feature, but I am not fully convinced that this is a good solution.
b) We could use a slight variation of solution a): we annotate the runtime version of the function with a consteval function to be called. This makes it more explicit that both a compile-time function and a runtime function are called. The implicit nature of solution a) is what I don’t really like about it. The consteval function could have the same name, but also a totally different name (specified by the annotation). The annotation could be an attribute [[consteval=functionname]] or an extension of contracts (right beside pre- and postconditions). In the latter case it is just some form of precondition to first call the compile-time function before generating the code for the runtime function (or maybe even one consteval function before and one after). Thinking of reflection, the consteval function could just take a std::meta::info as argument containing the full assignment expression. This would yield the most information.
c) A last solution could start from the consteval function using reflection. There would NOT be a consteval as well as a runtime overload. Instead there would ONLY be the consteval function. Within the function we could do whatever compile time evaluation we want to do. At the end of the function we would inject the actual assignment function into the token queue using reflection capabilities. So, basically the runtime function would be contained within the consteval function as reflection code.

(Thinking further about this, I am not entirely sure where consteval function calls are currently allowed. We could certainly be in reflection territory.)

4) Summary
I think with just these three additions to the language (which would have much wider applicability) we could move towards a library-based borrow checker. Use of the borrow checker would be explicit for each variable, so we can choose per variable whether or not to use it. This might be especially helpful for gradually moving code bases towards the use of a borrow checker because it doesn’t have to be used everywhere at once. It would be best if we could keep ABI compatibility for std::vector or std::string when these are wrapped in a std::checked<T>.

Certainly, the timeline for a library-based borrow checker would be a lot longer than just implementing a borrow checker right inside the compiler. Maybe we would get the three features mentioned above into C++29. Only then would we start experimenting with the implementation of a borrow checker library targeting a later standard. Instead of only following the Rust approach we could experiment with variations of the borrow checker. Other libraries could provide alternative solutions. Independent of that, we would get three nice features into C++ that hopefully inspire a lot of new useful ideas (just as templates did for template metaprogramming).

To sum it up: I envision a type like std::checked<T> which introduces variables for which to do borrow checking. A helper type std::borrow<T> would help with keeping track of borrows. In order to interface with legacy functions we could introduce member functions borrow() and unborrow() to std::checked<T> to manually control borrows. I think that three new language features would allow the implementation of a borrow checker:
1) A system to manage variable lifetime, i.e. the last use of a variable in code instead of scope-based lifetimes. We could a) use eager (last use) and lazy (scope) defer or even an annotation (attribute) to mark a variable to be destroyed after its last use. Other solutions include b) a drop() function to explicitly end the lifetime of an object before the scope ends or c) a second destructor ~~MyClass() which is called when the variable is last used. (I kind of like solution c), maybe with a different syntax.)
2) consteval member variables. These member variables would only be accessible at compile time and thus do not add to the runtime size of an object. constexpr is already taken and has a different meaning as a variable specifier. consteval member variables could only be accessed from consteval functions (or from constant-evaluated contexts within a constexpr function). In order to avoid a new keyword we would declare these variables as consteval. (Just to be clear: consteval member variables would not be constants but would be allowed to mutate.)
3) We need syntax for consteval functions (changing the consteval member variables to reflect borrowing) to be automatically called when assigning a std::checked<T> to a std::borrow<T> (usually in the constructor of std::borrow<T>). I don’t think I have the perfect solution for this, yet. Solution a) proposes to always call the consteval function as well as generate the runtime function call when a consteval and regular runtime function with the (otherwise) same signature exist. Alternatively, b) we could provide an attribute [[consteval=functionname]] to our runtime function (or an extension of contracts) which triggers a compile time function call. This consteval function could take a std::meta::info with the reflected expression of the original function call. Or c) we could call a consteval function which injects the tokens for the runtime function call via reflection.

BTW: Since std::checked<T> and std::borrow<T> behave like references, it would be really nice to have operator.() in the language as well (there’s already a proposal for that: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4477.pdf).

On a side note: With yet another library class we could track the status of a variable using 2) consteval member variables and 3) hybrid compile time/runtime evaluation. We could track if a variable has been initialized before it is first read; similar to how this is tracked in Herb’s cppfront.

Sorry for the very long email. I tried to keep each point short. Naturally, this will raise further questions, but I hope to get the initial ideas for the three language features across. I haven’t followed each and every proposal (I usually just read trip reports and the proposals mentioned there), so let me know if anything similar has already been proposed.

I don’t have any experience with the standardization process at all. So, I need some support writing proposals if there is interest in further pursuing any of these. Any help in navigating standardization is welcome.

PS: I said that this is not going to be a borrow checker proposal in itself. However, I have a bunch of ideas related to this which–with the right choices for the features above–could make a library-based borrow checker close to "safe by default" with an "opt-out".

In the long run it could make sense to call a variable that both uses the borrow checker and tracks initialization status std::var<T>. With ‘using std::var;’ and template argument deduction we could replace every use of ‘auto’ with ‘var’ (maybe introduce a safety profile that just redefines ‘auto’ as ‘std::var’ so everything is safe by default). Through metaclasses we could make it easy to introduce borrow checking into std::vector, std::string and other classes. These classes could use a template parameter to opt out of the borrow checker. This gets us closer to "safety by default". I don’t see how we can shorten std::borrow<T>, but we could introduce a conversion operator std::checked<T>::operator T&() which would at least count up the borrow (but would not count down at the last use of the reference). Depending on the solution for hybrid compile-time/runtime evaluation we could replace declarations of T& with a declaration of std::borrow<T> (this std::borrow<T> had better have operator.() for code compatibility). It would be nice to complement std::var<T> with a std::ref<T> (but that name is already taken). Real constants that do not allow casting away constness could be called std::val<T>. std::val<T> does not need a borrow checker (because it can never be changed, so there are only read-only references). I briefly thought it would need hybrid compile-time/runtime evaluation. However, by just not implementing an assignment operator we could have a std::val<T> right now.
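
As a rough illustration of that last point, a read-only wrapper needs nothing beyond C++17. This is only a sketch under assumed names (val, get() and demo_val are hypothetical, and the class lives outside namespace std): the value is set once in the constructor, assignment is deleted, and access is only possible through const references. A library cannot stop someone from using const_cast, but storing the value as a const member at least makes modifying it undefined behavior.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

template <typename T>
class val {
public:
    constexpr val(T value) : value_(std::move(value)) {}
    val& operator=(const val&) = delete;  // no reassignment after construction

    constexpr const T& get() const { return value_; }
    constexpr operator const T&() const { return value_; }  // read-only access

private:
    const T value_;
};

static_assert(!std::is_copy_assignable_v<val<int>>, "val cannot be reassigned");
static_assert(!std::is_assignable_v<val<int>&, int>, "not even from a plain value");

int demo_val() {
    val answer = 42;    // deduced as val<int>, like 'const auto answer = 42;'
    return answer + 1;  // reads go through the const conversion operator
}
```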

Wouldn’t it be nice if in the future we wrote ‘var’ instead of ‘auto’, ‘ref’ instead of ‘auto&’ (though this is already taken), and ‘val’ instead of ‘const auto’? (‘let’ instead of ‘val’ would also have a nice ring to it, but would look weird if someone provides the type explicitly.) All this would be library based and thus a lot easier to change. A specific profile could make all those replacements automatically. If the profile does not replace them with exactly std::var/std::ref/std::val but with unqualified var/ref/val instead, someone can provide their own implementation of var/ref/val and pull them in with ‘using’. (Actually, we’d need to differentiate between constant and mutable borrows. So, instead of std::ref we’d have std::mut/std::ref (stealing the name ‘mut’ from Rust) for ‘auto&’ and std::roref/std::cref (read-only ref) for ‘const auto&’.)

PPS: Let me quickly introduce myself for those who are interested. I learned programming in the summer of 1999 using QBasic. Just a few months later I had my first C++ book. Since then I have learned some other languages for fun, but have only ever really used C++. My PhD is in computer graphics (scientific visualization), but my jobs so far have all been in scientific computing (I’ve also touched some Fortran code in these projects). Besides standard C++ I have quite a lot of experience with the Qt framework (I’m waiting for "modern Qt" just like we now have "modern C++"). At work, I’m currently stuck with C++17, but I’m always curious what the future of C++ holds.

Received on 2025-01-04 15:11:34