ISOCPP std-proposals List: Re: [std-proposals] Towards a library-based borrow checker (instead of safecpp.org)

From: Jarrad Waterloo <descender76_at_[hidden]>
Date: Sat, 11 Jan 2025 10:21:48 -0500

The borrow checker is a composition of several features and is still
growing and changing.
[So which version of Rust's borrow checker?]

This article talks about its current problems and the directions it is
going in.
https://smallcultfollowing.com/babysteps/blog/2024/06/02/the-borrow-checker-within/

The fact is that borrow checking is composed of multiple pieces, each of
which has safety merits in unsafe code apart from borrow checking.
https://rustc-dev-guide.rust-lang.org/borrow_check.html#mir-borrow-check
https://rustc-dev-guide.rust-lang.org/borrow_check.html#major-phases-of-the-borrow-checker

A lot of these pieces have already been proposed for C++ and [partially]
rejected or not prioritized.

1) range and data flow analysis are important pieces of Herb Sutter's
safety work, which focuses on invalidation detection
2) last use analysis is important for turning current lvalues into
rvalue moves, i.e. minimizing references
3) implicit constant initialization, i.e. guaranteed static initialization,
is important for turning many rvalues back into safe lvalue instances
4) implicit/deduced attributes/annotations and reference checking are
capable of identifying most dangling references to the stack

None of these requires anything from the end programmer; rather, they
demand that the compiler use what it already knows.

There are also mistakes in Rust and in C++ which need to be fixed, such as
temporaries.
For Rust it would be a breaking language change. Due to C++'s complexity
and simplicity via abstraction, its breaking change could be greatly
reduced by fixing the original problems with properly designed language
features instead of hacks.

Furthermore, safer C++ borrow checking libraries exist; they just haven't
been proposed for or adopted by C++.
https://verdagon.dev/blog/vale-memory-safe-cpp "Making C++ Memory-Safe
Without Borrow Checking, Reference Counting, or Tracing Garbage Collection"
cowned is usable and is just missing the DEBUG code to ensure the overhead
doesn't exist at runtime, and source_location for diagnostics.
vowned is just missing the decrements in the destructor, the DEBUG code
to ensure the overhead doesn't exist at runtime, and source_location for
diagnostics.

On Sat, Jan 4, 2025 at 10:11 AM Simon Schröder via Std-Proposals <
std-proposals_at_[hidden]> wrote:

> Hi,
>
> It’s my first time on this mailing list and I also don’t have any
> experience with the standardization process. Any help is welcome.
>
> Because this email is a little longer, let’s start with a TL;DR:
>
> TL;DR
> I personally don’t see the point in fully replicating Rust (or any
> language) in C++. The switch proposed at safecpp.org is quite dramatic.
> I’ll elaborate a little below. Based on compile time evaluation and (maybe)
> reflection, I have some ideas how to extend the C++ language that might
> allow a library-based borrow checker (opt-in, zero-overhead principle,
> etc.). I see 3 things to discuss/add to the language that could bring us on
> a way towards this (at the same time these proposed features would be
> universal and allow more than just a borrow checker). 1) We need to know
> the last use of a variable, i.e. not just the lifetime until the end of the
> scope. 2) constant evaluated member variables to track the borrow state for
> an object. 3) Syntax or rules for how to have compile time and runtime
> evaluation of a statement at the "same time"/for the same line of code (we
> need to track the borrow state based on assignments).
> ~TL;DR (You’ll find the numbers 1)-3) down below with more explanation.)
>
> 0) Motivation
> As mentioned in the TL;DR I am not too happy with the safecpp.org
> proposal. If people really want to write Rust, they should just do that.
> Currently, safecpp proposes safe code sections that use a separate STL
> making it somewhat incompatible with existing C++ code. It is basically a
> new language within the language, just hidden by the fact that it looks the
> same. I personally think it would be better to have a different language
> syntax (i.e. Rust) altogether to make it clear there are different rules.
> You can just use Rust, e.g. with the cxx crate (https://cxx.rs) to access
> existing C++ code. There is even already a cxx bridge to Qt:
> https://kdab.github.io/cxx-qt/book/index.html.
>
> Sometimes it just needs the right ideas to not change the language
> entirely, but have a library feature instead. I hope to provide some new
> ideas that could help in this context. Herb Sutter has written down some
> guidelines for language evolution (
> https://isocpp.org/files/papers/P3466R1.pdf). He explicitly states that
> "we should not bifurcate the standard library" (3.2 “No gratuitous
> incompatibilities with C” [or previous C++], page 3). My understanding is
> that safecpp would like to bifurcate the standard library. I do understand
> that it is much harder to make the existing STL compatible with a borrow
> checker. But, it would be really nice to have a borrow checked std::vector
> which is ABI compatible with the existing std::vector so it can be handed
> as argument to some legacy functions. He also states "4.7 Prefer consteval
> libraries when they can be of equivalent usability, expressiveness, and
> performance as baked-in language features". I think this is possible for a
> borrow checker with some other extensions of the language. Unfortunately, a
> library-based approach would always be opt-in instead of opt-out (see my PS
> at the bottom for how profiles might change that). This makes it impossible
> to follow "4.1 Make features safe by default, with full performance and
> control always available via opt-out". Given that the borrow checker is a
> safety feature it is not possible to make it the default when implementing
> it as a library. In a way a library-based borrow checker would make the use
> of the borrow checker explicit which would make it a "heavy annotation"
> (4.5 Adoptability: Avoid heavy annotation). In particular, with ABI
> compatibility of a borrowed vector with a plain vector, it would not
> necessarily be viral (4.4 Adoptability: Avoid viral annotation). Adoption
> could be gradual, but to make your whole source safe (but maybe not
> 3rd-party libraries) is in a way viral. Only a consistent use of this
> feature avoids mistakes concerning safety (e.g. a call to a function that
> does not (yet) use the borrow checker in its parameters would need manual
> borrow management on the caller’s site).
>
> I hope this is not seen as too much of a rant against safecpp.org.
>
> In short, a library based solution could look something like this:
> std::checked<int> i = 42;  // variable for which we want borrow checking
> std::borrow<int&> ref = i; // create a borrow for ‘i’
> Because of automatic template type deduction we could also just write
> std::borrow ref = i;
> Basically, std::borrow would just replace our use of ‘auto’ (for those who
> like to use it) everywhere. We might want to look into different words than
> ‘checked’ and ‘borrow’. These are just (somewhat) short names that I could
> come up with.
>
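As a rough illustration of the intended usage, here is a runtime-checked
stand-in. The proposal wants the counting done at compile time, which is
not expressible today, so this sketch counts at runtime instead. The class
names mirror the proposed std::checked/std::borrow, but every member shown
(active_borrows, the single-borrow rule, throwing std::logic_error) is an
assumption made for this sketch only.

```cpp
#include <cassert>
#include <stdexcept>

template <typename T> class borrow;

// Hypothetical runtime stand-in for the proposed std::checked<T>.
template <typename T>
class checked {
public:
    checked(T value) : value_(value) {}
    int active_borrows() const { return borrows_; }
private:
    friend class borrow<T>;
    T value_;
    int borrows_ = 0;   // the proposal would track this at compile time only
};

// Hypothetical runtime stand-in for the proposed std::borrow<T>.
template <typename T>
class borrow {
public:
    explicit borrow(checked<T>& owner) : owner_(owner) {
        // Simplified rule: only one (mutable) borrow may be active at a time.
        if (owner_.borrows_ != 0) throw std::logic_error("already borrowed");
        ++owner_.borrows_;
    }
    ~borrow() { --owner_.borrows_; }   // here the borrow ends at end of scope
    borrow(const borrow&) = delete;
    borrow& operator=(const borrow&) = delete;
    T& operator*() const { return owner_.value_; }
private:
    checked<T>& owner_;
};

// Returns true if a second, overlapping borrow is rejected.
inline bool second_borrow_is_rejected() {
    checked<int> i = 42;
    borrow<int> first(i);
    *first = 43;
    try {
        borrow<int> second(i);   // overlaps with 'first'
    } catch (const std::logic_error&) {
        return i.active_borrows() == 1 && *first == 43;
    }
    return false;
}
```

Note that this version ends a borrow at the end of its scope, not at its
last use, which is exactly the gap that feature 1) below is meant to close.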
> This is not a proposal for a library-based borrow checker. Instead, here
> we have 3 possible proposals which would allow (hopefully) to implement a
> library-based borrow checker. A borrow checker proposal would be for a
> later time once we got the other features.
>
> 1) Variable lifetimes/last use
> Mojo (a new language trying to be compatible with Python syntax) decided
> to destroy objects after their last use (i.e. last use of the variable) and
> not at the end of the scope, as C++ does it (
> https://docs.modular.com/mojo/manual/lifecycle/death). They claim, this
> gives superior performance. This alone should let us investigate this
> further. It is a requirement for a borrow checker to know when a variable
> is last used: The last use of a variable (if it is a borrow) ends the
> borrow. For a library-based borrow checker we would need to be able to know
> at compile-time when this happens. A trivial solution would be to just
> change object lifetimes from end of scope to last use. However, this is the
> wrong solution because it would break existing code like this (really short
> Qt example):
> #include <QApplication>
> #include <QWidget>
>
> int main(int argc, char *argv[])
> {
>     QApplication app(argc, argv);
>     QWidget w;
>     w.show();
>     return app.exec();
> }
> The last use of ‘w’ is at ‘w.show()’. But the QWidget needs to live until
> the app is quit (i.e. until the end of the scope). Similar problems arise
> if we call a function that keeps a reference to one of its arguments and a
> different function call uses this (most likely these functions would be
> member functions). The last use of the variable (locally visible to the
> compiler) could be the first function call. So, we need to stick to
> scope-based lifetimes of objects to not break existing code.
>
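The second hazard described above (a function keeps a reference to its
argument and a later call uses it) can be shown without Qt. This is a
minimal sketch; Registry and scope_lifetime_demo are hypothetical names
invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Keeps non-owning pointers to the strings it is given.
struct Registry {
    std::vector<const std::string*> names;
    void add(const std::string& s) { names.push_back(&s); }
    std::size_t total_length() const {
        std::size_t n = 0;
        for (const std::string* s : names) n += s->size();
        return n;
    }
};

inline std::size_t scope_lifetime_demo() {
    Registry reg;
    std::string name = "widget";
    reg.add(name);   // last use of 'name' that is visible in this function
    // If 'name' were destroyed at its last use, total_length() would read
    // through a dangling pointer. With scope-based lifetimes it is alive.
    return reg.total_length();
}
```

With today's scope-based rules the call is well defined; under a blanket
"destroy at last use" rule it would not be.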
> Let’s look at other languages and derive some ideas.
> a) Zig uses ‘defer’ for a scope-based clean-up (instead of a destructor
> like C++). We could also introduce a keyword like ‘defer’. There could be
> an ‘eager defer’ (executed at the last use of a variable) and a ‘lazy
> defer’ (executed at the end of the scope). I’d prefer the eager defer by
> default (better for performance according to Mojo). So, we could introduce
> keywords ‘defer’ and ‘lazy_defer’/‘defer_lazy’. In C++, I currently know
> this feature as a scope guard: a class that takes a function as argument
> in its constructor and executes it in its destructor. At least the lazy
> variant could already be implemented as library instead of using a keyword
> (to be used like this: auto _ = defer([]() { std::println("end of scope");
> });). If we go for one of the other solutions, the eager variant could be
> implemented as library as well. A slight variation of this could be that we
> introduce an attribute, like [[eagerly_destroy]] (definitely needs a
> shorter name), to annotate variables that should be destroyed at their last
> use instead of the end of the scope. This attribute could also go to a
> class declaration to make each instance of that class eagerly
> destroyed (by default?). This would allow annotating std::borrow with
> [[eagerly_destroy]] so that a borrow always ends its lifetime at its
> last use instead of at the end of the scope.
> b) A little less elegant would be the solution from Rust using ‘drop’.
> ‘drop’ explicitly calls the destructor of an object at a specific line of
> source code. It is less elegant because we would have to be explicit when
> the lifetime of a borrow should end. I personally don’t like this solution.
> (I always thought ‘drop’ would be a keyword, but ‘Drop’ is a trait and
> ‘drop’ an ordinary function in Rust (
> https://doc.rust-lang.org/rust-by-example/trait/drop.html). So, we could
> actually do something similar in C++ right now. However, we would need to
> encapsulate variables inside a std::droppable<T> such that the destructor
> of the contained object is only called once (either through calling drop()
> or through the destructor of the std::droppable.))
> c) I have another idea not inspired by other languages. We could have two
> kinds of destructors: one called when the object goes out of scope (just
> like now) and a second one which is called when the variable is last used.
> My initial idea is to write the new destructor as ~~MyClass() instead of
> just ~MyClass(). (There is probably a better idea than this.) I guess,
> there can be other use cases for a set of two separate destructors. I
> believe this feature would be nice to have independent of a library-based
> borrow checker. Reason: Performance (allegedly)!
>
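The two library-only options from a) and b) can be sketched in C++17 as
follows. Both defer and droppable are hypothetical names taken from the
text above, not existing standard facilities, and the Tracer type exists
only to record when destruction happens.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical 'lazy defer': runs the callable at the end of the scope.
template <typename F>
class defer {
public:
    explicit defer(F f) : f_(std::move(f)) {}
    ~defer() { f_(); }
    defer(const defer&) = delete;
    defer& operator=(const defer&) = delete;
private:
    F f_;
};

// Hypothetical droppable<T>: the contained object is destroyed exactly once,
// either by an explicit drop() or by the wrapper's own destructor.
template <typename T>
class droppable {
public:
    template <typename... Args>
    explicit droppable(Args&&... args) {
        value_.emplace(std::forward<Args>(args)...);
    }
    void drop() { value_.reset(); }
    bool alive() const { return value_.has_value(); }
    T& operator*() { return *value_; }
private:
    std::optional<T> value_;
};

// Records its own destruction in a log.
struct Tracer {
    std::vector<std::string>* log;
    explicit Tracer(std::vector<std::string>* l) : log(l) {}
    ~Tracer() { log->push_back("tracer destroyed"); }
};

inline std::vector<std::string> run_lifetime_demo() {
    std::vector<std::string> log;
    {
        auto guard = defer([&log] { log.push_back("end of scope"); });
        droppable<Tracer> t(&log);
        t.drop();                      // explicit early destruction
        log.push_back("after drop");
    }   // t is already empty here; then the guard runs
    return log;
}
```

The drawback mentioned in b) is visible here: the programmer has to write
t.drop() by hand, whereas an eager defer or a last-use destructor would
place that call automatically.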
> 2) consteval member variables
> This feature would extend compile time programming. My idea is that
> besides regular (runtime) member variables we could add variables to a
> class that are accessible during compile-time only. As such, they would not
> increase the size of a class at runtime; at runtime these are just not
> available and thus zero-overhead (both in memory space and execution time).
> These member variables would only be accessible in constant evaluated
> contexts (I guess this is what it is called in C++). Since we mark constant
> evaluated functions with the keyword ‘consteval’ I propose to also use this
> keyword for these member variables in order to not introduce a new keyword
> to the language (even though it sounds nonsensical that a variable would be
> constant evaluated, but it would rather mean that the variable is only
> available in a consteval context). Again, I assume that this feature could
> be used for many other things and not just a library-based borrow checker.
>
> Why would we need this for the borrow checker? From my cursory reading (
> https://rust-book.cs.brown.edu/ch04-02-references-and-borrowing.html) of
> how borrow checking in Rust works, it keeps track of how many references
> have read, write, and ownership permissions. These could be consteval
> member variables of a class std::checked<T> which count up/down with each
> borrow: assigning to a borrow counts up and when a borrow variable is last
> used (see 1)) it counts down. It is basically like a smart (shared) pointer
> at compile time. We can use static_assert to throw an error when the borrow
> check fails.
>
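The counting scheme itself can already be evaluated at compile time if the
state lives in a local variable of a constexpr function. What is missing
today is a way to attach that state to an object such as std::checked<T>
and to update it from ordinary code. In this minimal sketch, BorrowState
and its member names are assumptions, and the rules are a simplified
"many readers or one writer" model.

```cpp
#include <cassert>

// Hypothetical model of the state that would live in consteval member
// variables of std::checked<T>.
struct BorrowState {
    int readers = 0;
    int writers = 0;

    constexpr bool borrow_shared() {
        if (writers != 0) return false;
        ++readers;
        return true;
    }
    constexpr bool borrow_mut() {
        if (readers != 0 || writers != 0) return false;
        ++writers;
        return true;
    }
    constexpr void release_shared() { --readers; }
    constexpr void release_mut() { --writers; }
};

constexpr bool shared_then_mut_is_rejected() {
    BorrowState s{};
    bool ok = s.borrow_shared();       // first reader
    ok = ok && s.borrow_shared();      // a second reader is fine
    bool rejected = !s.borrow_mut();   // writer while readers exist
    return ok && rejected;
}

constexpr bool mut_after_release_is_accepted() {
    BorrowState s{};
    s.borrow_shared();
    s.release_shared();                // models the "last use" of the reader
    return s.borrow_mut();
}

// The borrow check fails the build instead of failing at runtime.
static_assert(shared_then_mut_is_rejected(), "overlapping borrows must fail");
static_assert(mut_after_release_is_accepted(), "a released borrow frees the value");
```

The state costs nothing at runtime here because it never leaves constant
evaluation, which is the property the consteval member variables are meant
to provide for real objects.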
> 3) Hybrid compile-time and runtime evaluation
> To make a library-based borrow checker practical we would need one more
> feature. In the beginning I mentioned to introduce two new types:
> std::checked<T> for variables that can be borrowed from and std::borrow<T>
> for the borrow itself. When assigning a checked<T> to a borrow<T> we would
> like the compiler to automatically call a compile time function (so we can
> update the consteval member variables), but also create instructions that
> execute at runtime. To keep this a general C++ feature (again independent
> of a library-based borrow checker), I would like a general mechanism that
> allows any function to be executed both at runtime and at compile time.
>
> Again, I have a few different solutions to this problem:
> a) A function can have a consteval and regular version. If a function has
> a consteval version at the same time as a runtime version, it is executed
> at compile time and at the same time the code of the regular runtime
> function is also produced. This could look something like this:
>
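> template <typename T> // simplified regular assignment operator
> std::borrow<T>& std::borrow<T>::operator=(std::checked<T>& other)
> { this->data = other.data; return *this; }
> template <typename T> // proposed consteval counterpart, not valid C++ today
> consteval std::borrow<T>& std::borrow<T>::operator=(std::checked<T>& other)
> { /* manipulate the consteval members of std::checked<T> */ return *this; }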
>
> Whenever operator= occurs in the source code we generate the code for the
> runtime operator (just as we do now), but at the same time we check if
> there is a consteval version as well and also execute that code at compile
> time. (It just occurred to me that we rather need this for the constructor
> of std::borrow<T> because it behaves like a reference (and thus cannot be
> reassigned to a different std::checked<T>).) This is most likely the
> easiest way for a user of this feature, but I am not fully convinced that
> this is a good solution.
> b) We could use a slight variation of solution a): we annotate the runtime
> version of the function with a consteval function to be called. This makes
> it more explicit that both a compile time function and a runtime function
> are called. The implicit nature of solution a) is what I don’t really like
> about that solution. The consteval function could have the same name, but
> also a totally different name (specified by the annotation). The annotation
> could be an attribute [[consteval=functionname]] or an extension of
> contracts (right besides pre- and postconditions). For the latter case it
> is just some form of precondition to first call the compile time function
> before generating the code for the runtime function (or maybe even one
> consteval function before and one after). Thinking of reflection the
> consteval function could just take a std::meta::info as argument containing
> the full assignment expression. This would yield the most information.
> c) A last solution could start from the consteval function using
> reflection. There would NOT be a consteval as well as a runtime overload.
> Instead there would ONLY be the consteval function. Within the function we
> could do whatever compile time evaluation we want to do. At the end of the
> function we would inject the actual assignment function into the token
> queue using reflection capabilities. So, basically the runtime function
> would be contained within the consteval function as reflection code.
>
> (Thinking further about this, I am not entirely sure where consteval
> function calls are currently allowed. We could certainly be in reflection
> territory.)
>
> 4) Summary
> I think with just these three additions to the language (which would have
> much wider applicability) we could move towards a library-based borrow
> checker. Use of the borrow checker would be explicit for each variable.
> This allows choosing, for each variable, whether or not to use the borrow
> checker. This might be especially helpful to gradually move code bases
> towards the use of a borrow checker because it doesn’t have to be used
> everywhere at once. It would be best if we could keep ABI compatibility for
> std::vector or std::string when these are wrapped into a std::checked<T>.
>
> Certainly, the timeline for a library-based borrow checker would be a lot
> longer than just implementing a borrow checker right inside the compiler.
> Maybe, we would get the three features mentioned above into C++29. Only
> then we would start experimenting with the implementation of a borrow
> checker library targeting a later standard. Instead of the Rust approach we
> could experiment with variations of the borrow checker. Other libraries
> could provide alternative solutions. Independent of that we would get three
> nice features into C++ that hopefully inspire a lot of new useful ideas
> (just as templates did for template meta programming).
>
> To sum it up: I envision a type like std::checked<T> which introduces
> variables for which to do borrow checking. A helper type std::borrow<T>
> would help with keeping track of borrows. In order to interface with legacy
> functions we could introduce member functions borrow() and unborrow() to
> std::checked<T> to manually control borrows. I think that three new
> language features would allow the implementation of a borrow checker:
> 1) A system to manage variable lifetime, i.e. the last use of a variable
> in code instead of scope-based lifetimes. We could a) use eager (last use)
> and lazy (scope) defer or even an annotation (attribute) to signify a
> variable to be destructed after its last use. Other solutions include b) a
> drop() function to explicitly end the lifetime of an object before the
> scope ends or c) a second destructor ~~MyClass() which is called when the
> variable is last used. (I kind of like solution c)–maybe with a different
> syntax.)
> 2) consteval member variables. These member variables would only be
> accessible at compile time and thus do not add to the runtime size of an
> object. constexpr is already taken and has a different meaning as a
> variable specifier. consteval member variables could only be accessed from
> consteval functions (or from constant evaluated contexts within a constexpr
> function). In order to avoid a new keyword we would declare variables as
> consteval. (Just to be clear: consteval member variables would not be
> constants but would be allowed to mutate.)
> 3) We need syntax for consteval functions (changing the consteval member
> variables to reflect borrowing) to be automatically called when assigning a
> std::checked<T> to a std::borrow<T> (usually in the constructor of
> std::borrow<T>). I don’t think I have the perfect solution for this, yet.
> Solution a) proposes to always call the consteval function as well as
> generate the runtime function call when a consteval and regular runtime
> function with the (otherwise) same signature exist. Alternatively, b) we
> could provide an attribute [[consteval=functionname]] to our runtime
> function (or an extension of contracts) which triggers a compile time
> function call. This consteval function could take a std::meta::info with
> the reflected expression of the original function call. Or c) we could call
> a consteval function which injects the tokens for the runtime function call
> via reflection.
>
> BTW: Since std::checked<T> and std::borrow<T> behave like references, it
> would be really nice to have operator.() in the language as well (there’s
> already a proposal for that:
> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4477.pdf).
>
> On a side note: With yet another library class we could track the status
> of a variable using 2) consteval member variables and 3) hybrid compile
> time/runtime evaluation. We could track if a variable has been initialized
> before it is first read; similar to how this is tracked in Herb’s cppfront.
>
> Sorry for the very long email. I tried to keep each point short.
> Naturally, this will raise further questions, but I hope to get the initial
> ideas for the three language features across. I haven’t followed each and
> every proposal (I usually just read up on trip reports and read the
> proposals mentioned there), so let me know if anything similar has already
> been proposed.
>
> I don’t have any experience with the standardization process at all. So, I
> need some support writing proposals if there is interest in further
> pursuing any of these. Any help in navigating standardization is welcome.
>
> PS: I said that this is not going to be a borrow checker proposal in
> itself. However, I have a bunch of ideas related to this which–with the
> right choices for the features above–could make a library-based borrow
> checker close to "safe by default" with an "opt-out".
>
> In the long run it could make sense to call a variable that both uses the
> borrow checker and tracks initialization status std::var<T>. By ‘using
> std::var;’ and with template argument deduction we could replace every use
> of ‘auto’ with ‘var’ (maybe introduce a safety profile that just redefines
> ‘auto’ as ‘std::var’ so everything is safe by default). Through metaclasses
> we could make it easy to introduce borrow checking into std::vector,
> std::string and other classes. These classes could use a template parameter
> to opt-out of the borrow checker. This gets us closer to "safety by
> default". I don’t see how we can shorten std::borrow<T>, but we could
> introduce a conversion operator std::checked<T>::operator T&() which would
> at least count up the borrow (but would not count down at the last use of the
> reference). Depending on the solution for hybrid compile-time/runtime
> evaluation we could replace declarations of T& with a declaration of
> std::borrow<T> (this std::borrow<T> better have operator.() for code
> compatibility). It would be nice to complement std::var<T> with a
> std::ref<T> (but that name is already taken). Real constants that don’t
> allow to cast away constness could be called std::val<T>. std::val<T> does
> not need a borrow checker (because it can never be changed, so there are
> only read-only references). I briefly thought it would need hybrid
> compile-time/runtime evaluation. However, by just not implementing an
> assignment operator we could have a std::val<T> right now.
>
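The last point, that a val<T> is possible today, can be sketched in C++17.
The name val follows the text above; the get() accessor and the implicit
conversion to const T& are assumptions added for the illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical val<T>: initialized once, never assignable, read-only access.
template <typename T>
class val {
public:
    val(T value) : value_(std::move(value)) {}
    val(const val&) = default;
    val& operator=(const val&) = delete;   // no reassignment after construction
    const T& get() const { return value_; }
    operator const T&() const { return value_; }
private:
    T value_;
};

inline int read_val_demo() {
    val limit = 10;      // template argument deduced as val<int>
    // limit = 11;       // would not compile: assignment is deleted
    int copy = limit;    // implicit read-only conversion
    return copy + limit.get();
}
```

Because the wrapper only ever hands out const T&, there is no mutable
access to check, which matches the claim that val<T> needs no borrow
checker.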
> Wouldn’t it be nice if in the future we write ‘var’ instead of ‘auto’,
> ‘ref’ instead of ‘auto&’ (though this is already taken), and ‘val’ instead
> of ‘const auto’? (‘let’ instead of ‘val’ would also have a nice ring to it,
> but would look weird if someone provides the type explicitly.) All this
> would be library based and thus a lot easier to change. A specific profile
> could make all those replacements automatically. If the profile does not
> exactly replace to std::var/std::ref/std::val but var/ref/val instead,
> someone can provide their own implementation of var/ref/val and pull them
> in using ‘using’. (Actually, we’d need to differentiate between constant
> and mutable borrows. So, instead of std::ref we’d have std::mut/std::ref
> (stealing the name ‘mut’ from Rust) for ‘auto&’ and std::roref/std::cref
> (read-only ref) for ‘const auto&’.)
>
> PPS: Let me quickly introduce myself for those who are interested. I
> learned programming in the summer of 1999 using QBasic. Just a few months
> later I had my first C++ book. Since then I have learned some other
> languages for fun, but have only ever really used C++. My PhD is in
> computer graphics (scientific visualization), but my jobs so far have all
> been in scientific computing (I’ve also touched some Fortran code in these
> projects). Besides standard C++ I have quite a lot of experience with the
> Qt framework (I’m waiting for "modern Qt" just like we now have "modern
> C++"). At work, I’m currently stuck with C++17, but I’m always curious what
> the future of C++ holds.
>

Received on 2025-01-11 15:22:04