Date: Sun, 31 Aug 2025 11:15:32 +0100
On Sun, 31 Aug 2025, 10:31 Levo D via Std-Proposals, <
std-proposals_at_[hidden]> wrote:
> I wrote quite a bit, so I figure I should stop and take feedback before I
> accidentally double this size.
> I'd like to hear more about things I should consider (the end gives two
> examples).
> Let me know if you'd like to work on this proposal with me. I'll be able
> to cover the content, but I doubt I can attend any meetings.
> I personally have implemented the analysis below in my compiler, so I know
> it's pretty darn simple to implement (in my non-C++ compiler)
>
> Introduction:
> Containers are used everywhere in C++ codebases; however, there's nothing
> in the type system to enforce that an index is always in bounds of a
> container. This proposal describes an optional feature that will guarantee
> an index will be in bounds for user-defined and standard containers. This
> is achieved without syntax or ABI changes. This feature will cause a lot of
> code to be an error, so users are likely to incrementally add this feature
> to files. Container correctness is out of scope.
>
> Problem Space:
>
> We'll go over a few examples to familiarize ourselves with the problem
> space. We'll show the correct implementation of each in the next section.
> You may assume Container is a std::vector or a user-defined container.
>
> Example 1
>
> The following code uses indexes from external sources.
>
> void myFunc(const Container<int>&v, int unknownAmount) {
> v[extern_lib_constant] = 0; // may be OOB
>
> for (int i=0; i<unknownAmount; i++) {
> v[i] = i; // may be OOB
> }
> }
>
>
> Example 2
>
> Here, the loop variable 'i' is checked against v1 size but not v2
>
> bool array_equals(const Container<int>&v1, const Container<int>&v2) {
> // oops, forgot: if (v1.size() != v2.size()) return false;
> for(size_t i=0; i<v1.size(); i++) {
> if (v1[i] != v2[i]) {
> return false;
> }
> }
> return true;
> }
>
> Example 3
>
> Here we use literals instead of variables. If the file is empty (or <123),
> this would be out of bounds.
>
> int checkFile(const Container<unsigned char>&file) {
> if (file[0] != 0x12 && file[2] != 0x34) // OOB
> return Error_invalid;
>
> auto interestingValue = file[123]; // OOB
> return interestingValue;
> }
>
> Example 4
>
> Here, a library upgrade may change a constant to become smaller, which
> breaks the following code.
>
> Container<int> make_buffer() {
> Container<int> v;
> v.resize(external_lib_1_literal); // external_lib_2_buf is smaller
> at the time of implementation
> mymemcpy(v, external_lib_2_buf, sizeof(external_lib_2_buf)); // OOB
> return v;
> }
>
> Static Analysis:
>
> Let's implement a solution for each example and talk about what static
> analysis would need to do
>
> Solution 1:
>
> Here is a simple bounds check that the user wrote to see if a value is too
> large; there is no lower bound check on either variable.
>
> void myFunc(const Container<int>&v, int unknownAmount) {
> if (unknownAmount > v.size() || extern_lib_constant >= v.size()) {
> return;
> }
> v[extern_lib_constant] = 0; // ok when unsigned, OOB whenever it
> holds a negative value
>
> // if unknownAmount being negative is fine since no access will
> occur
> for (int i=0; i<unknownAmount; i++) {
> v[i] = i;
> }
> }
>
> The static analyzer will need to
>
> 1. Keep a track of what containers a variable is compared with, and if
> either container or variable has been modified since.
> 'unknownAmount' and 'extern_lib_constant' are compared with 'v' in
> the if-head, inside the if-body the two
> variables are associated with 'v', one being greater, the other
> greater-than-or-equal-to.
> 2. Invert or drop the compare association at the end of an if-statement.
> When the if-body has a return, control flow won't merge into code
> outside the if-statement. In this case, we may invert the comparison.
> Since the if-statement uses ||'s to enter the if, we can deduce
> that both 'unknownAmount' and 'extern_lib_constant' failed, so we can
> associate the variable with the inverted comparison. The rest of
> the function 'unknownAmount' is '<=' while 'extern_lib_constant' is '<'
> This allows v[extern_lib_constant] to be legal, but only when the
> type is unsigned, since there was no lower bound check.
>
It's not that simple. The `extern_lib_constant >= v.size()` expression
will involve integer conversions if the type is signed.
However, because 'unknownAmount' is not '<', it can't be used in an
> array index (OOB whenever unknownAmount == v.size())
> 3. Track literal initialization and comparisons.
> 'i' is initialized with a literal, so we know the lower bound is
> 0. The lower bound will be lost if
> 'i' is modified with a non-literal value. Here 'i' is
> modified by using ++ (or += positive value), so
> It'd be impossible for 'i' to become smaller than 0
> without wrapping/UB, thus its lower bound is always >= 0
> 'i' is also compared with 'unknownAmount'. Which is associated
> with being "<= v.size()". However, since the loop uses a less-than when
> comparing to unknownAmount, the comparison is associated as
> less-than everything unknownAmount is compared with, meaning 'i' is less
> than v.size().
> Since 'i' has a positive lower bound and is known to be lower than
> 'v', the access will always be within bounds
>
> Solution 2
>
> Here, the most obvious solution is comparing the sizes, but adding '&&
> i<v2.size()' is acceptable too
>
> bool array_equals(const Container<int>&v1, const Container<int>&v2) {
> if (v1.size() != v2.size()) { return false; }
> for(size_t i=0; i<v1.size(); i++) {
> if (v1[i] != v2[i]) {
> return false;
> }
> }
> return true;
> }
>
> The static analyzer will also need to
>
> 4. Have container sizes be associated with other container sizes
> By associating v1 and v2 being equal, 'i' being compared to either
> will show it's within bounds of both
> If "v1.size() < v2.size()" was written, 'i' would still be in
> bounds; however, if the cond was "i < v2.size()"
> It wouldn't be in bounds due to not checking v1 bounds
> 5. Variables (and array sizes) must be able to associate with multiple
> containers.
> This will allow the alternative 'i<v1.size() && i<v2.size()'
> solution
>
> Solution 3:
>
> Here we use literals instead of variables.
>
> int checkFile(const Container<unsigned char>&file) {
> if (file.size() < 1234) return Error_invalid_size;
> if (file[0] != 0x12 && file[2] != 0x34)
> return Error_invalid;
>
> auto interestingValue = file[123];
> return interestingValue;
> }
>
> 6. Container sizes would be associated with literals.
> Because the analysis inverts on return, the file size must be >=
> 1234, which allows every literal in this function
>
> Solution 4
>
> Here, our 'solution' is to err so we can fix this. We would want an error
> when external_lib_1_literal becomes smaller than external_lib_2_buf
>
> Container<int> make_buffer() {
> Container<int> v;
> v.resize(external_lib_1_literal); // external_lib_2_buf is smaller
> at the time of implementation
> // mymemcpy(Container<int>&dst, const Container<int>&src) pre {
> dst.size() >= src.size() }
> mymemcpy(v, external_lib_2_buf);
> return v;
> }
>
> 7. Container is resized to a known literal, which gives us v's size. If
> external_lib_2_buf is constant, we'd
> have enough information for 'mymemcpy' precondition
> 8. Preconditions would need to be supported to allow an error at mymemcpy
> rather than (possibly silently) failing elsewhere.
>
> Attributes For Containers:
>
> Static Analysis shouldn't guess if a container needs bounds checking and
> what each function does or does not do.
> As an example, a map<int, int> would have a size function and subscript
> operator, but it should not have bounds checking.
>
> Below are attributes a container can use so static analysis can provide
> easy-to-use bounds checking.
> Assume "bounds_" is prefixed to each of these names; "bounds" might be the
> most obvious prefix word
> - subscript(i), this explicitly states that the parameter index i should
> be bounds checked by the static analyzer.
> - size, this function returns the size. Return type may be any integer
> - inc_one/inc_many, variables known to be less-than will remain less-than
> - dec_one/dec_many/invalidate, size is no longer known, must compare again.
> - noop, bounds is not modified
> - resize(i), the size is exactly the size specified in the parameter
> - size(n), this function may not be called unless the size is at least n.
> This is useful for functions such as first() or last() where the
> container needs at least 1 element.
> - dup, this is when a container wants to produce another container with
> the same bounds (like a clone)
> - move, there might be a rare case where a user wants to move an object
> without using a move construct/assignment
>
> Any non-const function that doesn't have a bounds_ attribute and isn't
> private is assumed to invalidate the size.
> However, an error might be preferable.
> Move and copy constructor/assignment can assume the size becomes the size
> of the object being moved/copied.
>
> Here's an example container with the attributes
>
> template<class T> class UserArray
> {
> T*ptr;
> int size_, capacity;
> void grow() __attribute__((bounds_noop)); // increases capacity
> but size is unaffected
> public:
> UserArray();
> void clear() __attribute__((bounds_invalidate)); // empties the
> array
> int clearAllMemory() __attribute__((bounds_noop)); // elements are
> set to zero, but the container did not grow or shrink
> UserArray& operator=(UserArray&&o); // may assume this object has
> the same bounds as o after this call.
> UserArray clone() const __attribute__((bounds_dup));
> void filter() __attribute__((bounds_invalidate)); // may be an
> empty after this call
> void push(const T& v) __attribute__((bounds_inc_one));
> void pop(const T& v) __attribute__((bounds_dec_one));
> void resize(size_t size) __attribute__((bounds_resize, 2));
> size_t size() const __attribute__((bounds_size));
> void append(const UserArray&arr) __attribute__((bounds_inc_many));
> T& first() __attribute__((bounds_size(1)));
> };
>
> Other considerations:
>
> Static analysis would need to understand the 'min' function.
>
> // return the amount of sequentially equal elements
> int different_array_equals(const Container<int>&a, const Container<int>&b)
> {
> for(size_t i=0; i<min(a.size(), b.size()); i++) {
> if (v1[i] != v2[i]) {
> return i;
> }
> }
> return 0;
> }
>
> Code using a function to find an upper bound would not be understood as
> smaller than the input.
> As an example, Sieve of Eratosthenes uses a sqrt as the for-loop upper
> bound.
> https://rosettacode.org/wiki/Sieve_of_Eratosthenes#C
>
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>
std-proposals_at_[hidden]> wrote:
> I wrote quite a bit, so I figure I should stop and take feedback before I
> accidentally double this size.
> I'd like to hear more about things I should consider (the end gives two
> examples).
> Let me know if you'd like to work on this proposal with me. I'll be able
> to cover the content, but I doubt I can attend any meetings.
> I personally have implemented the analysis below in my compiler, so I know
> it's pretty darn simple to implement (in my non-C++ compiler)
>
> Introduction:
> Containers are used everywhere in C++ codebases; however, there's nothing
> in the type system to enforce that an index is always in bounds of a
> container. This proposal describes an optional feature that will guarantee
> an index will be in bounds for user-defined and standard containers. This
> is achieved without syntax or ABI changes. This feature will cause a lot of
> code to be an error, so users are likely to incrementally add this feature
> to files. Container correctness is out of scope.
>
> Problem Space:
>
> We'll go over a few examples to familiarize ourselves with the problem
> space. We'll show the correct implementation of each in the next section.
> You may assume Container is a std::vector or a user-defined container.
>
> Example 1
>
> The following code uses indexes from external sources.
>
> void myFunc(const Container<int>&v, int unknownAmount) {
> v[extern_lib_constant] = 0; // may be OOB
>
> for (int i=0; i<unknownAmount; i++) {
> v[i] = i; // may be OOB
> }
> }
>
>
> Example 2
>
> Here, the loop variable 'i' is checked against v1 size but not v2
>
> bool array_equals(const Container<int>&v1, const Container<int>&v2) {
> // oops, forgot: if (v1.size() != v2.size()) return false;
> for(size_t i=0; i<v1.size(); i++) {
> if (v1[i] != v2[i]) {
> return false;
> }
> }
> return true;
> }
>
> Example 3
>
> Here we use literals instead of variables. If the file is empty (or <123),
> this would be out of bounds.
>
> int checkFile(const Container<unsigned char>&file) {
> if (file[0] != 0x12 && file[2] != 0x34) // OOB
> return Error_invalid;
>
> auto interestingValue = file[123]; // OOB
> return interestingValue;
> }
>
> Example 4
>
> Here, a library upgrade may change a constant to become smaller, which
> breaks the following code.
>
> Container<int> make_buffer() {
> Container<int> v;
> v.resize(external_lib_1_literal); // external_lib_2_buf is smaller
> at the time of implementation
> mymemcpy(v, external_lib_2_buf, sizeof(external_lib_2_buf)); // OOB
> return v;
> }
>
> Static Analysis:
>
> Let's implement a solution for each example and talk about what static
> analysis would need to do
>
> Solution 1:
>
> Here is a simple bounds check that the user wrote to see if a value is too
> large; there is no lower bound check on either variable.
>
> void myFunc(const Container<int>&v, int unknownAmount) {
> if (unknownAmount > v.size() || extern_lib_constant >= v.size()) {
> return;
> }
> v[extern_lib_constant] = 0; // ok when unsigned, OOB whenever it
> holds a negative value
>
> // if unknownAmount being negative is fine since no access will
> occur
> for (int i=0; i<unknownAmount; i++) {
> v[i] = i;
> }
> }
>
> The static analyzer will need to
>
> 1. Keep a track of what containers a variable is compared with, and if
> either container or variable has been modified since.
> 'unknownAmount' and 'extern_lib_constant' are compared with 'v' in
> the if-head, inside the if-body the two
> variables are associated with 'v', one being greater, the other
> greater-than-or-equal-to.
> 2. Invert or drop the compare association at the end of an if-statement.
> When the if-body has a return, control flow won't merge into code
> outside the if-statement. In this case, we may invert the comparison.
> Since the if-statement uses ||'s to enter the if, we can deduce
> that both 'unknownAmount' and 'extern_lib_constant' failed, so we can
> associate the variable with the inverted comparison. The rest of
> the function 'unknownAmount' is '<=' while 'extern_lib_constant' is '<'
> This allows v[extern_lib_constant] to be legal, but only when the
> type is unsigned, since there was no lower bound check.
>
It's not that simple. The `extern_lib_constant >= v.size()` expression
will involve integer conversions if the type is signed.
However, because 'unknownAmount' is not '<', it can't be used in an
> array index (OOB whenever unknownAmount == v.size())
> 3. Track literal initialization and comparisons.
> 'i' is initialized with a literal, so we know the lower bound is
> 0. The lower bound will be lost if
> 'i' is modified with a non-literal value. Here 'i' is
> modified by using ++ (or += positive value), so
> It'd be impossible for 'i' to become smaller than 0
> without wrapping/UB, thus its lower bound is always >= 0
> 'i' is also compared with 'unknownAmount'. Which is associated
> with being "<= v.size()". However, since the loop uses a less-than when
> comparing to unknownAmount, the comparison is associated as
> less-than everything unknownAmount is compared with, meaning 'i' is less
> than v.size().
> Since 'i' has a positive lower bound and is known to be lower than
> 'v', the access will always be within bounds
>
> Solution 2
>
> Here, the most obvious solution is comparing the sizes, but adding '&&
> i<v2.size()' is acceptable too
>
> bool array_equals(const Container<int>&v1, const Container<int>&v2) {
> if (v1.size() != v2.size()) { return false; }
> for(size_t i=0; i<v1.size(); i++) {
> if (v1[i] != v2[i]) {
> return false;
> }
> }
> return true;
> }
>
> The static analyzer will also need to
>
> 4. Have container sizes be associated with other container sizes
> By associating v1 and v2 being equal, 'i' being compared to either
> will show it's within bounds of both
> If "v1.size() < v2.size()" was written, 'i' would still be in
> bounds; however, if the cond was "i < v2.size()"
> It wouldn't be in bounds due to not checking v1 bounds
> 5. Variables (and array sizes) must be able to associate with multiple
> containers.
> This will allow the alternative 'i<v1.size() && i<v2.size()'
> solution
>
> Solution 3:
>
> Here we use literals instead of variables.
>
> int checkFile(const Container<unsigned char>&file) {
> if (file.size() < 1234) return Error_invalid_size;
> if (file[0] != 0x12 && file[2] != 0x34)
> return Error_invalid;
>
> auto interestingValue = file[123];
> return interestingValue;
> }
>
> 6. Container sizes would be associated with literals.
> Because the analysis inverts on return, the file size must be >=
> 1234, which allows every literal in this function
>
> Solution 4
>
> Here, our 'solution' is to err so we can fix this. We would want an error
> when external_lib_1_literal becomes smaller than external_lib_2_buf
>
> Container<int> make_buffer() {
> Container<int> v;
> v.resize(external_lib_1_literal); // external_lib_2_buf is smaller
> at the time of implementation
> // mymemcpy(Container<int>&dst, const Container<int>&src) pre {
> dst.size() >= src.size() }
> mymemcpy(v, external_lib_2_buf);
> return v;
> }
>
> 7. Container is resized to a known literal, which gives us v's size. If
> external_lib_2_buf is constant, we'd
> have enough information for 'mymemcpy' precondition
> 8. Preconditions would need to be supported to allow an error at mymemcpy
> rather than (possibly silently) failing elsewhere.
>
> Attributes For Containers:
>
> Static Analysis shouldn't guess if a container needs bounds checking and
> what each function does or does not do.
> As an example, a map<int, int> would have a size function and subscript
> operator, but it should not have bounds checking.
>
> Below are attributes a container can use so static analysis can provide
> easy-to-use bounds checking.
> Assume "bounds_" is prefixed to each of these names; "bounds" might be the
> most obvious prefix word
> - subscript(i), this explicitly states that the parameter index i should
> be bounds checked by the static analyzer.
> - size, this function returns the size. Return type may be any integer
> - inc_one/inc_many, variables known to be less-than will remain less-than
> - dec_one/dec_many/invalidate, size is no longer known, must compare again.
> - noop, bounds is not modified
> - resize(i), the size is exactly the size specified in the parameter
> - size(n), this function may not be called unless the size is at least n.
> This is useful for functions such as first() or last() where the
> container needs at least 1 element.
> - dup, this is when a container wants to produce another container with
> the same bounds (like a clone)
> - move, there might be a rare case where a user wants to move an object
> without using a move construct/assignment
>
> Any non-const function that doesn't have a bounds_ attribute and isn't
> private is assumed to invalidate the size.
> However, an error might be preferable.
> Move and copy constructor/assignment can assume the size becomes the size
> of the object being moved/copied.
>
> Here's an example container with the attributes
>
> template<class T> class UserArray
> {
> T*ptr;
> int size_, capacity;
> void grow() __attribute__((bounds_noop)); // increases capacity
> but size is unaffected
> public:
> UserArray();
> void clear() __attribute__((bounds_invalidate)); // empties the
> array
> int clearAllMemory() __attribute__((bounds_noop)); // elements are
> set to zero, but the container did not grow or shrink
> UserArray& operator=(UserArray&&o); // may assume this object has
> the same bounds as o after this call.
> UserArray clone() const __attribute__((bounds_dup));
> void filter() __attribute__((bounds_invalidate)); // may be an
> empty after this call
> void push(const T& v) __attribute__((bounds_inc_one));
> void pop(const T& v) __attribute__((bounds_dec_one));
> void resize(size_t size) __attribute__((bounds_resize, 2));
> size_t size() const __attribute__((bounds_size));
> void append(const UserArray&arr) __attribute__((bounds_inc_many));
> T& first() __attribute__((bounds_size(1)));
> };
>
> Other considerations:
>
> Static analysis would need to understand the 'min' function.
>
> // return the amount of sequentially equal elements
> int different_array_equals(const Container<int>&a, const Container<int>&b)
> {
> for(size_t i=0; i<min(a.size(), b.size()); i++) {
> if (v1[i] != v2[i]) {
> return i;
> }
> }
> return 0;
> }
>
> Code using a function to find an upper bound would not be understood as
> smaller than the input.
> As an example, Sieve of Eratosthenes uses a sqrt as the for-loop upper
> bound.
> https://rosettacode.org/wiki/Sieve_of_Eratosthenes#C
>
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>
Received on 2025-08-31 10:15:52