Date: Sun, 31 Aug 2025 09:31:40 +0000
I wrote quite a bit, so I figure I should stop and take feedback before I accidentally double this size.
I'd like to hear more about things I should consider (the end gives two examples).
Let me know if you'd like to work on this proposal with me. I'll be able to cover the content, but I doubt I can attend any meetings.
I personally have implemented the analysis below in my compiler, so I know it's pretty darn simple to implement (in my non-C++ compiler)
Introduction:
Containers are used everywhere in C++ codebases; however, there's nothing in the type system to enforce that an index is always in bounds of a container. This proposal describes an optional feature that will guarantee an index will be in bounds for user-defined and standard containers. This is achieved without syntax or ABI changes. This feature will cause a lot of code to be an error, so users are likely to incrementally add this feature to files. Container correctness is out of scope.
Problem Space:
We'll go over a few examples to familiarize ourselves with the problem space. We'll show the correct implementation of each in the next section.
You may assume Container is a std::vector or a user-defined container.
Example 1
The following code uses indexes from external sources.
void myFunc(const Container<int>&v, int unknownAmount) {
v[extern_lib_constant] = 0; // may be OOB
for (int i=0; i<unknownAmount; i++) {
v[i] = i; // may be OOB
}
}
Example 2
Here, the loop variable 'i' is checked against v1 size but not v2
bool array_equals(const Container<int>&v1, const Container<int>&v2) {
// oops, forgot: if (v1.size() != v2.size()) return false;
for(size_t i=0; i<v1.size(); i++) {
if (v1[i] != v2[i]) {
return false;
}
}
return true;
}
Example 3
Here we use literals instead of variables. If the file is empty (or <123), this would be out of bounds.
int checkFile(const Container<unsigned char>&file) {
if (file[0] != 0x12 && file[2] != 0x34) // OOB
return Error_invalid;
auto interestingValue = file[123]; // OOB
return interestingValue;
}
Example 4
Here, a library upgrade may change a constant to become smaller, which breaks the following code.
Container<int> make_buffer() {
Container<int> v;
v.resize(external_lib_1_literal); // external_lib_2_buf is smaller at the time of implementation
mymemcpy(v, external_lib_2_buf, sizeof(external_lib_2_buf)); // OOB
return v;
}
Static Analysis:
Let's implement a solution for each example and talk about what static analysis would need to do
Solution 1:
Here is a simple bounds check that the user wrote to see if a value is too large; there is no lower bound check on either variable.
void myFunc(const Container<int>&v, int unknownAmount) {
if (unknownAmount > v.size() || extern_lib_constant >= v.size()) {
return;
}
v[extern_lib_constant] = 0; // ok when unsigned, OOB whenever it holds a negative value
// if unknownAmount being negative is fine since no access will occur
for (int i=0; i<unknownAmount; i++) {
v[i] = i;
}
}
The static analyzer will need to
1. Keep a track of what containers a variable is compared with, and if either container or variable has been modified since.
'unknownAmount' and 'extern_lib_constant' are compared with 'v' in the if-head, inside the if-body the two
variables are associated with 'v', one being greater, the other greater-than-or-equal-to.
2. Invert or drop the compare association at the end of an if-statement.
When the if-body has a return, control flow won't merge into code outside the if-statement. In this case, we may invert the comparison.
Since the if-statement uses ||'s to enter the if, we can deduce that both 'unknownAmount' and 'extern_lib_constant' failed, so we can
associate the variable with the inverted comparison. The rest of the function 'unknownAmount' is '<=' while 'extern_lib_constant' is '<'
This allows v[extern_lib_constant] to be legal, but only when the type is unsigned, since there was no lower bound check.
However, because 'unknownAmount' is not '<', it can't be used in an array index (OOB whenever unknownAmount == v.size())
3. Track literal initialization and comparisons.
'i' is initialized with a literal, so we know the lower bound is 0. The lower bound will be lost if
'i' is modified with a non-literal value. Here 'i' is modified by using ++ (or += positive value), so
It'd be impossible for 'i' to become smaller than 0 without wrapping/UB, thus its lower bound is always >= 0
'i' is also compared with 'unknownAmount'. Which is associated with being "<= v.size()". However, since the loop uses a less-than when
comparing to unknownAmount, the comparison is associated as less-than everything unknownAmount is compared with, meaning 'i' is less than v.size().
Since 'i' has a positive lower bound and is known to be lower than 'v', the access will always be within bounds
Solution 2
Here, the most obvious solution is comparing the sizes, but adding '&& i<v2.size()' is acceptable too
bool array_equals(const Container<int>&v1, const Container<int>&v2) {
if (v1.size() != v2.size()) { return false; }
for(size_t i=0; i<v1.size(); i++) {
if (v1[i] != v2[i]) {
return false;
}
}
return true;
}
The static analyzer will also need to
4. Have container sizes be associated with other container sizes
By associating v1 and v2 being equal, 'i' being compared to either will show it's within bounds of both
If "v1.size() < v2.size()" was written, 'i' would still be in bounds; however, if the cond was "i < v2.size()"
It wouldn't be in bounds due to not checking v1 bounds
5. Variables (and array sizes) must be able to associate with multiple containers.
This will allow the alternative 'i<v1.size() && i<v2.size()' solution
Solution 3:
Here we use literals instead of variables.
int checkFile(const Container<unsigned char>&file) {
if (file.size() < 1234) return Error_invalid_size;
if (file[0] != 0x12 && file[2] != 0x34)
return Error_invalid;
auto interestingValue = file[123];
return interestingValue;
}
6. Container sizes would be associated with literals.
Because the analysis inverts on return, the file size must be >= 1234, which allows every literal in this function
Solution 4
Here, our 'solution' is to err so we can fix this. We would want an error when external_lib_1_literal becomes smaller than external_lib_2_buf
Container<int> make_buffer() {
Container<int> v;
v.resize(external_lib_1_literal); // external_lib_2_buf is smaller at the time of implementation
// mymemcpy(Container<int>&dst, const Container<int>&src) pre { dst.size() >= src.size() }
mymemcpy(v, external_lib_2_buf);
return v;
}
7. Container is resized to a known literal, which gives us v's size. If external_lib_2_buf is constant, we'd
have enough information for 'mymemcpy' precondition
8. Preconditions would need to be supported to allow an error at mymemcpy rather than (possibly silently) failing elsewhere.
Attributes For Containers:
Static Analysis shouldn't guess if a container needs bounds checking and what each function does or does not do.
As an example, a map<int, int> would have a size function and subscript operator, but it should not have bounds checking.
Below are attributes a container can use so static analysis can provide easy-to-use bounds checking.
Assume "bounds_" is prefixed to each of these names; "bounds" might be the most obvious prefix word
- subscript(i), this explicitly states that the parameter index i should be bounds checked by the static analyzer.
- size, this function returns the size. Return type may be any integer
- inc_one/inc_many, variables known to be less-than will remain less-than
- dec_one/dec_many/invalidate, size is no longer known, must compare again.
- noop, bounds is not modified
- resize(i), the size is exactly the size specified in the parameter
- size(n), this function may not be called unless the size is at least n.
This is useful for functions such as first() or last() where the container needs at least 1 element.
- dup, this is when a container wants to produce another container with the same bounds (like a clone)
- move, there might be a rare case where a user wants to move an object without using a move construct/assignment
Any non-const function that doesn't have a bounds_ attribute and isn't private is assumed to invalidate the size.
However, an error might be preferable.
Move and copy constructor/assignment can assume the size becomes the size of the object being moved/copied.
Here's an example container with the attributes
template<class T> class UserArray
{
T*ptr;
int size_, capacity;
void grow() __attribute__((bounds_noop)); // increases capacity but size is unaffected
public:
UserArray();
void clear() __attribute__((bounds_invalidate)); // empties the array
int clearAllMemory() __attribute__((bounds_noop)); // elements are set to zero, but the container did not grow or shrink
UserArray& operator=(UserArray&&o); // may assume this object has the same bounds as o after this call.
UserArray clone() const __attribute__((bounds_dup));
void filter() __attribute__((bounds_invalidate)); // may be an empty after this call
void push(const T& v) __attribute__((bounds_inc_one));
void pop(const T& v) __attribute__((bounds_dec_one));
void resize(size_t size) __attribute__((bounds_resize, 2));
size_t size() const __attribute__((bounds_size));
void append(const UserArray&arr) __attribute__((bounds_inc_many));
T& first() __attribute__((bounds_size(1)));
};
Other considerations:
Static analysis would need to understand the 'min' function.
// return the amount of sequentially equal elements
int different_array_equals(const Container<int>&a, const Container<int>&b) {
for(size_t i=0; i<min(a.size(), b.size()); i++) {
if (v1[i] != v2[i]) {
return i;
}
}
return 0;
}
Code using a function to find an upper bound would not be understood as smaller than the input.
As an example, Sieve of Eratosthenes uses a sqrt as the for-loop upper bound.
https://rosettacode.org/wiki/Sieve_of_Eratosthenes#C
I'd like to hear more about things I should consider (the end gives two examples).
Let me know if you'd like to work on this proposal with me. I'll be able to cover the content, but I doubt I can attend any meetings.
I personally have implemented the analysis below in my compiler, so I know it's pretty darn simple to implement (in my non-C++ compiler)
Introduction:
Containers are used everywhere in C++ codebases; however, there's nothing in the type system to enforce that an index is always in bounds of a container. This proposal describes an optional feature that will guarantee an index will be in bounds for user-defined and standard containers. This is achieved without syntax or ABI changes. This feature will cause a lot of code to be an error, so users are likely to incrementally add this feature to files. Container correctness is out of scope.
Problem Space:
We'll go over a few examples to familiarize ourselves with the problem space. We'll show the correct implementation of each in the next section.
You may assume Container is a std::vector or a user-defined container.
Example 1
The following code uses indexes from external sources.
void myFunc(const Container<int>&v, int unknownAmount) {
v[extern_lib_constant] = 0; // may be OOB
for (int i=0; i<unknownAmount; i++) {
v[i] = i; // may be OOB
}
}
Example 2
Here, the loop variable 'i' is checked against v1 size but not v2
bool array_equals(const Container<int>&v1, const Container<int>&v2) {
// oops, forgot: if (v1.size() != v2.size()) return false;
for(size_t i=0; i<v1.size(); i++) {
if (v1[i] != v2[i]) {
return false;
}
}
return true;
}
Example 3
Here we use literals instead of variables. If the file is empty (or <123), this would be out of bounds.
int checkFile(const Container<unsigned char>&file) {
if (file[0] != 0x12 && file[2] != 0x34) // OOB
return Error_invalid;
auto interestingValue = file[123]; // OOB
return interestingValue;
}
Example 4
Here, a library upgrade may change a constant to become smaller, which breaks the following code.
Container<int> make_buffer() {
Container<int> v;
v.resize(external_lib_1_literal); // external_lib_2_buf is smaller at the time of implementation
mymemcpy(v, external_lib_2_buf, sizeof(external_lib_2_buf)); // OOB
return v;
}
Static Analysis:
Let's implement a solution for each example and talk about what static analysis would need to do
Solution 1:
Here is a simple bounds check that the user wrote to see if a value is too large; there is no lower bound check on either variable.
void myFunc(const Container<int>&v, int unknownAmount) {
if (unknownAmount > v.size() || extern_lib_constant >= v.size()) {
return;
}
v[extern_lib_constant] = 0; // ok when unsigned, OOB whenever it holds a negative value
// if unknownAmount being negative is fine since no access will occur
for (int i=0; i<unknownAmount; i++) {
v[i] = i;
}
}
The static analyzer will need to
1. Keep a track of what containers a variable is compared with, and if either container or variable has been modified since.
'unknownAmount' and 'extern_lib_constant' are compared with 'v' in the if-head, inside the if-body the two
variables are associated with 'v', one being greater, the other greater-than-or-equal-to.
2. Invert or drop the compare association at the end of an if-statement.
When the if-body has a return, control flow won't merge into code outside the if-statement. In this case, we may invert the comparison.
Since the if-statement uses ||'s to enter the if, we can deduce that both 'unknownAmount' and 'extern_lib_constant' failed, so we can
associate the variable with the inverted comparison. The rest of the function 'unknownAmount' is '<=' while 'extern_lib_constant' is '<'
This allows v[extern_lib_constant] to be legal, but only when the type is unsigned, since there was no lower bound check.
However, because 'unknownAmount' is not '<', it can't be used in an array index (OOB whenever unknownAmount == v.size())
3. Track literal initialization and comparisons.
'i' is initialized with a literal, so we know the lower bound is 0. The lower bound will be lost if
'i' is modified with a non-literal value. Here 'i' is modified by using ++ (or += positive value), so
It'd be impossible for 'i' to become smaller than 0 without wrapping/UB, thus its lower bound is always >= 0
'i' is also compared with 'unknownAmount'. Which is associated with being "<= v.size()". However, since the loop uses a less-than when
comparing to unknownAmount, the comparison is associated as less-than everything unknownAmount is compared with, meaning 'i' is less than v.size().
Since 'i' has a positive lower bound and is known to be lower than 'v', the access will always be within bounds
Solution 2
Here, the most obvious solution is comparing the sizes, but adding '&& i<v2.size()' is acceptable too
bool array_equals(const Container<int>&v1, const Container<int>&v2) {
if (v1.size() != v2.size()) { return false; }
for(size_t i=0; i<v1.size(); i++) {
if (v1[i] != v2[i]) {
return false;
}
}
return true;
}
The static analyzer will also need to
4. Have container sizes be associated with other container sizes
By associating v1 and v2 being equal, 'i' being compared to either will show it's within bounds of both
If "v1.size() < v2.size()" was written, 'i' would still be in bounds; however, if the cond was "i < v2.size()"
It wouldn't be in bounds due to not checking v1 bounds
5. Variables (and array sizes) must be able to associate with multiple containers.
This will allow the alternative 'i<v1.size() && i<v2.size()' solution
Solution 3:
Here we use literals instead of variables.
int checkFile(const Container<unsigned char>&file) {
if (file.size() < 1234) return Error_invalid_size;
if (file[0] != 0x12 && file[2] != 0x34)
return Error_invalid;
auto interestingValue = file[123];
return interestingValue;
}
6. Container sizes would be associated with literals.
Because the analysis inverts on return, the file size must be >= 1234, which allows every literal in this function
Solution 4
Here, our 'solution' is to err so we can fix this. We would want an error when external_lib_1_literal becomes smaller than external_lib_2_buf
Container<int> make_buffer() {
Container<int> v;
v.resize(external_lib_1_literal); // external_lib_2_buf is smaller at the time of implementation
// mymemcpy(Container<int>&dst, const Container<int>&src) pre { dst.size() >= src.size() }
mymemcpy(v, external_lib_2_buf);
return v;
}
7. Container is resized to a known literal, which gives us v's size. If external_lib_2_buf is constant, we'd
have enough information for 'mymemcpy' precondition
8. Preconditions would need to be supported to allow an error at mymemcpy rather than (possibly silently) failing elsewhere.
Attributes For Containers:
Static Analysis shouldn't guess if a container needs bounds checking and what each function does or does not do.
As an example, a map<int, int> would have a size function and subscript operator, but it should not have bounds checking.
Below are attributes a container can use so static analysis can provide easy-to-use bounds checking.
Assume "bounds_" is prefixed to each of these names; "bounds" might be the most obvious prefix word
- subscript(i), this explicitly states that the parameter index i should be bounds checked by the static analyzer.
- size, this function returns the size. Return type may be any integer
- inc_one/inc_many, variables known to be less-than will remain less-than
- dec_one/dec_many/invalidate, size is no longer known, must compare again.
- noop, bounds is not modified
- resize(i), the size is exactly the size specified in the parameter
- size(n), this function may not be called unless the size is at least n.
This is useful for functions such as first() or last() where the container needs at least 1 element.
- dup, this is when a container wants to produce another container with the same bounds (like a clone)
- move, there might be a rare case where a user wants to move an object without using a move construct/assignment
Any non-const function that doesn't have a bounds_ attribute and isn't private is assumed to invalidate the size.
However, an error might be preferable.
Move and copy constructor/assignment can assume the size becomes the size of the object being moved/copied.
Here's an example container with the attributes
template<class T> class UserArray
{
T*ptr;
int size_, capacity;
void grow() __attribute__((bounds_noop)); // increases capacity but size is unaffected
public:
UserArray();
void clear() __attribute__((bounds_invalidate)); // empties the array
int clearAllMemory() __attribute__((bounds_noop)); // elements are set to zero, but the container did not grow or shrink
UserArray& operator=(UserArray&&o); // may assume this object has the same bounds as o after this call.
UserArray clone() const __attribute__((bounds_dup));
void filter() __attribute__((bounds_invalidate)); // may be an empty after this call
void push(const T& v) __attribute__((bounds_inc_one));
void pop(const T& v) __attribute__((bounds_dec_one));
void resize(size_t size) __attribute__((bounds_resize, 2));
size_t size() const __attribute__((bounds_size));
void append(const UserArray&arr) __attribute__((bounds_inc_many));
T& first() __attribute__((bounds_size(1)));
};
Other considerations:
Static analysis would need to understand the 'min' function.
// return the amount of sequentially equal elements
int different_array_equals(const Container<int>&a, const Container<int>&b) {
for(size_t i=0; i<min(a.size(), b.size()); i++) {
if (v1[i] != v2[i]) {
return i;
}
}
return 0;
}
Code using a function to find an upper bound would not be understood as smaller than the input.
As an example, Sieve of Eratosthenes uses a sqrt as the for-loop upper bound.
https://rosettacode.org/wiki/Sieve_of_Eratosthenes#C
Received on 2025-08-31 09:31:42