Date: Fri, 22 Aug 2025 20:04:44 +0000
I'd be happy to work on this proposal with others.
This document is about bounds checking on arrays, not containers; container support can be added in the future after a successful implementation.
The following program has several problems, which its comments explain.
The two biggest issues are:
1) there is no easy way to convert a large array into a smaller one, and
2) compilers are not required to issue an error when an array's size is known and a constant index lies outside it.
#include <cstddef>
void test16(char (&arr)[16]) { arr[15] = 0x12; }
void test32(char (&arr)[32]) { arr[31] = 0x34; }
// Sanitizers won't catch this if 257 is changed to 255, since 255 is within the declared bound
void test256(char (&arr)[256]) { arr[257] = 0x56; }
template <std::size_t N> void testN(char (&arr)[N]) {
    //test16(arr); // error: the size is not exactly 16
    test32(arr);   // exact size, compiles
    // Casting is error-prone, as we know
    test16(reinterpret_cast<char (&)[16]>(arr));   // compiles
    test256(reinterpret_cast<char (&)[256]>(arr)); // compiles and overwrites memory
}
int main() {
    char buf[32]{};
    testN(buf); // two problems: 32 is smaller than 256, and
                // test256 writes to index 257, which is clearly out of range
    // some compilers give no warning (or error) for the line below
    buf[-1] = 0x78;
    // I would much rather the previous line had to be written as
    *(buf-1) = 0x78;
}
Making 'arr[257]' and 'buf[-1]' errors would catch these obvious mistakes immediately.
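Part of this can already be approximated in today's C++ when the index is a template argument rather than an ordinary subscript. The sketch below uses a hypothetical helper `at<I>` (not a standard facility) that rejects an out-of-range constant index with `static_assert`; `std::get<I>` on `std::array` already behaves this way. It cannot help with the plain `arr[257]` spelling, which is why language support is proposed.

```cpp
#include <cstddef>

// Hypothetical helper: indexes a built-in array with a compile-time index
// and refuses to compile when the index is outside [0, N).
template <std::size_t I, class T, std::size_t N>
constexpr T& at(T (&arr)[N]) {
    static_assert(I < N, "constant index is out of range");
    return arr[I];
}

void test256(char (&arr)[256]) {
    at<255>(arr) = 0x56;    // compiles: 255 < 256
    // at<257>(arr) = 0x56; // would fail the static_assert
    // at<-1>(arr) is rejected too: -1 is a narrowing conversion to std::size_t
}
```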
To allow testN to be implemented without casting, I suggest a syntax that takes a starting index and a length.
Here are a few candidates:
arr[I .. LEN]      // may be confusing, but I think it is very readable
arr[I length LEN]  // length would be a contextual keyword
arr[I, LEN]        // may interfere with existing code that uses the comma operator (deprecated in subscripts since C++20) and with C++23 multidimensional operator[]
arr[I : LEN]       // may be misread as start and end rather than start and length
Then testN could be implemented as something like:
void testN(char (&arr)[128]) {
    test16(arr[0 .. 16]); // OK now
    test16(arr[..]);      // OK as long as the array is big enough
    test16(arr[10 ..]);   // starts at index 10; the length is implied by the parameter type
                          // the remaining 128-10 = 118 elements allow conversion to any size up to 118
    test32(arr[.. 32]);   // explicit length needed: arr is no longer exactly 32 elements since the signature changed
    test256(arr[..]);     // error: 256 > 128
}
The next problem is indexing with a variable. Since checking this would break a lot of existing code, it should be opt-in.
I have implemented the checks below in a compiler (not a C++ compiler), so I understand how complex each one is.
#include <cstddef>
void doSomething(char c); // declared elsewhere
// Extremely simple
void myfunc1(char (&arr)[256]) {
    // loop variable: non-negative and less than the array size
    for (std::size_t i = 0; i < 256; i++) {
        doSomething(arr[i]);
    }
}
// Fairly simple
void myfunc2(char (&arr)[256], int n) {
    if (n >= 0 && n < 128) { // note that 128 is smaller than the array length
        doSomething(arr[n]);
    }
}
// Nearly as simple
void myfunc3(char (&arr)[256], int n) {
    if (n < 0 || n >= 128)
        return;
    // at this point n must be non-negative and smaller than the array length
    doSomething(arr[n]);
}
// Medium to hard; I'm not sure this is common enough to support
// The issue here is that n is mutated after the check
void myfunc4(char (&arr)[256], int n) {
    if (n < 0 || n >= 254)
        return;
    doSomething(arr[++n]); // n is now in [1, 254]
    doSomething(arr[++n]); // n is now in [2, 255]
}
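To make the relative difficulty concrete, here is a toy model of the kind of value-range (interval) reasoning these cases need. It is only an illustration under simplifying assumptions (one variable, intervals rather than arbitrary value sets, no overflow or aliasing), not the implementation mentioned above; `Range`, `anyInt`, `guarded`, `preIncrement`, and `inBounds` are hypothetical names. The last case is harder mainly because the range of n must be updated after every mutation, not just at the check.

```cpp
#include <cstdint>
#include <limits>

// Toy model: the values a variable may hold, as a half-open interval [lo, hi).
struct Range {
    std::int64_t lo; // smallest possible value
    std::int64_t hi; // one past the largest possible value
};

// An int parameter with no checks may hold any int.
constexpr Range anyInt() {
    return {std::numeric_limits<int>::min(),
            std::int64_t{std::numeric_limits<int>::max()} + 1};
}

// Inside `if (n >= lo && n < hi) { ... }`, or after
// `if (n < lo || n >= hi) return;`, n is also known to lie in [lo, hi).
constexpr Range guarded(Range r, std::int64_t lo, std::int64_t hi) {
    return {r.lo > lo ? r.lo : lo, r.hi < hi ? r.hi : hi};
}

// ++n shifts every possible value up by one.
constexpr Range preIncrement(Range r) { return {r.lo + 1, r.hi + 1}; }

// arr[n] is provably in bounds when every possible n is in [0, size).
constexpr bool inBounds(Range r, std::int64_t size) {
    return r.lo >= 0 && r.hi <= size;
}

// Extremely simple: the loop condition gives i in [0, 256).
static_assert(inBounds(Range{0, 256}, 256));
// Fairly simple / nearly as simple: the guard narrows n to [0, 128).
static_assert(inBounds(guarded(anyInt(), 0, 128), 256));
// Without a guard, arr[n] cannot be proven safe.
static_assert(!inBounds(anyInt(), 256));
// Medium to hard: [0, 254) -> [1, 255) -> [2, 256); both accesses are safe.
constexpr Range n1 = preIncrement(guarded(anyInt(), 0, 254));
static_assert(inBounds(n1, 256) && inBounds(preIncrement(n1), 256));
```

With a guard of `n >= 255` instead, the second increment would give [2, 257), and the model correctly reports that arr[++n] may be out of range.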
Having the above would make handling binary data simpler.
To highlight the difference, consider std::span. Out-of-range indexing may trigger a runtime assertion (with a hardened or debug standard library), but not a compile-time error.
#include <span>
// Returns -1 unless the file starts with the header 0x12 0x34
int test1(std::span<int> file, int n) {
    if (file[0] != 0x12 || file[1] != 0x34)
        return -1;
    return file[n] + file[100];
}
int test2(std::span<int, 32> file, int n) {
    if (file[0] != 0x12 || file[1] != 0x34)
        return -1;
    return file[n] + file[100]; // 100 is out of range for a 32-element span, yet this compiles
}
int main() {
    int tooSmall[] = {0x12}; // correct first element, but truncated
    int fileBadHeader[64] = {1,2,3,4,5,6,7,8,9};
    int fileOkHeader[64] = {0x12,0x34,3,4,5,6,7,8,9};
    //test1({tooSmall}, 100); // will assert: file[1] is out of range
    test1({fileBadHeader}, 100); // will not assert: returns -1 before reaching file[100]
    //test1({fileOkHeader}, 100); // will assert: file[100] is out of range
    test2(std::span{fileBadHeader}.subspan<10, 32>(), 100); // will not assert
    //test2(std::span{fileOkHeader}.subspan<0, 32>(), 100); // will assert
}
With arrays instead, the constant index 100 could be rejected at compile time, and so could the unchecked index n if the user opts into the additional checks.
With additional support for constexpr function parameters, we would also be able to catch the error below.
void myFunc(std::vector<char>& v) {
    #define LIB_BUF_LEN 16
    auto& buf = v.make_push(LIB_BUF_LEN); // auto& keeps the array type; plain auto would decay to a pointer
    // imagine a hypothetical make_push with the signature: T(&)[count] make_push(constexpr size_t count);
    for (int i = 0; i < 32; i++) { // LIB_BUF_LEN is smaller than 32, so the array reference is smaller
        buf[i] = i; // oops: out of range, which could be caught at compile time
    }
}
Received on 2025-08-22 20:04:46