Date: Sat, 6 Nov 2021 17:53:47 +0000
Hi,
I’ve been thinking about the idea of explicit types and type declaration for indices for data structures. The use of `int` types to index into data structures allows the use of one variable to multiple different data structures where the semantic meaning of the axis being indexed into might be different. A simple demonstration for this is a 2D grid with height and width dimensions.
double sum_entries(std::vector<std::vector<double> > &my_array) {
double sum = 0;
for (int i = 0; I < my_array.size(); ++i) {
for (int j = 0; j < my_array[i].size(); ++j) {
// BUG: swap uses of j & i indices
sum += my_array[j][i];
}
}
return sum;
}
>From just reading the signature of the function, it’s unclear what the dimensions of `my_array` refer to. For more complex codebases, this can make debugging code that you’re unfamiliar with extremely difficult. Asides from debugging troubles, the obvious problem here is that we can use `i`, and index meant for the first dimension, on the second dimension without any compiler complaint. I’m aware that many structures offer ways to obviate the use of indexing all together (ie for(auto it = vec.begin(); it != vec.end(); ++it), but those idioms can’t always be used necessarily and also don’t solve the problem of code where the axis of a container might have semantic meaning which can’t be expressed as a plain int or some form of standard iterator.
My proposal is the introduction of a way to explicitly introduce types and associate them with the dimensions of various indexable data structures. I have implemented a potential solution here:
https://github.com/ChrisKjellqvist/TIMatrix
The implementation, to my understanding, should be eventually compiled down to integers (or whatever the storage type is) and therefore incurs no performance slowdown. To use these indexes, I have implemented some example matrix templates, but these indexes could be integrated into any data structure I imagine. As well, these matrices are not at all part of my proposal, but just a proof of concept. An example usage of them might look like so:
declare_free_index(idx_h)
declare_free_index(idx_w)
using my_matrix = TwoDimMat<double, idx_h, idx_w>;
double sum_entries(my_matrix &mat) {
double sum = 0;
for (idx_h i(0); i < mat.size(); ++i) {
for (idx_w j(0); j < mat[i].size(); ++j) {
sum += mat[i][j];
// where mat[j][i] would result in a compile time error
}
}
return sum;
}
One of the key benefits of this addition as well would be that code becomes more self-documenting. Integers/iterators are a convenient and performance efficient way to represent a lot of things, but the type lacks semantic information. With this, the developer would be able to label the type in the code, using whatever underlying storage type they want but creating a new type to convey semantic information.
These indexes could offer numerous other features such as:
* out-of-bounds checking (by somehow linking them to the structure they’re associated with)
* run-time checks for index types (as supposed to compile time)
I’m curious to know what others think of this.
~ Chris Kjellqvist
I’ve been thinking about the idea of explicit types and type declaration for indices for data structures. The use of `int` types to index into data structures allows the use of one variable to multiple different data structures where the semantic meaning of the axis being indexed into might be different. A simple demonstration for this is a 2D grid with height and width dimensions.
double sum_entries(std::vector<std::vector<double> > &my_array) {
double sum = 0;
for (int i = 0; I < my_array.size(); ++i) {
for (int j = 0; j < my_array[i].size(); ++j) {
// BUG: swap uses of j & i indices
sum += my_array[j][i];
}
}
return sum;
}
>From just reading the signature of the function, it’s unclear what the dimensions of `my_array` refer to. For more complex codebases, this can make debugging code that you’re unfamiliar with extremely difficult. Asides from debugging troubles, the obvious problem here is that we can use `i`, and index meant for the first dimension, on the second dimension without any compiler complaint. I’m aware that many structures offer ways to obviate the use of indexing all together (ie for(auto it = vec.begin(); it != vec.end(); ++it), but those idioms can’t always be used necessarily and also don’t solve the problem of code where the axis of a container might have semantic meaning which can’t be expressed as a plain int or some form of standard iterator.
My proposal is the introduction of a way to explicitly introduce types and associate them with the dimensions of various indexable data structures. I have implemented a potential solution here:
https://github.com/ChrisKjellqvist/TIMatrix
The implementation, to my understanding, should be eventually compiled down to integers (or whatever the storage type is) and therefore incurs no performance slowdown. To use these indexes, I have implemented some example matrix templates, but these indexes could be integrated into any data structure I imagine. As well, these matrices are not at all part of my proposal, but just a proof of concept. An example usage of them might look like so:
declare_free_index(idx_h)
declare_free_index(idx_w)
using my_matrix = TwoDimMat<double, idx_h, idx_w>;
double sum_entries(my_matrix &mat) {
double sum = 0;
for (idx_h i(0); i < mat.size(); ++i) {
for (idx_w j(0); j < mat[i].size(); ++j) {
sum += mat[i][j];
// where mat[j][i] would result in a compile time error
}
}
return sum;
}
One of the key benefits of this addition as well would be that code becomes more self-documenting. Integers/iterators are a convenient and performance efficient way to represent a lot of things, but the type lacks semantic information. With this, the developer would be able to label the type in the code, using whatever underlying storage type they want but creating a new type to convey semantic information.
These indexes could offer numerous other features such as:
* out-of-bounds checking (by somehow linking them to the structure they’re associated with)
* run-time checks for index types (as supposed to compile time)
I’m curious to know what others think of this.
~ Chris Kjellqvist
Received on 2021-11-06 12:53:50