Date: Fri, 26 May 2023 21:30:05 +0200
Hello everyone,
I know that it is not possible to create objects out of thin air:
int i = 42;
alignas(int) char buffer[sizeof(int)];
std::memcpy(buffer, &i, sizeof(int));
int* j = reinterpret_cast<int*>(buffer);
*j; // UB, even if *j == i, as j does not point to an int object
and normally this is not a big issue.
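For contrast, here is a minimal sketch of the well-defined way to get the value back: copy the bytes into an int object that really exists, instead of pretending the buffer is one. `read_int` and `round_trip` are hypothetical helpers, and the buffer is `unsigned char` here. The price is a copy, the same kind of overhead discussed further down.

```cpp
#include <cassert>
#include <cstring>

// Reads an int out of raw bytes by copying them into a genuine int object,
// so no pointer ever has to point to an int that was never created.
inline int read_int(const unsigned char* bytes) {
    int value;                                 // a real int object
    std::memcpy(&value, bytes, sizeof value);  // copy the object representation
    return value;
}

// Round trip: store an int's bytes in a buffer, then recover the value.
inline int round_trip(int i) {
    unsigned char buffer[sizeof(int)];
    std::memcpy(buffer, &i, sizeof i);
    return read_int(buffer);
}
```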
But lately I have been toying with the following idea:
instead of relying on conventions to denote the state of a variable,
I wanted to use a different type.
For example, suppose your library deals with strings, and most internal
routines expect those strings to be encoded in utf-8, or to satisfy some
other invariant.
One possible convention is to document (with comments) which functions
accept only utf-8 encoded strings.
// takes any string, throws if the invariant does not hold
void validate(const string& str);
// takes any string
void foo(const string& str);
// only validated strings
void bar(const string& str);
void foo(const string& str){
validate(str);
bar(str);
}
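None of the snippets define `validate`. Purely for illustration, one hypothetical implementation of the utf-8 invariant could look like this; `is_valid_utf8` is an assumed helper name, and the byte ranges follow the well-formed sequences of RFC 3629. Note that nothing in the comment-based convention stops a caller from skipping `validate` and calling `bar` directly.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical check for the "encoded in utf-8" invariant (RFC 3629 ranges):
// rejects stray continuation bytes, truncated sequences, overlong forms,
// UTF-16 surrogates and values above U+10FFFF.
inline bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { ++i; continue; }      // ASCII, single byte
        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;   // allowed range of the 2nd byte
        if (c >= 0xC2 && c <= 0xDF) {
            len = 2;
        } else if (c >= 0xE0 && c <= 0xEF) {
            len = 3;
            if (c == 0xE0) lo = 0xA0;         // no overlong 3-byte forms
            if (c == 0xED) hi = 0x9F;         // no UTF-16 surrogates
        } else if (c >= 0xF0 && c <= 0xF4) {
            len = 4;
            if (c == 0xF0) lo = 0x90;         // no overlong 4-byte forms
            if (c == 0xF4) hi = 0x8F;         // nothing above U+10FFFF
        } else {
            return false;                     // 0x80-0xC1 or 0xF5-0xFF
        }
        if (n - i < len) return false;        // sequence cut short
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if (b < (k == 1 ? lo : 0x80) || b > (k == 1 ? hi : 0xBF))
                return false;
        }
        i += len;
    }
    return true;
}

// Throws if the invariant does not hold, like validate() in the snippets.
inline void validate(const std::string& str) {
    if (!is_valid_utf8(str))
        throw std::invalid_argument("string is not valid utf-8");
}
```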
Another possibility is to use a (Hungarian) naming convention.
Strings that have not been verified yet are saved in variables prefixed
with u_ (unchecked), strings that have been verified in variables
prefixed with v_ (validated):
// takes any string, throws if the invariant does not hold
void validate(const string& u_str);
// takes any string
void foo(const string& u_str);
// only validated strings
void bar(const string& v_str);
void foo(const string& u_str){
validate(u_str);
auto& v_str = u_str;
bar(v_str);
}
A third possibility is to use different string types:
struct u_string{
// ...
};
struct v_string{
// constructors of v_string validate the invariant
// ...
};
// takes any string, throws if the invariant does not hold
void validate(const u_string& u_str);
// takes any string
void foo(const u_string& u_str);
// only validated strings
void bar(const v_string& v_str);
void foo(const u_string& u_str){
bar(v_string(u_str));
}
The first two approaches are error-prone, as there is little to no
tooling to help detect errors.
The third approach is less error-prone: as long as the constructor of
v_string verifies the invariant, bar is guaranteed to be called only
with strings for which the invariant holds.
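A minimal, compilable sketch of the third approach, under two assumptions: plain `std::string` plays the role of the unchecked type (the mail's `u_string` is left undefined), and a hypothetical "no NUL characters" check stands in for the real invariant. Because the constructor is `explicit`, `bar` cannot be called with a plain string by accident; the member `value_` also shows where the copy comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the real invariant: no NUL characters.
inline void validate(const std::string& str) {
    if (str.find('\0') != std::string::npos)
        throw std::invalid_argument("invariant violated");
}

// A validated string can only be obtained through a checking constructor.
class v_string {
public:
    explicit v_string(std::string s) : value_(std::move(s)) { validate(value_); }
    const std::string& str() const noexcept { return value_; }
private:
    std::string value_;  // owns its own copy of the characters
};

// Only validated strings: std::string does not convert implicitly.
inline std::size_t bar(const v_string& v) { return v.str().size(); }

// Takes any string: copies it into a v_string, which validates the copy.
inline std::size_t foo(const std::string& u_str) {
    return bar(v_string(u_str));
}
```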
Unfortunately it has an overhead: strings are unnecessarily copied or
moved. This overhead may or may not be negligible.
string_view helps avoid a measurable overhead, but for more complex
data types it is not always feasible to write a corresponding view type.
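For plain strings, the string_view route might look like the following sketch: a hypothetical `v_string_view` that checks once on construction and afterwards only refers to characters owned elsewhere, so nothing is copied. The invariant (no NUL characters) is again a placeholder, and the viewed string must outlive the view.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical invariant used only for this sketch: no NUL characters.
inline bool holds_invariant(std::string_view s) {
    return s.find('\0') == std::string_view::npos;
}

// Non-owning validated view: validates once, never copies the characters.
class v_string_view {
public:
    explicit v_string_view(std::string_view s) : view_(s) {
        if (!holds_invariant(view_))
            throw std::invalid_argument("invariant violated");
    }
    std::string_view view() const noexcept { return view_; }
private:
    std::string_view view_;  // the viewed string must outlive this object
};

// Only validated strings.
inline std::size_t bar(v_string_view v) { return v.view().size(); }
```

Writing such a view is easy for a string because `std::string_view` already exists; the question below is what to do when no ready-made view type is available.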
What about, for example, a class that contains a vector<string>?
This is why I was thinking about a fourth possibility, which is UB, as
it creates objects out of nowhere.
Here is a minimal example/proof of concept:
struct v_string : string {};
void validate(const string& str);
#define VALIDATE(x) \
(validate(x), *reinterpret_cast<const v_string*>(&(x)))
// takes any string
void foo(const string& str);
// only validated strings
void bar(const v_string& str);
// also accepts not validated strings
void bar2(const string& str);
void foo(const string& str){
bar(str); // does not compile
bar(VALIDATE(str)); // compiles, but UB
const auto& s2 = VALIDATE(str); // a reference: plain auto would copy
bar(s2);
bar2(s2); // can be used as const string& too
}
By "no overhead" I mean that there are no moves or copies.
But:
* I am passing to bar an object that does not really exist
* I used a macro, as a function would have created a new scope, and
returning from it might have created a copy/move
* there are slicing issues, even if the subclass does not add any member
variables or methods
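Regarding the second point: only returning by value copies or moves. A function that takes its argument by reference and returns a reference to the same object performs neither, as this sketch with a hypothetical copy-counting type `counted` and a hypothetical `checked()` suggests. This would only replace the macro; the `reinterpret_cast` to `v_string` would remain UB exactly as before.

```cpp
#include <cassert>

// Hypothetical type that counts how often it is copied or moved (C++17).
struct counted {
    static inline int copies = 0;
    static inline int moves = 0;
    counted() = default;
    counted(const counted&) { ++copies; }
    counted(counted&&) noexcept { ++moves; }
};

// Function replacement for a macro: checks its argument and hands back a
// reference to the very same object, so no copy or move takes place.
inline const counted& checked(const counted& x) {
    // a real validate(x) would run here and throw on failure
    return x;
}

// Refuse temporaries: the returned reference would dangle.
const counted& checked(const counted&&) = delete;
```

As with the macro, the caller has to bind the result to a reference: `auto c2 = checked(c);` deduces a value type and copies.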
I guess I am out of luck and there is no way to overcome the first
issue, but I wanted to be sure.
Or maybe there is a fifth way to achieve what I have in mind?
Best
Federico
Received on 2023-05-26 19:30:14