std-discussion

objects with no lifetime for validation

From: Federico Kircheis <federico_at_[hidden]>
Date: Fri, 26 May 2023 21:30:05 +0200
Hello to everyone

I know that it is not possible to create objects out of thin air:

int i = 42;
char buffer[sizeof(int)];
std::memcpy(buffer, &i, sizeof(int));

int* j = reinterpret_cast<int*>(buffer);
*j; // UB, even if *j == i, as j does not point to an int object

and normally this is not a big issue.
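
For comparison, here is a minimal sketch of a well-defined way to get the value back: copy the bytes into a real int object instead of dereferencing the cast pointer. The helper name read_int is made up for this illustration.

```cpp
#include <cstring>

// Hypothetical helper (not part of the example above): reads an int out of
// raw bytes by copying them into an actual int object, so no pointer to a
// non-existent int is ever dereferenced.
int read_int(const char* bytes) {
    int out;
    std::memcpy(&out, bytes, sizeof out);
    return out;
}
```

This costs a copy of sizeof(int) bytes, which is irrelevant for an int but is exactly the kind of cost the rest of this message tries to avoid for larger types.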



But lately I was toying with the following idea

Instead of relying on conventions for denoting the state of a variable,
I wanted to use a different type.

For example, suppose your library deals with strings, and most internal
routines expect those strings to be encoded in utf-8, or to satisfy
another invariant.

A possible convention is to document (with comments) which functions
require their string arguments to be utf-8 encoded.

// takes any string, throws if invariant does not hold
void validate(const string& str);
// takes any string
void foo(const string& str);
// only validated strings
void bar(const string& str);

void foo(const string& str){
   validate(str);
   bar(str);
}


Another possibility is to use a (Hungarian) naming convention.
Strings that have not been verified yet are saved in variables prefixed
with u_ (unchecked), strings that have been verified with v_ (validated)

// takes any string, throws if invariant does not hold
void validate(const string& u_str);
// takes any string
void foo(const string& u_str);
// only validated strings
void bar(const string& v_str);

void foo(const string& u_str){
   validate(u_str);
   auto& v_str = u_str;
   bar(v_str);
}


A third possibility is to use different string types

struct u_string{
// ...
};

struct v_string{
// constructors of v_string validate the invariant
// ...
};

void validate(const u_string& u_str);
// takes any string
void foo(const u_string& u_str);
// only validated strings
void bar(const v_string& v_str);

void foo(const u_string& u_str){
   bar(v_string(u_str));
}
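
To make the third approach concrete, here is a minimal sketch. It rests on assumptions that are not part of the outline above: u_string simply holds a std::string, and the invariant is "7-bit ASCII only" as a stand-in for a real utf-8 check. The names invariant_holds and get are made up for the example.

```cpp
#include <stdexcept>
#include <string>

// Assumption: the unchecked type is a thin holder around std::string.
struct u_string {
    std::string value;
};

// Stand-in invariant (hypothetical): every byte is 7-bit ASCII.
inline bool invariant_holds(const std::string& s) {
    for (unsigned char c : s) {
        if (c > 0x7F) return false;
    }
    return true;
}

class v_string {
public:
    // Copies the data, then validates; throws if the invariant does not hold.
    explicit v_string(const u_string& u) : value_(u.value) {
        if (!invariant_holds(value_)) {
            throw std::invalid_argument("invariant does not hold");
        }
    }
    const std::string& get() const { return value_; }

private:
    std::string value_;  // private, so the invariant cannot be broken later
};

// only validated strings
void bar(const v_string&) {}

// takes any string
void foo(const u_string& u_str) {
    bar(v_string(u_str));
}
```

The copy made in the constructor is the overhead discussed below.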


The first two approaches are error-prone, as there is little to no
tooling to help detect errors.
The third approach is less error-prone. As long as the constructor of
v_string verifies the invariant, it is assured that bar is only
called with strings where the invariant holds.
Unfortunately it has an overhead: strings are unnecessarily copied or moved.
This overhead might or might not be negligible.
string_view helps to avoid having a measurable overhead, but for more
complex datatypes, it is not always feasible to write a corresponding
view type.
What about, for example, a class that contains a vector<string>?
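
For plain strings, the string_view variant could look like the following sketch (C++17). The type name v_string_view and the ASCII-only check are assumptions made for the example, the latter again standing in for a real utf-8 check.

```cpp
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical validated view: refers to the caller's data, never copies it.
class v_string_view {
public:
    explicit v_string_view(std::string_view s) : view_(s) {
        // Stand-in invariant: every byte is 7-bit ASCII.
        for (unsigned char c : s) {
            if (c > 0x7F) {
                throw std::invalid_argument("invariant does not hold");
            }
        }
    }
    std::string_view get() const { return view_; }

private:
    std::string_view view_;
};

// only validated strings
void bar(v_string_view) {}

// takes any string
void foo(const std::string& str) {
    bar(v_string_view(str));  // validates, but does not copy the characters
}
```

The usual string_view caveats apply: the view must not outlive the string it refers to, and the guarantee only holds as long as nobody modifies that string afterwards.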


Which is why I was thinking about a fourth possibility, which is UB, as
it creates objects from nowhere.

Here is a minimal example/proof of concept

struct v_string : string {};

void validate(const string& str);
#define VALIDATE(x) \
  (validate(x), *reinterpret_cast<const v_string*>(&(x)))

// takes any string
void foo(const string& str);
// only validated strings
void bar(const v_string& str);
// also accepts not validated strings
void bar2(const string& str);

void foo(const string& str){
   bar(str); // does not compile
   bar(VALIDATE(str)); // compiles, but UB

   const auto& s2 = VALIDATE(str); // reference: plain auto would copy
   bar(s2);
   bar2(s2); // can be used as const string& too
}

By "no overhead" I mean that there are no moves or copies.

But

  * I am passing to the function bar an object that does not really exist
  * I used a macro, as a function would have introduced a new scope, and
returning from it might have created a copy/move
  * there are slicing issues, even if the subclass does not add any member
variables or methods

I guess that I am out of luck and there is no way to overcome the first
issue, but I wanted to be sure.

Or maybe there is a fifth way to achieve what I had in mind?

Best

Federico

Received on 2023-05-26 19:30:14