std-proposals

Re: [std-proposals] C++23.Standards.Committee.Propasal.For.Validated.Types.h

From: Federico Kircheis <federico_at_[hidden]>
Date: Thu, 6 Jul 2023 21:13:48 +0200
On 06/07/2023 19.25, Julien Villemure-Fréchette via Std-Proposals wrote:
> Validation of an object's value is already provided in the c++ language
> through the constructors declared in a class. This is the fundamental
> intent of why constructors exist as a language feature: if an object's
> lifetime started, then it must be in a valid state; and if no valid
> object can be constructed from the provided arguments, then either throw
> an exception or explicitly document that such combination of arguments
> implies undefined behavior. The last approach is commonly seen in the
> standard library, and will be made easier to express directly in code
> when Contracts will be available.
>
> For your current use case, if you want to add additional validation to
> an object of a specific type, then you should use composition from the
> underlying type: define a new class type that either contains or derives
> privately from the type you need to validate, and then define a public
> constructor that handles all necessary validation. Ideally, this
> constructor should handle the general case, so that any other public
> constructor can use it as a delegate, so the main validation code is
> kept localized in a single place. Private non validating constructors
> can also be provided if they would be useful for the class's
> implementation, provided that the validity of the object they would
> create can be inferred in some way.
>
> Julien V.
>

I agree with your point, but creating new types can be problematic,
especially when working with third-party libraries/some external API.

To make my example more concrete, suppose there is a library that deals
with strings that have a certain invariant.

For example it handles only strings that are encoded in utf-8.
And, for simplicity, suppose there is no string_view.

The API looks like

namespace lib{
// works correctly only with utf-8 strings
void bar(const std::string& str);
// takes any string, throws if the invariant does not hold
void validate(const std::string& str);
}

In the code of the client of the API, strings come from multiple
sources, and not all of them are necessarily encoded in UTF-8.

A possible convention is to document (with comments) which functions
require their strings to be UTF-8 encoded.

----
namespace app{
// takes any string
void foo(const string& str);
}
void app::foo(const string& str){
   lib::validate(str);
   lib::bar(str);
}
----
Another possibility is to use a (Hungarian) naming convention.
Strings that have not been verified yet are saved in variables prefixed
with u_ (unchecked), strings that have been verified in variables prefixed
with v_ (validated/verified).
----
namespace app{
// takes any string, throw if invariant does not hold
void validate(const string& u_str);
// takes any string
void foo(const string& u_str);
}
void app::foo(const string& u_str){
   lib::validate(u_str);
   auto& v_str = u_str;
   lib::bar(v_str);
}
----
A third possibility, which is more or less the summary of what you 
wrote, is to use different string types
----
namespace app{
struct u_string{
// ...
};
struct v_string{
// the constructors of v_string call lib::validate
// ...
};
void validate(const string& u_str);
// takes any (unchecked) string
void foo(const u_string& u_str);
}
void app::foo(const u_string& u_str){
   lib::bar(v_string(u_str)); // does not compile: lib::bar expects a string
}
----
But lib::bar(v_string(u_str)); does not work: v_string needs to provide
access to the internal string.

Of course this would not be the case if the library itself provided a
special string class(!), but that has the downside that you need to
convert your strings between different classes (the "you should use
composition from the underlying type" in your message), and often this
is undesired.

One of the reasons is that converting between classes is generally costly.
The string would get copied every time you need to pass it from one
library to another, and for other classes even moving might be costly.

string_view is not a solution, as it is not possible to write a
corresponding view type for every class.
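
To make the cost concrete, here is a minimal sketch of such a wrapper class, following the pattern from the quoted message (one general validating constructor, delegating constructors, and a private non-validating one). The names is_utf8, unchecked, str() and demo are made up for illustration, and is_utf8 is only a simplified stand-in for lib::validate: it checks the byte structure and nothing more.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for lib::validate: checks only the
// lead/continuation byte structure (overlong forms and surrogates pass).
bool is_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;                     // continuation bytes expected
        if (c < 0x80) extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false;                         // stray continuation or bad lead byte
        if (extra > s.size() - i - 1) return false; // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k)
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return false;
        i += extra + 1;
    }
    return true;
}

class v_string {
    struct unchecked {};  // private tag: selects the non-validating constructor
    // Non-validating constructor, only for code that already knows s is valid.
    v_string(unchecked, std::string s) noexcept : data_(std::move(s)) {}
    std::string data_;

public:
    // General validating constructor; the other public one delegates to it.
    explicit v_string(std::string s) : data_(std::move(s)) {
        if (!is_utf8(data_)) throw std::invalid_argument("not valid UTF-8");
    }
    explicit v_string(const char* s) : v_string(std::string(s)) {}

    // Concatenating two valid UTF-8 strings is valid, so skip re-validation.
    friend v_string operator+(const v_string& a, const v_string& b) {
        return v_string(unchecked{}, a.data_ + b.data_);
    }

    // Access to the internal string, needed to call APIs taking std::string.
    const std::string& str() const noexcept { return data_; }
};

// Usage sketch: constructing from an lvalue copies the characters.
void demo(const std::string& raw) {
    const v_string v(raw);                 // throws if raw is not UTF-8
    assert(v.str().data() != raw.data());  // v owns a separate buffer
}
```

The wrapper is well-defined and keeps the validation in one place, but every v_string owns its own buffer: a caller that only holds a const std::string& has to pay for a copy, and a library taking std::string has to be reached through str().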
Here are some considerations about the different approaches.

The first two approaches have no overhead, but they are error-prone, as
there is little to no tooling to help detect errors.

The third approach is less error-prone.
As long as the constructor of v_string verifies the invariant, it is
guaranteed that bar is only called with strings for which the invariant
holds.

Thus I would really like to have an API with a specialized class, but(!)
without the overhead of copying the content (for string and any other
class).
----
#include <string>
using std::string;
namespace lib{
struct v_string : string {};
// takes any string, throws if the invariant does not hold
// casts through pointers for simplicity
const v_string& validate(const string& str){
  // do something meaningful
  return *reinterpret_cast<const v_string*>(&str); // currently UB
}
void validate(string&& str) = delete;
// only validated strings
void bar(const v_string& str);
}
namespace app{
// takes any string
void foo(const string& str);
}
namespace lib2{
// another library, which takes any string
void baz (const string& str);
}
void app::foo(const string& str){
   //lib::bar(str); // does not compile, good
   lib::bar(lib::validate(str)); // compiles, but UB
   const auto& s2 = lib::validate(str); // bind a reference: plain auto would copy
   lib::bar(s2);
   lib2::baz(s2); // can be used as const string& too!
}
----
Compared to the previous solutions:

1)
There is tooling to avoid common errors (the compiler, by taking advantage
of the type system).
To obtain a v_string you need to either call lib::validate or write a
reinterpret_cast yourself.
In the second case, as reinterpret_cast should be rare(!), a simple grep,
or a warning that is disabled only locally, would do most of the job.

2)
It is a zero-cost abstraction, in the sense that we do not need to copy the
content of the class; except for the validation, it is "just a cast".
Again: for more complex classes it is not possible to create a
corresponding view type.

You correctly claim that we can write a v_string that validates in its
constructor, but in this case we are forced to copy or move the content
from the original string, while we do not need to do so if we use the
same type throughout the API.

Yes, I know this is UB, but I would like it very much if we could somehow
express that a class is an alias for another, maybe with something like
`struct v_string : alias string {};`
and make the cast valid C++ code, as we do not need to start a new lifetime.
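
For comparison, one way to stay within defined behaviour today is a non-owning wrapper that stores a pointer to the original string and can only be created by validate. This is a sketch under assumptions, not a replacement for the alias idea: v_string_ref, holds and app_foo are hypothetical names, and holds is a placeholder that checks a necessary but not sufficient condition for UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

namespace lib {

// Placeholder invariant (assumption): the byte 0xFF never occurs in UTF-8.
// A real library would run its full validation here.
bool holds(const std::string& s) { return s.find('\xFF') == std::string::npos; }

class v_string_ref;
v_string_ref validate(const std::string& s);

// Non-owning handle to a string that has passed validate().
class v_string_ref {
public:
    // Lets the handle be passed wherever a const std::string& is expected.
    operator const std::string&() const noexcept { return *p_; }
    const std::string& get() const noexcept { return *p_; }

private:
    explicit v_string_ref(const std::string& s) noexcept : p_(&s) {}
    const std::string* p_;
    friend v_string_ref validate(const std::string& s);  // the only factory
};

// Takes any string, throws if the invariant does not hold; never copies.
v_string_ref validate(const std::string& s) {
    if (!holds(s)) throw std::invalid_argument("invariant does not hold");
    return v_string_ref(s);
}
v_string_ref validate(std::string&&) = delete;  // no handles to temporaries

// Only validated strings.
std::size_t bar(v_string_ref s) { return s.get().size(); }

}  // namespace lib

namespace lib2 {
// Another library, which takes any string.
std::size_t baz(const std::string& s) { return s.size(); }
}  // namespace lib2

std::size_t app_foo(const std::string& str) {
    // lib::bar(str);                               // does not compile, good
    const lib::v_string_ref v = lib::validate(str); // validates once, no copy
    return lib::bar(v) + lib2::baz(v);              // converts to const std::string&
}
```

This avoids both the copy and the reinterpret_cast, but it is a separate type that relies on an implicit conversion rather than being a string, the handle must not outlive the string it refers to, and nothing stops the original string from being modified after validation. Those gaps are what a true alias type would close.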

Received on 2023-07-06 19:13:56