std-proposals: Re: Poisoned initializers

From: Tom Honermann <tom_at_[hidden]>
Date: Sat, 12 Jun 2021 08:45:42 -0400

On 6/12/21 6:03 AM, Ville Voutilainen via Std-Proposals wrote:
> On Sat, 12 Jun 2021 at 08:04, Tom Honermann via Std-Proposals
> <std-proposals_at_[hidden]> wrote:
>> Accessing uninitialized variables or data members is a well known source of undefined behavior (UB). The fact that such accesses result in UB is leveraged by tools like MemorySanitizer to diagnose such accesses. Programmers that use these tools may resist adding initializers when a default value is not desirable since doing so prevents diagnosing an unintended access before an appropriate value has been determined and assigned. This results in undesirable compiler warnings or static analysis complaints about uninitialized variables or data members (undesirable because the omission is intentional). Worse, if such an access is not diagnosed and corrected, the program will exhibit UB at run-time.
>>
>> This situation could perhaps be improved by allowing an initializer to be "poisoned". The idea is that, when code is compiled for use with a tool like MemorySanitizer, the initializer is effectively discarded or the assigned object value otherwise marked as invalid so as to enable diagnosing an invalid access. But when compiled normally, the provided initializer is used so as to avoid UB and ensure consistent behavior at run-time (though that behavior may still be wrong according to programmer intentions).
>>
>> For example:
>>
>> struct S {
>> S() {}
>> S(int v) : dm(v) {}
>>
>> // dm may only be used if a value was provided during construction.
>> int dm = POISON(-1);
>> };
>>
>> int f(bool b) {
>> S s = b ? S() : S(42);
>> // The following access of s.dm violates preconditions if b is true.
>> // Would like MemorySanitizer to diagnose as an access of an uninitialized
>> // data member. But if MemorySanitizer is not enabled, then would like to
>> // reliably return -1.
>> return s.dm;
>> }
>>
>> Implementation would presumably require POISON() to be implemented via a keyword or intrinsic function. Perhaps there is a clever way to implement it as a library function.
>>
>> Worth pursuing? Prior art?
> There have been multiple previous discussions about explicitly marking a
> variable uninitialized, but I think this is a new twist
> on the idea. I wonder, considering that the initializer value is used
> unless compilation for a tool decides otherwise,
> why this isn't just
>
> int dm [[poisoned]] = -1;
> or
> [[poisoned]] int dm = -1;
> if that position is your cup of tea.
>
> The implementation would recognize the attribute and mark the variable
> for sanitizing or discard the initializer if built for a sanitizer,
> and otherwise just use the initializer.

Ah, yes, I agree that an attribute is the right tool for this. However,
the goal is to enable taint analysis by annotating a value, not a
variable: a variable can be cured, but a poisoned value is forever
tainted. This may require some form of expression attribute.
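
For comparison, here is a rough sketch of how far a library-only
approach might get today. It poisons the storage of an
already-initialized object rather than annotating a value, so it only
approximates the idea. The names poison_object and SKETCH_HAS_MSAN are
made up for illustration. __msan_poison() is the existing
MemorySanitizer interface function from <sanitizer/msan_interface.h>.
I am assuming a Clang-style __has_feature(memory_sanitizer) check to
detect an MSan build.

```cpp
#include <cstddef>

// Detect a MemorySanitizer build (assumes Clang-style __has_feature).
#if defined(__has_feature)
#  if __has_feature(memory_sanitizer)
#    include <sanitizer/msan_interface.h>
#    define SKETCH_HAS_MSAN 1
#  endif
#endif
#ifndef SKETCH_HAS_MSAN
#  define SKETCH_HAS_MSAN 0
#endif

// Hypothetical helper (not a proposed API): marks the storage of an
// initialized object as uninitialized for MSan. In other builds it
// does nothing, so the stored value remains usable.
template <typename T>
void poison_object(T& obj) {
#if SKETCH_HAS_MSAN
    __msan_poison(&obj, sizeof(T));
#else
    (void)obj;
#endif
}

struct S {
    // No value provided: dm keeps the fallback -1, but is poisoned
    // when built with MSan.
    S() { poison_object(dm); }
    S(int v) : dm(v) {}

    // dm may only be used if a value was provided during construction.
    int dm = -1;
};

int f(bool b) {
    S s = b ? S() : S(42);
    // Without MSan this returns -1 when b is true. With MSan, the
    // hope is that this use of s.dm is reported.
    return s.dm;
}
```

This requires every constructor that leaves dm without a real value to
call the helper explicitly, which is part of why annotating the value
itself seems preferable.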

It may be useful to annotate a portion of an initializer:

    struct aggregate {
       const char *desc;
       int value;
    };
    aggregate a = { "the thing", POISON(-1) };

Likewise, applying poison outside of an initialization context could be
useful:

    int *p = new int;
    ...
    delete p;
    p = POISON(nullptr);

Tom.

Received on 2021-06-12 07:45:47