std-discussion: Re: inout_ptr - inconsistent release()

From: JeanHeyd Meneide <phdofthehouse_at_[hidden]>
Date: Tue, 3 Aug 2021 14:02:06 -0400

(Woops, a follow-up because I didn't translate the example code
correctly for (1) and (2). Here it is, better:

// Implementation (1)
void MyFunc(BYTE** p_bytes) {
      delete(*p_bytes);
     BYTE* new_ptr = new BYTE[42]();
      if (new_ptr) {
         *p_bytes = new_ptr;
     }
     else {
          *p_bytes = NULL;
     }
}

// Implementation (2)
void MyFunc(BYTE** p_bytes) {
     BYTE* new_ptr = new BYTE[42]();
     if (new_ptr) {
          delete(*p_bytes);
         *p_bytes = new_ptr;
     }
     else {
          // literally nothing
     }
}

Also, note that an implementation of (1) can have better exception
safety, if you write it AFTER the coffee kicks in and not before:

// Implementation (1a)
void MyFunc(BYTE** p_bytes) {
      delete(*p_bytes);
      *p_bytes = NULL;
      BYTE* new_ptr = new BYTE[42]();
      if (new_ptr) {
         *p_bytes = new_ptr;
     }
}

So there's still a lot to fix here...)

On Tue, Aug 3, 2021 at 1:55 PM JeanHeyd Meneide <phdofthehouse_at_[hidden]> wrote:
>
> There's a lot of waxing poetical about C API designs below, skip to
> the "-------------------------------" if you want to get to what a
> more effective change would be.
>
> Reading this over, I think at the core there's effectively 2 different
> styles of C API that take in-and-(maybe-)out pointers, where the
> "release always" approach is either good or bad in a mutually
> exclusive way. Let's elaborate using 2 implementations of MyFunc, one
> that shows the behavior you have here and the other that is the kind I
> (one of the proposal authors for out_ptr/inout_ptr) encountered more
> often (always?) in the wild:
>
> // Implementation (1)
> void MyFunc(BYTE** p_bytes) {
> free(*ptr);
> BYTE* new_ptr = new BYTE[42]();
> if (new_ptr) {
> *p_bytes = new_ptr;
> }
> else {
> *p_bytes = new_ptr;
> }
> }
>
> // Implementation (2)
> void MyFunc(BYTE** p_bytes) {
> BYTE* new_ptr = new BYTE[42]();
> if (new_ptr) {
> delete(*ptr);
> *p_bytes = new_ptr;
> }
> else {
> // literally nothing
> }
> }
>
> Let us consider always calling .release(), unconditionally, in the
> constructor or the destructor of an inout_ptr_t. Here are the 2 things
> that happen:
>
> — For (1), it works as outlined in your post. There is no double-free,
> because the pointer is unconditionally deleted by the C function. This
> is good and solves the presented problem.
>
> — For (2), it works to the opposite. This C API does not do anything
> with the input if it cannot guarantee that it can replace the value.
> Calling .release() unconditionally here results in the opposite bug of
> a dangling resource.
>
> What complicates matters more is that both approaches are very valid
> to designing a C API. For example, (1) is a good implementation for
> certain scenarios and for resource-constrained environments. Deleting
> the resource always means that, in a resource-constrained environment,
> you have a greater chance that the next re-creation of the resource is
> more successful. This does, unfortunately, require making sure that
> there is some sentinel value / error value communicated to the user
> about this decision (e.g., setting the pointer to NULL).
>
> (2) is a good implementation for systems meant to be more "inert" or
> "atomic" in their changes. It guarantees that nothing happens and
> everything is rolled back if it cannot guarantee that the replacement
> happens. Either the in-out portion of the operation happens, or it
> happens not at all. (This also matches the ideals behind exception
> safety: for example, if "new" throws in either (1) or (2), the (2)
> implementation is safer. In a C Environment, that's never supposed to
> happen, but there are many implementations which
> wink-wink-hint-hint-nudge-nudge and let you throw an exception or two
> through C API calls anyways.)
>
> -------------------------------
> -------------------------------
>
> Given that making these changes effectively throws one design under
> the bus, I think this might be evidence that one of a few things might
> need to happen (and I'd be willing to do the legwork to make sure this
> happens, since it was my effort to put it in the standard after all!)
>
> A. Add additional tracking to std::inout_ptr. This means holding onto
> the "old" value and, instead of comparing against "nullptr", comparing
> against the old value and letting that be the trigger for calling
> .reset() (or equivalent). This covers (should cover?) the
> always-delete case AND the delete-and-reset-or-do-nothing case. This
> comes at the cost of additional performance, which I am none too sure
> some of the original instigators of this proposal would be happy about
> since one of the primary reasons was putting it in the standard so the
> standard pointers had a chance to "directly reseat the pointer". I
> need to think more about whether the direct re-seating optimization is
> unaffected by this, and run some more tests after the coffee hits.
>
> B. Create a third type that carefully captures the semantic difference
> between "I unconditionally delete my pointers" and "I only
> conditionally delete if the operation I am doing is a success". I'd
> note the nomenclature for this might actually already be in place:
> out_ptr only handles cases where a pointer is coming out. inout_ptr
> only handles the cases where a pointer goes in AND comes out changed.
> in_ptr would handle a pointer that goes in and is unconditionally
> deleted, and only MAYBE comes out the other side. (I think people
> would object to this kind of framing and the name in_ptr, but it is a
> start. Perhaps the name could be in_maybe_out_ptr, or something
> equally verbose). This approach retains performance no matter what,
> but puts the burden of knowing the "kind" of C API on the end-user.
>
> C. (This one needs to happen regardless.) I need to try to refine the
> wording around how .release() is called in the constructor. The ideal
> was to prevent the user from observing that .release() is called until
> the destructor happens, effectively leaving the state of the original
> smart pointer in flux until after the destructor.
>
> I'm not sure which path of (A) or (B) is best. Given the example above
> the marquee lines, I think that there are 2 different semantics here
> and I am unsure of how mixing both kinds in inout_ptr may impact
> speed. (It ultimately sounds like a better user experience, because
> someone having to know what very well may be a secret internal
> implementation detail to differentiate between one of 3 different
> out_ptr / inout_ptr / in{something}_ptr is... tough.)
>
> In either case, thank you for catching this, Doug!! I've got some
> implementation and options to start benchmarking and testing, and some
> people to e-mail! I'll make sure this gets fixed before ship.
>
> Sincerely,
> JeanHeyd

Received on 2021-08-03 13:02:24