sg14: Re: [SG14] [isocpp-parallel] (SC22WG14.16350) Rough notes from SC22 WG21 SG14 meeting on pointer lifetime-end zap

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Thu, 2 May 2019 08:50:54 +0200

So, we seem to be in violent agreement that the current rule of
invalidating a pointer value system-wide once the heap storage
it points to is freed is not a good rule: People don't actually
observe it when programming, and it makes certain algorithms near-
impossible to implement (uintptr_t is not available unconditionally).

What's the next step?

In my view, we need proposed wording changes to be able to gauge
comfort with the actual fix.

Jens

On 02/05/2019 00.18, Hans Boehm wrote:
> I like the concurrent algorithms discussion. But I think the discussion glossed over the sequential programming issues, which are also important.
>
> I'll happily concede that anyone who has been following WG14 from the beginning will probably have avoided comparing dangling pointers, at least if they realized that's what they were doing. But I was not a WG14 participant until fairly recently, and I have dealt with a variety of code bases not written by WG14 participants. In my experience, the issues here have never been well understood, at least for malloc()/free()-managed memory. (Like most participants(?), I really don't care about out-of-scope locals; the current rules are OK there.)
>
> The core problem is that this rule is directly at odds with most people's informal operational notion of what a pointer is: essentially an index into memory. Doing something to the memory, like deallocating it, doesn't change the value of an index. I conjecture it never occurs to most people that such a comparison might not work. Even if they abstractly recognize that they should not be manipulating pointers to dead objects, I suspect they often don't realize when they're actually doing it.
>
> This is compounded by the fact that if users take control of their own memory management, as the Linux kernel clearly does, the impact of the C rule effectively goes away. That makes the rule fundamentally hard to explain: people can easily implement their own malloc()/free() within the standard, and for memory managed that way the rule does not apply.
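A minimal sketch of that point, in C: a hypothetical fixed-size pool allocator (`pool_alloc` and `pool_free` are illustrative names, not an existing API) hands out slots of one static array. Because the array itself is never deallocated, a pointer kept after `pool_free()` still points into a live object, so copying or comparing it is not affected by the rule that applies to memory released with `free()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size pool: every "object" is a slot in one static
   array, so the array's lifetime never ends while the program runs. */
#define POOL_SLOTS 4

typedef union slot {
    union slot *next_free;   /* link used while the slot is free */
    char payload[32];        /* storage handed out to callers    */
} slot;

static slot pool[POOL_SLOTS];
static slot *free_list = NULL;
static int pool_ready = 0;

/* Thread all slots onto the free list on first use. */
static void pool_init(void) {
    for (int i = 0; i < POOL_SLOTS - 1; i++)
        pool[i].next_free = &pool[i + 1];
    pool[POOL_SLOTS - 1].next_free = NULL;
    free_list = &pool[0];
    pool_ready = 1;
}

/* Pop a slot from the free list; returns NULL when the pool is exhausted. */
void *pool_alloc(void) {
    if (!pool_ready)
        pool_init();
    slot *s = free_list;
    if (s == NULL)
        return NULL;
    free_list = s->next_free;
    return s->payload;
}

/* Push the slot back (LIFO). The storage stays alive: only the
   allocator's bookkeeping changes, so old pointer values stay valid. */
void pool_free(void *p) {
    slot *s = (slot *)p;   /* payload sits at offset 0 of the union */
    assert(s >= pool && s < pool + POOL_SLOTS);
    s->next_free = free_list;
    free_list = s;
}
```

With a LIFO free list, the slot just released is the next one handed out, so the saved pointer then compares equal to the new one. That comparison is legitimate here precisely because the pool's storage never died.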
>
> I do think it's somewhat rare for code to benefit from manipulating dangling pointers. But I've seen multiple instances of it in widely circulated and widely used code bases, and I've never heard anyone express qualms about doing so. From my perspective, after having been pointed at the appropriate wording, the standard is moderately clear that e.g. comparing or copying dangling pointers results in undefined behavior, though I don't think the reasoning via indeterminate values and trap representations is trivial to deduce.
>
> Question 8 in Peter et al.'s survey (https://www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html) says that roughly 66% of an "expert audience" gave what we believe to be the wrong answer, with only 9% giving the correct answer. I think that misunderstanding, together with the concurrency use cases, are the main reasons this discussion didn't happen decades ago.
>
> Perhaps the most widely examined code that I believe copies dangling pointers is libc++'s weak_ptr implementation. There has been much discussion of its internals, but I couldn't find any discussion of that aspect. I suspect many WG21 members have looked at this code, or similar code in other C++ libraries, without complaining.
>
> Android's wp<> weak pointer implementation (https://android.googlesource.com/platform/system/core/+/master/libutils/include/utils/RefBase.h) does even more of this, using a form of dangling pointer comparison as well. I don't think it's as visible as libc++, but it's also open source.
>
> Both are crucial parts of software bases that are used daily by billions of people. They're C++ code, but I think the issues are the same, and I don't think we want a fundamental disagreement between the languages here. They're reasonable solutions that could also be applied to C code.
>
> Probably the easiest use cases to explain are those in which the pointers originally point to e.g. identifiers stored in some memory region. The identifiers are initially kept in a hash table, so that each identifier is stored exactly once, and we compare identifiers by comparing pointers. Past a certain point, only the identity of those identifiers matters: we no longer care about the names and no longer add new identifiers, but it still matters whether two identifiers are the same. Thus we can deallocate the memory region containing the identifier strings and the hash table, saving space. The current rule prevents us from deallocating that memory even though we no longer access it. This was essentially the SGI compiler example I saw in the late 90s.
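A sketch of the identifier pattern rewritten so that it stays within the current rules, assuming the implementation provides the optional `uintptr_t` type and maps equal pointers to equal integers (true of common flat-address implementations, though not guaranteed by the standard). Identities are captured as integers while the strings are still allocated, so comparing them after the region is freed never touches a dangling pointer. All names (`intern_table`, `ident_id`, `intern`, and so on) are hypothetical, and a linear scan stands in for the hash table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Identity of an interned identifier, held as an integer so that it
   remains comparable after the string storage is freed.
   Assumes the optional uintptr_t type exists. */
typedef struct { uintptr_t bits; } ident_id;

#define MAX_IDENTS 64

typedef struct {
    char *names[MAX_IDENTS];   /* each distinct name stored exactly once */
    size_t count;
} intern_table;

/* Return the unique stored copy of `name`, adding it if needed.
   A linear scan stands in for the hash table of the description. */
const char *intern(intern_table *t, const char *name) {
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return t->names[i];
    assert(t->count < MAX_IDENTS);
    size_t len = strlen(name);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        abort();
    memcpy(copy, name, len + 1);
    t->names[t->count++] = copy;
    return copy;
}

/* Capture the identity while the string is still alive. */
ident_id ident_of(const char *interned) {
    ident_id id = { (uintptr_t)(void *)interned };
    return id;
}

int ident_eq(ident_id a, ident_id b) {
    return a.bits == b.bits;
}

/* Free every stored name; ident_id values taken earlier stay comparable. */
void intern_table_free(intern_table *t) {
    for (size_t i = 0; i < t->count; i++)
        free(t->names[i]);
    t->count = 0;
}
```

The sketch depends on the phase distinction in the example: no new identifiers may be interned after the free, since a later `malloc()` could reuse an old address and make a new identity collide with a stale one.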
>
> I personally don't see a sufficient benefit from retaining the current rule. It doesn't look to me like it's worked out. The discussions of ancient architectures were very enlightening (I used the CDC 6600 architecture, too! I can confirm that Paul is right :-) ), but the real use cases all seem quite speculative, even after all these years. I'm personally not convinced by the "possible future architectures" argument unless it really doesn't matter now, one way or another. This does matter now. We have previously constrained architectures (e.g. we basically require byte store instructions in multicore systems) where it was important. This is important; having compilers aggressively take advantage of this rule would break significant code, both sequential and concurrent.
>
> We had an interesting internal discussion about using the current rule to detect use-after-free. Clearly that's been done, but hasn't generally caught on. I think it's not fundamentally hopeless. But the core problem is that to catch realistic bugs with sufficient probability to be interesting, you need to "poison" pointers to the object elsewhere in the code. The pointer that was actually passed to free() is often an argument to a wrapper function; "poisoning" that local variable isn't interesting. Even if that's not the case, an immediate access through a deallocated pointer could and would probably be caught in other ways. Effectively doing this "remote" "poisoning" via static or dynamic analysis still seems to be a research problem.
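To illustrate why poisoning inside a wrapper is of little use, here is a hypothetical pair of wrappers around `free()` (the names are illustrative only). Setting the parameter to `NULL` changes only the wrapper's own copy; even taking the caller's variable by address clears just that one variable, while every other copy of the pointer in the program keeps the old value.

```c
#include <assert.h>
#include <stdlib.h>

/* "Poisoning" the parameter is pointless: p is a local copy, so the
   caller's variables are unaffected, and the store is dead code. */
void release(void *p) {
    free(p);
    p = NULL;
    (void)p;
}

/* Clearing the caller's variable helps slightly more, but only for that
   one variable; any other copies (in other variables, structs, or data
   structures) still hold the old, now-dangling value. */
void release_and_null(void **pp) {
    free(*pp);
    *pp = NULL;
}
```

Catching realistic bugs would require clearing those other copies too, which is the "remote" poisoning problem described above.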
>
> At an absolute minimum, I think we need a well-advertised "pointer alternative" that isn't subject to the current rule, so that there is a reasonable way to rewrite existing code, like the above examples, to conform. If it's uintptr_t, then I think we need to clearly agree on that. (And WG21 needs to introduce a type-safe wrapper to hide it.) But I wouldn't be surprised if that's worse for compiler alias analysis than dropping the current rule, especially if the rule is only dropped for malloc()/free(). I clearly don't think atomic pointers suffice.
>
> Hans
>
> Link to this post: http://lists.isocpp.org/parallel/2019/05/2601.php
>

Received on 2019-05-02 01:53:00