ISOCPP std-proposals List: Re: [std-proposals] P2815 breaks all provenance-related optimizations

From: connor horman <chorman64_at_[hidden]>
Date: Wed, 22 Feb 2023 08:25:04 -0500

On Wed, 22 Feb 2023 at 07:57, Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
wrote:

> On Tue, Feb 21, 2023 at 9:53 PM connor horman via Std-Proposals <
> std-proposals_at_[hidden]> wrote:
>
>> I just learned of P2815, and I do not know of another place I can respond
>> to it, so I'm going to answer here.
>>
>
> The presentation's title slide (and unfortunately nowhere else — I missed
> it the first time too!) refers to P2188:
> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2188r1.html
> which, being a WG21 paper, contains contact info for the author. I've
> cc'ed him here.
>
> Anthony: In the mailing-list archive, this thread begins at
> https://lists.isocpp.org/std-proposals/2023/02/5848.php .
>
> In my opinion, the proposal in question eliminates all provenance based
>> alias analysis, used to eliminate memory accesses when unanalyzed code is
>> present.
>>
>> Consider the following example code:
>>
>> int foo(int* p){
>> int i;
>> opaque_operation(&i,p);
>> i = 0;
>> *p = 1;
>> return i;
>> }
>>
>> (Where the optimizer is unable to analyze `opaque_operation`
>> In current compilers, this can be optimized to [...
>>
>
> int foo(int *p) {
> int i; opaque_operation(&i, p);
> *p = 1; return 0;
> }
>
> ...]. Under the proposal, however, `opaque_operation` could observe that
>> `&i` and `p` have the same *value representation*, which means that they
>> must be equal (it may do so with defined behaviour in all executions, for
>> example, by terminating if they are bytewise unequal). Thus the write to
>> `p` may modify the value of `i`. This would make the transformation [...]
>> invalid under the proposal.
>>
>
> I see your point. It would be like: You go to a magic show. The magician
> gives one deck of cards to an audience member, who randomly selects a card
> and signs its face. He gives a second deck of cards to a second audience
> member, who randomly selects another card and signs its face. The magician
> takes both cards behind a partition, then re-emerges and triumphantly shows
> the audience that both audience members have somehow chosen the same card!
> "I know how that trick is done," you whisper. "Whenever the two audience
> members choose different cards, the magician goes behind the partition and
> kills himself."
>
> As in magic, C++ has some common-sense rules about what counts as a
> reasonable explanation for a trick and what doesn't.
> For example, all three major compilers will optimize
> int bar(int k) {
> while (k) { }
> return 42;
> }
> to
> int bar(int k) {
> return 42;
> }
> because C++ considers the explanation "Whenever k==0, the bottom of the
> loop is never reached" to be unreasonable.
> Now, personally, I actually disagree with C++'s criterion there — I think
> "sometimes the loop is infinite" seems like a pretty reasonable
> "explanation of the trick" as far as I'm concerned. But the example does
> show that C++ *already has mechanisms* by which it can say "Such-and-such
> an explanation of the trick is unreasonable; we don't have to entertain the
> possibility that it could ever really happen that way."
>
> So I don't think you're correct to worry that P2188 / P2815 would
> completely destroy *all* provenance-based optimizations. It needs to
> propose some wording before we can say anything concrete about it, of
> course. And it might end up destroying some optimizations. But it needn't
> destroy all of them. And at first glance it doesn't seem out-of-keeping
> with the direction of C++.
> (Likewise, the existence of `reinterpret_cast` doesn't destroy all
> type-based alias analysis. And `const_cast` famously destroys some-
> *but-not-all* const-based optimizations.)
>
Well, yes indeed, I'd need to see the full wording to make a proper
assessment. When I wrote the email, it was based on information in the
presentation slides in P2815, because I wasn't sure what the title P2188
referred to (interesting format for sure, a "paper" that is just a
presentation on another paper). However, I'm skeptical that the actual
paper can do what it purports to want to do w/o at least severely limiting
those optimizations.

For background, I've done and commented on work on the Rust Programming
Language's Operational Semantics Model relating to pointer provenance
(which wants many similar optimizations to C++, and offers even more in
some cases), and through that I've learned to be very careful of proposals
with goals like "If two pointers have the same *value representation* they
are identical". It may not be intended to break too many optimizations,
but I've found that goals like this are typically incompatible with what
compilers assume and need to assume in order for those optimizations to be
valid. Rust solved the "pointers are trivially copyable but carry
out-of-band data" problem by saying that bytes in memory *do* carry
provenance data, it's just not easy (read, not possible) to observe w/o
causing undefined behaviour. This was a good solution to the joint problem
of "memcpy preserves pointers" and "pointers aren't just a number", though
carrying its own problems, and solving those problems through what seem
like natural extensions of the same solution run into the same kind of
optimization-destroying effects. I think C++ adopting anything like
"bytewise equivalence means can do the same things" w/o a solution like the
provenance-in-byte-values model like Rust has is liable to hinder or enjoin
most (if not all) provenance related optimizations, possibly w/o even
realising it.
In the current world, touching anything relating to pointers is a lot like
walking on a minefield - only the explosion is aimed at our generated code
efficiency, rather than our feet.

(Citation on the abstract byte model, note that it also encodes
uninitialized data/indeterminate values in the "Byte"):
https://github.com/RalfJung/minirust/blob/master/mem/interface.md#abstract-bytes
)

> –Arthur
>

Received on 2023-02-22 13:25:17