On Wed, 22 Feb 2023 at 07:57, Arthur O'Dwyer <arthur.j.odwyer@gmail.com> wrote:

On Tue, Feb 21, 2023 at 9:53 PM connor horman via Std-Proposals <std-proposals@lists.isocpp.org> wrote:
I just learned of P2815, and I do not know of another place I can respond to it, so I'm going to answer here.

The presentation's title slide (and unfortunately nowhere else — I missed it the first time too!) refers to P2188:
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2188r1.html
which, being a WG21 paper, contains contact info for the author. I've cc'ed him here.

Anthony: In the mailing-list archive, this thread begins at https://lists.isocpp.org/std-proposals/2023/02/5848.php .

In my opinion, the proposal in question eliminates all provenance based alias analysis, used to eliminate memory accesses when unanalyzed code is present.

Consider the following example code:

int foo(int* p){
int i;
opaque_operation(&i,p);
i = 0;
*p = 1;
return i;
}

(Where the optimizer is unable to analyze `opaque_operation`
In current compilers, this can be optimized to [...

int foo(int *p) {
int i; opaque_operation(&i, p);
*p = 1; return 0;
}

...]. Under the proposal, however, `opaque_operation` could observe that `&i` and `p` have the same *value representation*, which means that they must be equal (it may do so with defined behaviour in all executions, for example, by terminating if they are bytewise unequal). Thus the write to `p` may modify the value of `i`. This would make the transformation [...] invalid under the proposal.

I see your point. It would be like: You go to a magic show. The magician gives one deck of cards to an audience member, who randomly selects a card and signs its face. He gives a second deck of cards to a second audience member, who randomly selects another card and signs its face. The magician takes both cards behind a partition, then re-emerges and triumphantly shows the audience that both audience members have somehow chosen the same card!
"I know how that trick is done," you whisper. "Whenever the two audience members choose different cards, the magician goes behind the partition and kills himself."

As in magic, C++ has some common-sense rules about what counts as a reasonable explanation for a trick and what doesn't.
For example, all three major compilers will optimize
int bar(int k) {
while (k) { }
return 42;
}
to
int bar(int k) {
return 42;
}
because C++ considers the explanation "Whenever k==0, the bottom of the loop is never reached" to be unreasonable.
Now, personally, I actually disagree with C++'s criterion there — I think "sometimes the loop is infinite" seems like a pretty reasonable "explanation of the trick" as far as I'm concerned. But the example does show that C++ already has mechanisms by which it can say "Such-and-such an explanation of the trick is unreasonable; we don't have to entertain the possibility that it could ever really happen that way."

So I don't think you're correct to worry that P2188 / P2815 would completely destroy all provenance-based optimizations. It needs to propose some wording before we can say anything concrete about it, of course. And it might end up destroying some optimizations. But it needn't destroy all of them. And at first glance it doesn't seem out-of-keeping with the direction of C++.
(Likewise, the existence of `reinterpret_cast` doesn't destroy all type-based alias analysis. And `const_cast` famously destroys some-but-not-all const-based optimizations.)

Well, yes indeed, I'd need to see the full wording to make a proper assessment. When I wrote the email, it was based on information in the presentation slides in P2815, because I wasn't sure what the title P2188 referred to (interesting format for sure, a "paper" that is just a presentation on another paper). However, I'm skeptical that the actual paper can do what it purports to want to do w/o at least severely limiting those optimizations.

For background, I've done and commented on work on the Rust Programming Language's Operational Semantics Model relating to pointer provenance (which wants many similar optimizations to C++, and offers even more in some cases), and through that I've learned to be very careful of proposals with goals like "If two pointers have the same *value representation* they are identical". It may not be intended to break too many optimizations, but I've found that goals like this are typically incompatible with what compilers assume and need to assume in order for those optimizations to be valid. Rust solved the "pointers are trivially copyable but carry out-of-band data" problem by saying that bytes in memory *do* carry provenance data, it's just not easy (read, not possible) to observe w/o causing undefined behaviour. This was a good solution to the joint problem of "memcpy preserves pointers" and "pointers aren't just a number", though carrying its own problems, and solving those problems through what seem like natural extensions of the same solution run into the same kind of optimization-destroying effects. I think C++ adopting anything like "bytewise equivalence means can do the same things" w/o a solution like the provenance-in-byte-values model like Rust has is liable to hinder or enjoin most (if not all) provenance related optimizations, possibly w/o even realising it.

In the current world, touching anything relating to pointers is a lot like walking on a minefield - only the explosion is aimed at our generated code efficiency, rather than our feet.

(Citation on the abstract byte model, note that it also encodes uninitialized data/indeterminate values in the "Byte"): https://github.com/RalfJung/minirust/blob/master/mem/interface.md#abstract-bytes)

–Arthur