On Tue, Nov 10, 2020 at 4:25 AM Nicolai Josuttis via SG20 <sg20@lists.isocpp.org> wrote:

Hi,

for years, I have significant problems to teach the range-based for
loop, because of the bug that it has undefined behavior when iterating
over a reference to a temporary.
Thus, I
- either have to tell beginners significant constraints in using it
- or have to teach all the details why it is broken
including references, lifetime extension, universal references

Attached is a draft paper to fix the problem.
I welcome all feedback and if you want to be a co-author
(the more support the paper has the better).

(1) I approve of the proposed change, which IIUC is simply to move the destruction of the range object from its current position above the loop to down below the loop. It sounds like the goal is to make `for (auto&& elt : range) { x }` work basically the same as `ranges::for_each(range, []() { x })`, as shown in the lambda example on page 5. This is a good goal.

(2) I strongly disapprove of the sensationalistic title. This one change will not move the needle even a tiny bit on "healing C++"; the proposal addresses a corner case of a corner case. Range-based `for` is already quite teachable — or rather, it doesn't even need to be taught to most students, because they're already coming from languages that have the same `for` loop. People are also already aware that loops have pitfalls; the most common one is "modifying the container inside the loop," which usually leads to UB, and isn't addressed at all in your paper. (Your paper shouldn't address it. But that's just one example of how your paper doesn't "heal C++" and can't even be said to fix more than one of the minor existing problems around the `for` loop.)

My suggested title is "Increased lifetime for `for` loops' range objects."

(3) FWIW, I consider the motivating problem not a problem with for at all. for is easy to teach. The culprit here is "view types" (I've also called them "parameter-only types") — types which pretend to "have" an iterable range of elements, without actually participating in the ownership of that range, so that they can dangle if the backing storage is deallocated too early. C++20 Ranges makes this problem 10x worse, for sure. But your paper does a very good job of demonstrating that the problem is not confined to Ranges; you can get it via C++17 string_view or C++20 span as well, or even by chaining method calls as in `for (auto&& elt : foo().bar())`.

(3) On page 4 you say, "the API of ranges was significantly modified." Could you explain more (with a URL, or in a footnote) what you mean?

(4) On page 6, does "one of the authors" mean "one of the Facebook people"? or "Nico Josuttis"?

(5) /rather have less than more security risks/have fewer security risks, not more/

(6) /significant more/significantly more/

(7) Top of page 10: "Are there other places in the language that have similar problems?" Yes, obviously, literally anywhere that a dangling-reference bug might occur — that'd be a "similar problem." C++ is full of dangling references. For every place someone writes

for (auto elt : foo().bar())

you'll find two places where someone accidentally writes

const auto& local = foo().bar();

The latter is easier to fix once the programmer has been notified of the bug; but I've also seen it happen more often. (Although I admit that I probably see it more often simply because it's easier to spot visually.)

(8) Your example on page 10 seems mistaken, or else it needs further elaboration. You say that a loop like

for (auto elem : syncObj->getValues())

has a deadlock, but changing it to

for (const auto& elem : syncObj->getValues())

somehow removes the deadlock? I don't think the ref-ness of `elem` can possibly affect the outcome. If I'm right, and the ref-ness doesn't matter, then you should pick one type for `elem` and stick with it throughout the whole example.

(9) "a programmer still can avoid any overhead of the range-based for loop by using the optional initializer of the loop" — Initially, I thought you had this backwards. I had to work it out by hand (see below) to see what you were alluding to. Please add these examples to the paper, around this point. Here are the examples:

IIUC, in C++20 today,

struct X { X b(); X c(); }; X a();

for (auto elt : a().b().c()) { use(elt); }

compiles into the moral equivalent of

X aa = a(); X bb = aa.b(); X cc = bb.c();

destroy(bb); destroy(aa);

for (auto elt : cc) { use(elt); }

destroy(cc);

Your proposal says that it should compile into the moral equivalent of

X aa = a(); X bb = aa.b(); X cc = bb.c();

for (auto elt : cc) { use(elt); }

destroy(cc); destroy(bb); destroy(aa);

Then, if we look at C++20 ranged-for-with-initializer, today

for (auto bb = a().b(); auto elt : bb.c()) { use(elt); }

compiles into the moral equivalent of

X aa = a(); X bb = aa.b();

destroy(aa);

X cc = bb.c();

for (auto elt : cc) { use(elt); }

destroy(cc);

destroy(bb);

Your proposal doesn't propose any change to these semantics. Notably, you are not proposing that the lifetime of "X aa" be increased all the way to the bottom of the loop. "X aa" is just a temporary object used briefly in the initializing expression of `bb`, and so it should still get destroyed quickly at the end of the full-expression `a().b()`.

(10) I would like to see more explicit discussion of why you're not going to come back in three years complaining about how

for (auto bb = a().b(); auto elt : bb.c()) { use(elt); }

has a dangling reference to the result of `a()`. Why do you consider this dangling reference OK, whereas the one you're fixing has been some kind of mortal blow to C++'s teachability?

(11) Nitpick: The proposed wording, top of page 12, needs to mention `goto` as well as goto-labels. This proposal should have no effect on the program's ability to transfer control into and out of a for-loop's body. But of course the wording is extremely handwavey at the moment; I don't like any of these options; CWG will have to weigh in.

(12) I very much appreciate the way you avoid using the phrase "lifetime extension" to refer to this new thing. In my proposed title for your paper, I deliberately called it "Increased Lifetime," instead of "Extended Lifetime," because the mechanism you're proposing here is significantly different from C++98's "lifetime extension," and I think it's really important not to confuse the two mechanisms. I also kind of like the phrases "deferred destruction" and/or "delayed destruction [of range objects]."

(13) If you haven't read it lately, you might want to (re-)read this blog post of mine:

https://quuxplusone.github.io/blog/2020/03/04/field-report-on-lifetime-extension/

Specifically, "False Positive #1."

IIUC, your proposal would effectively cause `for` loops to be lowered into a form that does not rely on lifetime extension(*), which is awesome.

(* — Except if the "element" variable itself is lifetime-extended. This is more or less what happens in "True Positive #3" in my blog post.)

Modulo my strong objection to the current title in point (2), I would gladly sign on to this paper, if you wanted.

my $.02,

–Arthur