sg20: Re: [SG20] A draft paper to fix the range-based for loop to make it teachable

From: Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
Date: Tue, 10 Nov 2020 09:51:24 -0500

On Tue, Nov 10, 2020 at 4:25 AM Nicolai Josuttis via SG20 <
sg20_at_[hidden]> wrote:

> Hi,
>
> for years, I have significant problems to teach the range-based for
> loop, because of the bug that it has undefined behavior when iterating
> over a reference to a temporary.
> Thus, I
> - either have to tell beginners significant constraints in using it
> - or have to teach all the details why it is broken
> including references, lifetime extension, universal references
>
> Attached is a draft paper to fix the problem.
> I welcome all feedback and if you want to be a co-author
> (the more support the paper has the better).
>

(1) I approve of the proposed change, which IIUC is simply to move the
destruction of the range object from its current position above the loop to
down below the loop. It sounds like the goal is to make `for (auto&& elt :
range) { x }` work basically the same as `ranges::for_each(range, []() { x
})`, as shown in the lambda example on page 5. This is a good goal.

(2) I strongly disapprove of the sensationalistic title. This one change
will not move the needle even a tiny bit on "healing C++"; the proposal
addresses a corner case of a corner case. Range-based `for` is already
quite teachable — or rather, it doesn't even need to be taught to most
students, because they're already coming from languages that have the same
`for` loop. People are also already aware that loops have pitfalls; the
most common one is "modifying the container inside the loop," which usually
leads to UB, and isn't addressed at all in your paper. (Your paper
shouldn't address it. But that's just one example of how your paper doesn't
"heal C++" and can't even be said to fix more than one of the minor
existing problems around the `for` loop.)
My suggested title is "Increased lifetime for `for` loops' range objects."

(3) FWIW, I consider the motivating problem not a problem with for at all.
for is easy to teach. The culprit here is "view types" (I've also called
them "parameter-only types") — types which pretend to "have" an iterable
range of elements, without actually participating in the ownership of that
range, so that they can dangle if the backing storage is deallocated too
early. C++20 Ranges makes this problem 10x worse, for sure. But your paper
does a very good job of demonstrating that the problem is not confined to
Ranges; you can get it via C++17 string_view or C++20 span as well, or even
by chaining method calls as in `for (auto&& elt : foo().bar())`.

(3) On page 4 you say, "the API of ranges was significantly modified."
Could you explain more (with a URL, or in a footnote) what you mean?

(4) On page 6, does "one of the authors" mean "one of the Facebook people"?
or "Nico Josuttis"?

(5) /rather have less than more security risks/have fewer security risks,
not more/
(6) /significant more/significantly more/

(7) Top of page 10: "Are there other places in the language that have
similar problems?" Yes, obviously, literally anywhere that a
dangling-reference bug might occur — that'd be a "similar problem." C++ is
full of dangling references. For every place someone writes
    for (auto elt : foo().bar())
you'll find two places where someone accidentally writes
    const auto& local = foo().bar();
The latter is *easier to fix* once the programmer has been notified of the
bug; but I've also seen it *happen *more often. (Although I admit that I
probably see it more often simply because it's easier to spot visually.)

(8) Your example on page 10 seems mistaken, or else it needs further
elaboration. You say that a loop like
    for (auto elem : syncObj->getValues())
has a deadlock, but changing it to
    for (const auto& elem : syncObj->getValues())
somehow removes the deadlock? I don't think the ref-ness of `elem` can
possibly affect the outcome. If I'm right, and the ref-ness doesn't matter,
then you should pick one type for `elem` and stick with it throughout the
whole example.

(9) "a programmer still can avoid any overhead of the range-based for loop
by using the optional initializer of the loop" — Initially, I thought you
had this backwards. I had to work it out by hand (see below) to see what
you were alluding to. Please add these examples to the paper, around this
point. Here are the examples:
IIUC, in C++20 today,
    struct X { X b(); X c(); }; X a();
    for (auto elt : a().b().c()) { use(elt); }
compiles into the moral equivalent of
    X aa = a(); X bb = aa.b(); X cc = bb.c();
    destroy(bb); destroy(aa);
    for (auto elt : cc) { use(elt); }
    destroy(cc);
Your proposal says that it *should *compile into the moral equivalent of
    X aa = a(); X bb = aa.b(); X cc = bb.c();
    for (auto elt : cc) { use(elt); }
    destroy(cc); destroy(bb); destroy(aa);
Then, if we look at C++20 ranged-for-with-initializer, today
    for (auto bb = a().b(); auto elt : bb.c()) { use(elt); }
compiles into the moral equivalent of
    X aa = a(); X bb = aa.b();
    destroy(aa);
    X cc = bb.c();
    for (auto elt : cc) { use(elt); }
    destroy(cc);
    destroy(bb);
Your proposal doesn't propose any change to these semantics. Notably, you
are not proposing that the lifetime of "X aa" be increased all the way to
the bottom of the loop. "X aa" is just a temporary object used briefly in
the initializing expression of `bb`, and so it should still get destroyed
quickly at the end of the full-expression `a().b()`.

(10) I would like to see more explicit discussion of why you're not going
to come back in three years complaining about how
    for (auto bb = a().b(); auto elt : bb.c()) { use(elt); }
has a dangling reference to the result of `a()`. Why do you consider this
dangling reference OK, whereas the one you're fixing has been some kind of
mortal blow to C++'s teachability?

(11) Nitpick: The proposed wording, top of page 12, needs to mention `goto`
as well as goto-labels. This proposal should have no effect on the
program's ability to transfer control into and out of a for-loop's body.
But of course the wording is extremely handwavey at the moment; I don't
like any of these options; CWG will have to weigh in.

(12) I very much appreciate the way you avoid using the phrase "lifetime
extension" to refer to this new thing. In my proposed title for your paper,
I deliberately called it "Increased Lifetime," instead of "Extended
Lifetime," because the mechanism you're proposing here is significantly
different from C++98's "lifetime extension," and I think it's really
important not to confuse the two mechanisms. I also kind of like the
phrases "deferred destruction" and/or "delayed destruction [of range
objects]."

(13) If you haven't read it lately, you might want to (re-)read this blog
post of mine:
https://quuxplusone.github.io/blog/2020/03/04/field-report-on-lifetime-extension/
Specifically, "False Positive #1."
IIUC, your proposal would effectively cause `for` loops to be lowered into
a form that does not rely on lifetime extension(*), which is awesome.
(* — Except if the "element" variable itself is lifetime-extended. This is
more or less what happens in "True Positive #3" in my blog post.)

Modulo my strong objection to the current title in point (2), I would
gladly sign on to this paper, if you wanted.

my $.02,
–Arthur

Received on 2020-11-10 08:51:38