sg20: Re: [SG20] A draft paper to fix the range-based for loop to make it teachable

From: Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
Date: Tue, 10 Nov 2020 14:56:02 -0500

On Tue, Nov 10, 2020 at 10:50 AM Nicolai Josuttis <nico_at_[hidden]> wrote:

> On 10.11.2020 at 15:51, Arthur O'Dwyer wrote:
>
> > (3) FWIW, I consider the motivating problem not a problem with for at
> > all. for is easy to teach. The culprit here is "view types" (I've also
> > called them "parameter-only types") — types which pretend to "have" an
> > iterable range of elements, without actually participating in the
> > ownership of that range, so that they can dangle if the backing storage
> > is deallocated too early. C++20 Ranges makes this problem 10x worse, for
> > sure. But your paper does a very good job of demonstrating that the
> > problem is not confined to Ranges; you can get it via C++17 string_view
> > or C++20 span as well, or even by chaining method calls as in `for
> > (auto&& elt : foo().bar())`.
>
> But in what sense is iterating over the elements of the first vector in a
> vector<vector<int>> a view?
>

Well, vector<vector<int>>::operator[] returns a reference-to-a-vector.
Native reference types are *like* non-owning view types, except that in
certain circumstances they get special treatment (lifetime extension,
interaction with `auto`, interaction with template type deduction and
reference collapsing, ...). But this is not one of those special
circumstances.
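To make the operator[] point concrete, here is a minimal C++20 sketch. `makeMatrix` and `sumFirstRow` are hypothetical names, and the desugaring in the comments is a simplified paraphrase of the C++20 range-for expansion, not the standard's exact wording. The dangling form is only shown in a comment, because running it is undefined behavior.

```cpp
#include <cassert>
#include <vector>

// Hypothetical factory returning a temporary vector<vector<int>>.
std::vector<std::vector<int>> makeMatrix() {
    return {{1, 2, 3}, {4, 5, 6}};
}

// Roughly how C++20 expands  for (int x : makeMatrix()[0]) { ... }
// (names simplified):
//     auto&& range = makeMatrix()[0];  // binds to the vector<int>& returned by
//                                      // operator[]; the temporary outer vector
//                                      // is NOT lifetime-extended and dies here
//     for (auto it = range.begin(); it != range.end(); ++it) { int x = *it; ... }
// so the loop body would read freed memory. That form is deliberately not run.

int sumFirstRow() {
    int sum = 0;
    // Safe alternative (C++20 init-statement): the named variable `m` owns the
    // outer vector for the whole loop, so the reference from operator[] stays valid.
    for (auto m = makeMatrix(); int x : m[0]) {
        sum += x;
    }
    return sum;
}
```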

> > (3) On page 4 you say, "the API of ranges was significantly modified."
> > Could you explain more (with a URL, or in a footnote) what you mean?
> >
> I thought I did, by referring to
> https://cplusplus.github.io/EWG/ewg-active.html#120

That issue is from 2014/2015, though.
The example is
    for (int val : vec | reversed | uniqued) { use(val); }
which I agree falls into the pitfall fixed in your paper. But you said
Ranges' design was changed somehow to deal with this?
- How was it changed?
- What evidence is there that the change was due specifically to this
pitfall with for-loops?
I mean, if you're just talking about how Ranges conflates value category
with lifetime
<https://quuxplusone.github.io/blog/2019/03/11/value-category-is-not-lifetime/>,
and prevents you from piping non-"view" rvalues into range adaptors:
Wasn't that change also to deal with things like
    auto temp = Person("Mo").getName() | reversed | uniqued;
    for (auto c : temp) std::cout << c;
which is not addressed by your paper? So it's not like this paper is going
to permit Ranges (or anyone) to *stop* conflating value category with
lifetime, even if the Ranges ship hadn't already sailed in 2020.

> > (7) Top of page 10: "Are there other places in the language that have
> > similar problems?" [...]
>
> The special quality is that the use of a reference is not visible to the
> programmer. That we don't have anywhere else.
> So a key warning signal (and something I first have to teach) is missing.
>

Maybe I'm failing to see through a student's eyes here; but to me, the
whole point of references is that you usually don't see them — at least,
not at the call-site. When I see `foo(x)`, I don't immediately know if
that's taking a reference to `x` or making a copy. And even if it's making
a copy, the copy constructor itself will take `x` by reference! The
important thing to teach style-wise is that you should write your code so
that it *does not matter* to the caller whether a reference is taken or
not. (So: pass-by-const-reference as an optimization of pass-by-value, pass
out-parameters by pointer instead of by reference, don't gratuitously
return by reference, and so on.)
I dunno, maybe if this issue ever came up in class, I'd find it hard to
teach why the dangling reference is happening. But for me it's never come
up. And I certainly wouldn't *bring* it up.
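A small sketch of the style rules in the parenthetical above; all function names are hypothetical. The two `length...` calls look identical at the call site even though one copies and one binds a reference, while the pointer out-parameter makes the possible modification visible via `&`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Call sites for these two look identical: lengthByValue(s), lengthByConstRef(s).
// Passing by const reference is just an optimization of passing by value.
std::size_t lengthByValue(std::string s) { return s.size(); }
std::size_t lengthByConstRef(const std::string& s) { return s.size(); }

// Out-parameter passed by pointer: the `&` at the call site tells the reader
// that the argument may be modified.
void appendBang(std::string* out) { *out += '!'; }
```

Usage: with `std::string s = "hi";`, the calls `lengthByValue(s)` and `lengthByConstRef(s)` read the same, whereas `appendBang(&s);` visibly hands over something to be mutated.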

> > (9) [...]
> > Then, if we look at C++20 ranged-for-with-initializer, today
> > for (auto bb = a().b(); auto elt : bb.c()) { use(elt); }
> > compiles into the moral equivalent of
> > X aa = a(); X bb = aa.b();
> > destroy(aa);
> > X cc = bb.c();
> > for (auto elt : cc) { use(elt); }
> > destroy(cc);
> > destroy(bb);
> > Your proposal doesn't propose any change to these semantics. Notably,
> > you are not proposing that the lifetime of "X aa" be increased all the
> > way to the bottom of the loop. "X aa" is just a temporary object used
> > briefly in the initializing expression of `bb`, and so it should still
> > get destroyed quickly at the end of the full-expression `a().b()`.
> [...]
> > (10) I would like to see more explicit discussion of why you're not
> > going to come back in three years complaining about how
> > for (auto bb = a().b(); auto elt : bb.c()) { use(elt); }
> > has a dangling reference to the result of `a()`. Why do you consider
> > this dangling reference OK, whereas the one you're fixing has been some
> > kind of mortal blow to C++'s teachability?
> >
> Well, first, so far I have NEVER used or taught this new loop at all.
> Second, I am confused.
> As bb is not a reference, where is the problem here?
>

Hmm... seems I slipped from a case where we agree there's a problem with
lifetimes, into a case where we don't agree.
Here, arguably, there is a problem only if
- the programmer writes `auto&& bb = a().b();` which is totally realistic
but also pretty clearly wrong (again, see my "Down with lifetime extension"
post
<https://quuxplusone.github.io/blog/2020/03/04/field-report-on-lifetime-extension/>
for the crazy things people have done in LLVM's codebase); or
- `a().b()` returns a view type such as `std::string_view`, which people
shouldn't be doing.
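The two bullets can be sketched as follows. `Widget`, `bRef`, `bView` and `countChars` are hypothetical, and the sketch assumes `b()` returns a reference or view into the object produced by `a()` (if it returned by value, `auto&&` would lifetime-extend its result and nothing would dangle). The dangling forms stay in comments because executing them is undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-ins for the a()/b() chain discussed above.
struct Widget {
    std::string text = "abc";
    const std::string& bRef() const { return text; }  // reference into *this
    std::string_view bView() const { return text; }   // view into *this
};

Widget a() { return Widget{}; }

std::size_t countChars() {
    std::size_t n = 0;
    // OK: plain `auto` deduces std::string, so `bb` is an owning copy made
    // before the Widget returned by a() dies at the end of the init-statement.
    for (auto bb = a().bRef(); char ch : bb) {
        (void)ch;
        ++n;
    }
    return n;
}

// Dangling variants (not executed):
//     for (auto&& bb = a().bRef(); char ch : bb) { ... }  // reference into a dead Widget
//     for (auto bb = a().bView(); char ch : bb) { ... }   // view into a dead Widget
```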
But the same is true of your motivating example, right?

    // Today, a()'s and b()'s lifetimes both end before the first iteration
    // of the loop (but you propose to fix this).
    for (auto elt : a().b().c()) { use(elt); }

    // b()'s lifetime still ends before the first iteration of the loop
    // (but you propose to fix this).
    for (auto aa = a(); auto elt : aa.b().c()) { use(elt); }

    // a()'s lifetime still ends before the first iteration of the loop
    // (you do *not* yet propose to fix *this*).
    for (auto bb = a().b(); auto elt : bb.c()) { use(elt); }

So I'm asking re the third case: are we sure you're not going to come back
in 3 years saying "let's fix the lifetime of a() too"?
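The "moral equivalent" expansion quoted in (9), which is also the third case above, can be checked with an instrumented type. `X`, `a`, `run` and `trace` are hypothetical names. The sketch assumes C++20 (for the range-for init-statement) and relies on C++17 guaranteed copy elision, which is why the deleted copy constructor is harmless.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> trace;  // records lifetime events in order

// Hypothetical instrumented type standing in for X in the quoted expansion.
struct X {
    std::string name;
    explicit X(std::string n) : name(std::move(n)) { trace.push_back("make " + name); }
    X(const X&) = delete;  // proves that no hidden copies are made
    ~X() { trace.push_back("destroy " + name); }
    X b() const { return X("bb"); }
    std::vector<int> c() const { return {1}; }
};

X a() { return X("aa"); }

void run() {
    // a()'s temporary dies at the end of the init-statement, before the loop runs;
    // bb lives until the end of the whole for statement.
    for (auto bb = a().b(); auto elt : bb.c()) {
        trace.push_back("use " + std::to_string(elt));
    }
}
```

After `run()`, `trace` holds `make aa`, `make bb`, `destroy aa`, `use 1`, `destroy bb`, matching the quoted expansion.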

> > (12) I very much appreciate the way you avoid using the phrase "lifetime
> > extension" to refer to this new thing. [...] the mechanism you're proposing
> > here is significantly different from C++98's "lifetime extension," and I
> > think it's really important not to confuse the two mechanisms. [...]
>
> I don't introduce a new language rule at all.
>

Well, you change the formal semantics of the core language from one thing,
to another thing. That's "a new rule" in my book. :) And the mechanism —
basically increasing the effective range of a "full-expression" all the way
to the bottom of a compound statement — seems new to me.

In fact, consider the difference between these two very contrived snippets:

    std::string temporary();
    std::optional<std::string_view> dangle(std::string_view sv) { return sv; }

    for (char ch : dangle(temporary()).value()) {
        cout << ch; // UB today, well-defined after Nico's proposal
    }

    if (auto opt = dangle(temporary())) {
        cout << opt.value(); // UB today, UB forever
    }

You propose that in the for-loop, the full-expression
`dangle(temporary()).value()` should have all its temporaries live all the
way to the end of the block; but in the if-statement, the full-expression
`dangle(temporary())` should *not* have all its temporaries live all the
way to the end of the block. There's nothing wrong with this! But it is
still potentially surprising to students, and I think it is reasonable for
me to say that we're introducing a new language rule that applies in the
first case but not in the second case.
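Under today's rules, one way to make both contrived snippets well-defined is to give the `std::string` a name in an init-statement. In this sketch the body of `temporary()` is an assumption, `dangle` is as given above, and `copyViaFor` and `copyViaIf` are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

std::string temporary() { return "hello"; }  // assumed body
std::optional<std::string_view> dangle(std::string_view sv) { return sv; }

std::string copyViaFor() {
    std::string out;
    // The init-statement names the std::string, so it outlives the whole loop.
    for (std::string t = temporary(); char ch : dangle(t).value()) {
        out += ch;
    }
    return out;
}

std::string copyViaIf() {
    // Same trick for the if-statement (C++17 init-statement).
    if (std::string t = temporary(); auto opt = dangle(t)) {
        return std::string(opt.value());
    }
    return {};
}
```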

> All I say is that the range-based for loop operates as a black box
> dealing with the expression after the : like with a function argument.
>

That's a good simplified way of conveying an intuition about what's
happening, but it's specifically not *all* you're saying. :) There's still
going to be a formal language rule that describes what *really* happens,
and teachers will still have to explain why the rule applies in some cases
but not in others.
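The function-argument intuition matches a rule that already holds today: temporaries created while evaluating a call's arguments survive until the end of the full-expression containing the call. A minimal sketch, where `countChars` and `viaFunctionArgument` are hypothetical, `dangle` is as given above, and the body of `temporary()` is assumed:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

std::string temporary() { return "hello"; }  // assumed body
std::optional<std::string_view> dangle(std::string_view sv) { return sv; }

// Hypothetical consumer standing in for the loop body.
std::size_t countChars(std::string_view sv) {
    std::size_t n = 0;
    for (char ch : sv) {
        (void)ch;
        ++n;
    }
    return n;
}

std::size_t viaFunctionArgument() {
    // Well-defined today: the std::string returned by temporary() lives until
    // the end of this full-expression, i.e. until countChars has returned.
    return countChars(dangle(temporary()).value());
}
```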

For example, a teacher might say "The magic lifetime rule kicks in for
things on the right-hand side of a colon, but not for things on the
right-hand side of an equals sign." And after this paper, we might say,
"Lifetime extension applies to top-level native references on the
right-hand side of an equals sign, but doesn't apply [because it doesn't
*need* to apply] to anything on the right-hand side of a colon."
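For contrast, this is the C++98 "lifetime extension" rule for a top-level native reference on the right-hand side of an equals sign. `extended` is a hypothetical name and the body of `temporary()` is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

std::string temporary() { return "hello"; }  // assumed body

std::size_t extended() {
    // Lifetime extension: binding a const reference directly to the prvalue
    // keeps the std::string alive for as long as `r` is in scope.
    const std::string& r = temporary();
    return r.size();
}

// Not lifetime-extended (not executed): `sv` is not a native reference, so the
// std::string dies at the end of the full-expression and `sv` dangles.
//     std::string_view sv = temporary();
```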

–Arthur

Received on 2020-11-10 13:56:16