On Tue, Nov 10, 2020 at 10:50 AM Nicolai Josuttis <nico@josuttis.de> wrote:
On 10.11.2020 at 15:51, Arthur O'Dwyer wrote:

> (3) FWIW, I consider the motivating problem not a problem with for at
> all. for is easy to teach. The culprit here is "view types" (I've also
> called them "parameter-only types") — types which pretend to "have" an
> iterable range of elements, without actually participating in the
> ownership of that range, so that they can dangle if the backing storage
> is deallocated too early. C++20 Ranges makes this problem 10x worse, for
> sure. But your paper does a very good job of demonstrating that the
> problem is not confined to Ranges; you can get it via C++17 string_view
> or C++20 span as well, or even by chaining method calls as in `for
> (auto&& elt : foo().bar())`.

but in which sense is iterating over the elements of the first vector in
vector<vector<int>> a view?

Well, vector<vector<int>>::operator[] returns a reference-to-a-vector. Native reference types are like non-owning view types, except that in certain circumstances they get special treatment (lifetime extension, interaction with `auto`, interaction with template type deduction and reference collapsing, ...). But this is not one of those special circumstances.
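To make the "special circumstances" distinction concrete, here is a minimal sketch. It is my own illustration, not code from the paper: `Noisy`, `make`, and the `events` log are hypothetical names, and the destructor only records that it ran. Binding a reference directly to a temporary triggers lifetime extension; binding it to a reference returned from a member call on that temporary does not.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: a global log of destructor calls.
inline std::vector<std::string> events;

struct Noisy {
    std::vector<int> data{1, 2, 3};
    ~Noisy() { events.push_back("~Noisy"); }
    // Returns a native reference, i.e. a non-owning "view" of data.
    const std::vector<int>& get() const { return data; }
};

Noisy make() { return Noisy{}; }

// True if the temporary is still alive after the reference is bound.
bool alive_after_direct_binding() {
    events.clear();
    // The reference binds directly to the temporary: lifetime is extended.
    [[maybe_unused]] const Noisy& r = make();
    return events.empty();
}

bool alive_after_binding_through_call() {
    events.clear();
    // The reference binds to the result of get(), not to the temporary,
    // so the temporary Noisy is destroyed at the end of this statement.
    [[maybe_unused]] const std::vector<int>& v = make().get();
    // v now dangles; this sketch deliberately never reads through it.
    return events.empty();
}
```

Calling `alive_after_direct_binding()` should report that nothing was destroyed yet, while `alive_after_binding_through_call()` should report that the temporary is already gone. The `vector<vector<int>>` case falls in the second category.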



> (3) On page 4 you say, "the API of ranges was significantly modified."
> Could you explain more (with a URL, or in a footnote) what you mean?
>
I thought I did, by referring to
 https://cplusplus.github.io/EWG/ewg-active.html#120

That issue is from 2014/2015, though.
The example is
    for (int val : vec | reversed | uniqued) { use(val); }
which I agree falls into the pitfall fixed in your paper. But you said Ranges' design was changed somehow to deal with this?
- How was it changed?
- What evidence is there that the change was due specifically to this pitfall with for-loops?
I mean, if you're just talking about how Ranges conflates value category with lifetime, and prevents you from piping non-"view" rvalues into other view factories: Wasn't that change also to deal with things like
    auto temp = Person("Mo").getName() | reversed | uniqued;
    for (auto c : temp) std::cout << c;
which is not addressed by your paper? So it's not like this paper is going to permit Ranges (or anyone) to stop conflating value category with lifetime, even if the Ranges ship hadn't already sailed in 2020.

> (7) Top of page 10: "Are there other places in the language that have
> similar problems?" [...]
The special quality is that the use of a reference is not visible to the
programmer. That we don't have anywhere else.
So a key warning signal (and something I first have to teach) is missing.

Maybe I'm failing to see through a student's eyes here; but to me, the whole point of references is that you usually don't see them — at least, not at the call-site. When I see `foo(x)`, I don't immediately know if that's taking a reference to `x` or making a copy. And even if it's making a copy, the copy constructor itself will take `x` by reference!  The important thing to teach style-wise is that you should write your code so that it does not matter to the caller whether a reference is taken or not. (So: pass-by-const-reference as an optimization of pass-by-value, pass out-parameters by pointer instead of by reference, don't gratuitously return by reference, and so on.)
I dunno, maybe if this issue ever came up in class, I'd find it hard to teach why the dangling reference is happening. But for me it's never come up. And I certainly wouldn't bring it up.
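As a small illustration of the style rules in the parenthetical above (a sketch only; `total_length` and `append_squares` are hypothetical functions I made up for this message):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Only reads its argument, so const& is just a cheaper spelling of
// pass-by-value. The caller cannot tell the difference.
std::size_t total_length(const std::vector<std::string>& words) {
    std::size_t n = 0;
    for (const std::string& w : words) n += w.size();
    return n;
}

// Out-parameter passed by pointer: the caller has to write &out,
// so the mutation is visible at the call-site.
void append_squares(const std::vector<int>& in, std::vector<int>* out) {
    for (int x : in) out->push_back(x * x);
}
```

At the call-site, `total_length(words)` reads the same whether the parameter is a value or a const reference, and `append_squares(in, &out)` makes it obvious that `out` will be modified.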

> (9) [...]
> Then, if we look at C++20 ranged-for-with-initializer, today
>     for (auto bb = a().b(); auto elt : bb.c()) { use(elt); }
> compiles into the moral equivalent of
>     X aa = a(); X bb = aa.b();
>     destroy(aa);
>     X cc = bb.c();
>     for (auto elt : cc) { use(elt); }
>     destroy(cc);
>     destroy(bb);
> Your proposal doesn't propose any change to these semantics. Notably,
> you are not proposing that the lifetime of "X aa" be increased all the
> way to the bottom of the loop. "X aa" is just a temporary object used
> briefly in the initializing expression of `bb`, and so it should still
> get destroyed quickly at the end of the full-expression `a().b()`.
[...]
> (10) I would like to see more explicit discussion of why you're not
> going to come back in three years complaining about how
>     for (auto bb = a().b(); auto elt : bb.c()) { use(elt); }
> has a dangling reference to the result of `a()`. Why do you consider
> this dangling reference OK, whereas the one you're fixing has been some
> kind of mortal blow to C++'s teachability?
>
Well, first, so far I NEVER used or taught this new loop at all.
Second I am confused.
As bb is not a reference, where is the problem here?

Hmm... seems I slipped from a case where we agree there's a problem with lifetimes, into a case where we don't agree.
Here, arguably, there is a problem only if
- the programmer writes `auto&& bb = a().b();` which is totally realistic but also pretty clearly wrong (again, see my "Down with lifetime extension" post for the crazy things people have done in LLVM's codebase); or
- `a().b()` returns a view type such as `std::string_view`, which people shouldn't be doing.
But the same is true of your motivating example, right?

    for (auto elt : a().b().c()) { use(elt); }  // today, a()'s and b()'s lifetimes both end before the first iteration of the loop (but you propose to fix this)
    for (auto aa = a(); auto elt : aa.b().c()) { use(elt); }  // b()'s lifetime still ends before the first iteration of the loop (but you propose to fix this)
    for (auto bb = a().b(); auto elt : bb.c()) { use(elt); }  // a()'s lifetime still ends before the first iteration of the loop (you do not yet propose to fix this)

So I'm asking re the third case: are we sure you're not going to come back in 3 years saying "let's fix the lifetime of a() too"?
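Here is a sketch of that third case with logging destructors, so the order of destruction can be observed without touching anything that has already died. It assumes C++20 (for the init-statement in the range-for). The bodies of `A`, `B`, `C`, the `trace` log, and `kElems` are all hypothetical stand-ins for the `a().b().c()` chain above.

```cpp
#include <string>
#include <vector>

inline std::vector<std::string> trace;  // hypothetical event log

inline constexpr int kElems[] = {1, 2};  // static storage, so iterating is always safe

struct C {
    ~C() { trace.push_back("~C"); }
    const int* begin() const { return kElems; }
    const int* end() const { return kElems + 2; }
};
struct B {
    ~B() { trace.push_back("~B"); }
    C c() const { return C{}; }
};
struct A {
    ~A() { trace.push_back("~A"); }
    B b() const { return B{}; }
};
A a() { return A{}; }

std::vector<std::string> run_third_case() {
    trace.clear();
    // The temporary A lives only until the end of the init-statement.
    for (auto bb = a().b(); int elt : bb.c()) {
        trace.push_back("use " + std::to_string(elt));
    }
    return trace;
}
```

On a conforming compiler I would expect the log to read `~A`, `use 1`, `use 2`, `~C`, `~B`: the result of `a()` is gone before the first iteration, which is harmless here only because `B` holds nothing that points back into `A`.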


> (12) I very much appreciate the way you avoid using the phrase "lifetime
> extension" to refer to this new thing. [...] the mechanism you're proposing here is
> significantly different from C++98's "lifetime extension," and I think
> it's really important not to confuse the two mechanisms. [...]

I don't introduce a new language rule at all.

Well, you change the formal semantics of the core language from one thing, to another thing. That's "a new rule" in my book. :)  And the mechanism — basically increasing the effective range of a "full-expression" all the way to the bottom of a compound statement — seems new to me.

In fact, consider the difference between these two very contrived snippets:

    std::string temporary();
    std::optional<std::string_view> dangle(std::string_view sv) { return sv; }

    for (char ch : dangle(temporary()).value()) {
        std::cout << ch;  // UB today, well-defined after Nico's proposal
    }

    if (auto opt = dangle(temporary())) {
        std::cout << opt.value();  // UB today, UB forever
    }

You propose that in the for-loop, the full-expression `dangle(temporary()).value()` should have all its temporaries live all the way to the end of the block; but in the if-statement, the full-expression `dangle(temporary())` should not have all its temporaries live all the way to the end of the block. There's nothing wrong with this! But it is still potentially surprising to students, and I think it is reasonable for me to say that we're introducing a new language rule that applies in the first case but not in the second case.
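For comparison, here is one way to write both snippets so that they are well-defined under today's rules. This is only a sketch of a workaround, not something proposed in this thread, and the body of `temporary()` is an assumption (the snippet above only declares it). Note that in the for-loop it is not enough to name the string: `value()` called on an rvalue `optional` returns a reference into that temporary `optional`, so the `optional` has to be named too.

```cpp
#include <optional>
#include <string>
#include <string_view>

std::string temporary() { return "hello"; }  // assumed body, for illustration
std::optional<std::string_view> dangle(std::string_view sv) { return sv; }

std::string copy_via_for() {
    std::string out;
    std::string owner = temporary();  // named owner outlives the loop
    auto opt = dangle(owner);         // named optional outlives the loop
    for (char ch : opt.value()) out += ch;
    return out;
}

std::string copy_via_if() {
    std::string owner = temporary();  // named owner outlives the if
    if (auto opt = dangle(owner)) return std::string(opt.value());
    return {};
}
```

Both functions should return `"hello"`. The point is that the fix is the same in both places (give the owner a name), even though the proposal would make only the for-loop version unnecessary.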


All I say is that the range-based-for-loop operates as a black box
dealing with the expression after the : like with a function argument.

That's a good simplified way of conveying an intuition about what's happening, but it's specifically not all you're saying. :)  There's still going to be a formal language rule that describes what really happens, and teachers will still have to explain why the rule applies in some cases but not in others.

For example, a teacher might say "The magic lifetime rule kicks in for things on the right-hand side of a colon, but not for things on the right-hand side of an equals sign." And after this paper, we might say, "Lifetime extension applies to top-level native references on the right-hand side of an equals sign, but doesn't apply [because it doesn't need to apply] to anything on the right-hand side of a colon."

–Arthur