std-proposals: Re: Issues I have with current Pattern Matching

From: Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
Date: Sun, 26 Jan 2020 10:46:06 -0500

[Adding P1371's authors to the CC list. The unabridged original of
Mikhail's post to std-proposals is here
<https://lists.isocpp.org/std-proposals/2020/01/0950.php>.]

On Sun, Jan 26, 2020 at 9:07 AM Михаил Найденов via Std-Proposals <
std-proposals_at_[hidden]> wrote:

> The proposal in question:
> http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2020/p1371r2.pdf
>
> Hello, I would like to share my core concerns with the current PM proposal.
>
> *1. I don't see the need for the dereference pattern.*
>
> This pattern introduces (extreme) noise and unfamiliar (to put it mildly)
> look into the code.
>
> It is not needed, because
> A) in PM we can always have implicit test and dereference
> B) the user can use a extractor for the *! case
>

The "dereference pattern" (named that in the paper) is pattern-matching on
the pointee of a pointer. The non-trivial example from the paper is:

enum Color { Red, Black };

template<class T>
struct Node {
    void balance();
    Color color;
    std::shared_ptr<Node> lhs;
    T value;
    std::shared_ptr<Node> rhs;
};

template<class T>
void Node<T>::balance() {
    *this = inspect (*this) {
        [case Black, (*?) [case Red, (*?) [case Red, a, x, b], y, c], z, d]
            => Node{Red, std::make_shared<Node>(Black, a, x, b),
                    y, std::make_shared<Node>(Black, c, z, d)},
        [case Black, (*?) [case Red, a, x, (*?) [case Red, b, y, c]], z, d]
// left-right case
            => Node{Red, std::make_shared<Node>(Black, a, x, b),
                    y, std::make_shared<Node>(Black, c, z, d)},
        [case Black, a, x, (*?) [case Red, (*?) [case Red, b, y, c], z, d]]
// right-left case
            => Node{Red, std::make_shared<Node>(Black, a, x, b),
                    y, std::make_shared<Node>(Black, c, z, d)},
        [case Black, a, x, (*?) [case Red, b, y, (*?) [case Red, c, z, d]]]
// right-right case
            => Node{Red, std::make_shared<Node>(Black, a, x, b),
                    y, std::make_shared<Node>(Black, c, z, d)},
        self => self // do nothing
    };
}

If I understand your "extractor pattern" idea correctly, you're suggesting
that it could be rewritten as follows:

enum Color { Red, Black };

template<class T>
struct Node {
    void balance();
    Color color;
    std::shared_ptr<Node> lhs;
    T value;
    std::shared_ptr<Node> rhs;
};

namespace std {
    struct Deref {
        template<class T>
        static const T& extract(const T *p) { return *p; }
        template<class T>
        static const T *try_extract(const T *p) { return p; }
    };
    inline constexpr Deref deref;
}

template<class T>
void Node<T>::balance() {
    *this = inspect (*this) {
        [case Black, (std::deref?) [case Red, (std::deref?) [case Red, a,
x, b], y, c], z, d]
            => Node{Red, std::make_shared<Node>(Black, a, x, b),
                    y, std::make_shared<Node>(Black, c, z, d)},
        [case Black, (std::deref?) [case Red, a, x, (std::deref?) [case
Red, b, y, c]], z, d] // left-right case
            => Node{Red, std::make_shared<Node>(Black, a, x, b),
                    y, std::make_shared<Node>(Black, c, z, d)},
        [case Black, a, x, (std::deref?) [case Red, (std::deref?) [case
Red, b, y, c], z, d]] // right-left case
            => Node{Red, std::make_shared<Node>(Black, a, x, b),
                    y, std::make_shared<Node>(Black, c, z, d)},
        [case Black, a, x, (std::deref?) [case Red, b, y, (std::deref?)
[case Red, c, z, d]]] // right-right case
            => Node{Red, std::make_shared<Node>(Black, a, x, b),
                    y, std::make_shared<Node>(Black, c, z, d)},
        self => self // do nothing
    };
}

On the one hand, this doesn't seem any *more* cryptic than the current
proposal. The current proposal makes up its own novel mini-language,
complete with un-C++ syntax (postfix `!`, novel use of parentheses) and new
"not reserved but still magic" identifiers which must be hard-coded into
the compiler (`extract`, `try_extract`). But, for that same reason, it
doesn't seem to offer much benefit. The compiler implementation gets
marginally simpler (maybe save 100 lines of compiler code, out of the tens
of thousands this feature would take?), the library implementation gets 6
lines longer, and all user code gets marginally longer (has to use
`std::deref` instead of `*`). I'm sure that *this change in isolation*
would not be worth the time it would take to discuss it. If you could find
a way to eliminate *all* of the cryptic magic in this proposal — from the
`*?` mini-language to the cumbersome "extractor" API — then that would be
worth discussing.

By the way, here's how you'd write the red-black rebalancing example in
present-day C++14:

template<class T>
void Node<T>::balance() {
    auto is_red = [](auto&& p) { return p && p->color == Red; };
    auto new_node = [](auto&& a, auto&& x, auto&& b, auto&& y, auto&& c,
auto&& z, auto&& d) {
        return Node(Red, std::make_shared<Node>(Black, a, x, b),
                    y, std::make_shared<Node>(Black, c, z, d));
    };
    if (color == Black && is_red(lhs) && is_red(lhs->lhs)) {
        *this = new_node(lhs->lhs->lhs, lhs->lhs->value, lhs->lhs->rhs,
lhs->value, lhs->rhs, value, rhs);
    } else if (color == Black && is_red(lhs) && is_red(lhs->rhs)) {
        *this = new_node(lhs->lhs, lhs->value, lhs->rhs->lhs,
lhs->rhs->value, lhs->rhs->rhs, value, rhs);
    } else if (color == Black && is_red(rhs) && is_red(rhs->lhs)) {
        *this = new_node(lhs, value, rhs->lhs->lhs, rhs->lhs->value,
rhs->lhs->rhs, rhs->value, rhs->rhs);
    } else if (color == Black && is_red(rhs) && is_red(rhs->rhs)) {
        *this = new_node(lhs, value, rhs->lhs, rhs->value, rhs->rhs->lhs,
rhs->rhs->value, rhs->rhs->rhs);
    } else {
        // do nothing
    }
}

I don't claim that this code is *better* than the pattern-matching version,
but I don't think it's *worse*.
By introducing names for the subtrees (a,b,c,d), the pattern-matching code
makes it easier to see what's happening in the *algorithm*. By eschewing
arcane syntax and using C++ semantics (e.g. `auto&&` to indicate that we're
definitely not making an unnecessary copy), the C++14 code makes it easier
to see what's happening in the *code*.

[...]

*4. The lack of logical operators is a severe limitation, which I don't
> believe is justified.*
>
> Considering we have *two* ways to write logical operators, we could
> reserve one for combining patterns and let the other for expressions
>
> inspect(v) {
> (a || b) or 5 => ...
> ^^^ expression OR
> ^^^ pattern OR
> }
>
> Yes, this will complicate parsing (inside `inspect` rules will change a
> bit) and
> Yes, the user is forced into syntax style (and new rules who to read
> code), but both are well worth the gain.
> Using "ma[t]chers" for such basic tasks* will make C++ look bad*.
>

But in point 1 you were arguing in *favor* of library code and *against*
weird core-language syntax! Now you've flipped around and want to see a
novel mini-language operator instead of a standard library matcher
`std::any_of`?

Now, admittedly, the proposal doesn't seem to have gotten as far as
thinking about the library side of things. The only mentions of `namespace
std` in the paper are where the sample code reopens `namespace std` to add
user code to it (a bad habit that I'd like to see discouraged). So when
section 9.3 talks about the matcher `any_of`, it's not necessarily talking
about adding that matcher to the library. They might still be thinking that
every industry codebase would reimplement their *own* `any_of` (or `AnyOf`)
matcher.

The other problem with adding `any_of` to the standard library is that
`std::any_of` already exists — it's an STL algorithm! So they'll have to
pick a new name for it if they want to put it into `namespace std`.

Sure, if we add novel built-in core-language syntax for everything we can
think of, then we don't need to worry about library design... but it just
results in bad *language* design.

I also strongly object to your proposed core-language syntax. You ask for
`||` and `or` to mean subtly different things! Presumably you would also
ask for a subtle difference between `&&` and `and`, and between `!` and
`not`? Right now, from C++03 all the way to C++17, these tokens have
always meant exactly the same thing — to such a great extent that the
compiler interprets `void f(T and x)` as a synonym for `void f(T&& x)`.
The current state of affairs is a little silly, but it is *very very*
teachable, because it is consistent. "`or` means `||`." Done. You're
proposing to greatly complicate the current state of affairs, by changing
the explanation of `or` from 2 characters into several sentences (and those
sentences will have to involve a very obscure and arcane feature with lots
more crufty corner cases).

(Also note that C++2a Concepts changes the meaning of `||` and `&&` so that
they mean subtly different things in requires-clauses from what they mean
anywhere else in the language. I consider this a big mistake in terms of
language design. So this year I may be overly sensitive to *additional
*requests
to change the meanings of logical operators in subtle ways.)

–Arthur

Received on 2020-01-26 09:48:53