Pattern Matching: Auto-dereference (draft 1)

Document #:
Date:
Project: Programming Language C++
Reply-to:

Abstract

This paper argues, Dereference Pattern from current Pattern Matching (PM)1 proposal deviates from PM core principals and is a missed opportunity to create more expressive and less verbose system. The proposed solution is to have implicit dereferencing instead.

Background

PM is a way to access data by writing a pattern, describing what the data is. This is in contrast with regular code, which deals exclusively with how to get (to) the data.

Given

Rectangle rect = {Point{1,2}, Size{5,10}};

We request “x” if it is at origin (0) and “width”, if “height” is 10:

Normal Code

if(rect.pos.x == 0)
  x = rect.pos.x;
if(rect.size.height == 10)
  w = rect.size.width;

Using a pattern

[[__, let x = 0], [w, 10]]

(Both cases are oversimplified)

On the left, the regular code, describes a (branching) path to the data, where on the right, we have a snapshot of what the data is (needs to be) in order for the expression to yield result. Both the shape is described (using the []), as well as what the real-time values are expected to be (the digits). This is the essence of any Pattern Matching - to supply a model for the data to fit in - this is also the essence of Declarative Programming:

What instead of How

A natural property of what-instead-of-how is that we usually do not perform actions in such environments, as “an action” is as “how” as it can get. This means that if are to include any action into any declarative framework, we are moving away from its principals and core values. It is not to say, we can’t do it, but we must be aware, there is some design purity and consistency cost by doing that.

The Problem

Both Dereference and Extractor patterns are, predominantly, “actions”. Their purpose is to extract the data, before the patterns can be applied to the actual data.
Let’s examine Dereference in particular. Imagine, our rectangle had “size” and “position”, stored behind a pointer or an optional and how we would use PM on them:

[(*?) [__, let x = 0], (*?) [w, 10]]

There are two major things to notice.

First, there is a noticeable increase in verbosity. If all values needed indirection - the extraction code would have completely dominated the pattern. The problem is not just “more code”, but a “noisy looking one”, comprised entirely of non-character symbols, hardly self-explanatory.

Second, the extra patterns are not semantically desirable. Where the initially created pattern described point-and-size-inside-rectangle (our data verbatim), now we have to be extra specific point-and-size-stored-as-optional-like-inside-rectangle. How the data is stored becomes part of the pattern. The question is - is this desirable feature or a side-effect?

This paper argues, it is a side-effect. It is also not a desirable one because:

To further elaborate on the second point. For the Dereference pattern to not be just means to an end (to get the data), and be more like a regular pattern (describing the data), it must be used to make selections about the storage type.

Rectangle { Point pos; Width width; }; 
Rect      { std::optional<Point> position; std::optional<Width> width; };
...
inspect (r) 
{
  [(*?) __, (*?) __] => // we have width and height stored as optional-like
                  __ => // else...
}

In this example, the pattern is used exclusively to describe the data storage type and make a selection based on that.
However, the same can be expressed by other means:

  <Rect> [...] => // we have width and height stored as optional-like (because of type-select)
            __ => // else...

Alternatively, we can use a concept to represent a more broad selection.

All this comes to show, the Dereference pattern is “an action” first and a pattern (a distant) second, and even as pattern, it is not an essential one. Considering both the added verbosity and design deviation of having “an action” inside a declarative framework, there is good reason to look for alternatives.

It’s worth quickly mentioning the Extractor pattern. Most of what is said about the Dereference pattern applies there as well. In the Extractor pattern the desire to have “an action” is pretty much explicit (hence the name), what is different however is the fact, it can never be considered “an obstacle” for the data, because it is the data. Without an extractor, there is no PM for that object. Because of that most of the perceived downsides regarding design are alleviated.

Proposed solution

We saw how Dereference pattern adds additional code noise (to already obscure looking expressions!) and sits on a questionable design fundament by being primarily “an action” in a declarative framework.
Both issues can be avoided by simply doing automatic dereference for all optional-like types, skipping “the middle man” completely. We can afford to do this, because we are within the inherently safe environment of PM where failure is always expected. We will miss a glaring opportunity for “checks-free” code, if don’t take advantage of this environment and repeat practices from regular code, where “failure is not an option”. Here, failure is an option by design.

Let’s see this in practice with an extreme example.

Rect { std::optional<Point> position; std::optional<Width> width; };

std::shared_ptr<std::unique_ptr<Rect>> r;

Before

inspect(r) {
 (*?) (*?) [(*?) [__, let x = 0], (*?) [w, 10]] => // use x & w
                                             __ => // no match 
}

After

inspect(r) {
  [[__, let x = 0], [w, 10]] =>  // use x & w
                          __ =>  // no match for whatever reason
}

Left and right are equivalent. On the left we manually unwrap the redirections, where on the right this is done for us. An implicit check for null is performed and if it passes, an implicit dereference is performed as well. The process is recursive until we reach the top level.
In our example, the shared pointer is first check and dereferenced, then the unique pointer. At this point we have reached our data (as if there was no indirection) and PM proceeds as normal with deconstructing “r”. The process is the same for “size” and “position” members.

As you can see, the code returns back to it’s desired declarative nature - the pattern is all about the data itself, not how to get to it. The verbosity, unfamiliarity and obfuscation are reduced monumentally as well.

What do we lose?

As said, the Dereference pattern could be used as a way to select on storage type (one that uses indirection). Should be noted, the original paper does not mention this usage, which further solidifies the argument, this pattern is mainly envisioned as an action (to get to the data). That being said, the loss of this functionality should be mentioned for completeness, and again, the alternative is to use a form of “type select”.

This paper envisions the removal of *! as well. As the original paper does not explain the need for a separate form, this paper assumes, it is added for performance reasons, to avoid the extra check. If that is the case, this paper would argue, it is premature to think about such low level optimization. A well tuned implementation is first needed to asses such a requirement, because quite possibly the compiler will be able to optimize-away any redundant tests.
And there is the issue of safety - why poke a hole in safe-by-default system? When exactly will this unchecked dereference be actually correctly used? If the object is known to be available, from outside current scope, then why is it not passed as a reference in the first place, and if it is not known, then how the user can be sure, in a way invisible to the compiler (so that it can’t eliminate redundant checks)?

There is one last point, that needs to be examined - bindings.

Currently it is possible to create bindings before and after the dereference pattern, allowing us to have access to all structures, being part of the original object.

std::shared_ptr<std::unique_ptr<Rect>> r;
inspect(r) {
 let sh_r = (*?)(let u_r = (*?) (let r = [__, __])) => ... // sh_r is binding to shared_ptr
                                                           // u_r is binding to unique_ptr (after shared_ptr deref)
                                                           // r is binding to Rect (after both shared and unique_ptr deref)
}

This proposal envisions, by default, bindings to be to the data itself. This matches expectations.

inspect(r) {
 let r = [__, __] => ... // r is binding to Rect (after both shared and unique_ptr deref)
}

However, it is often needed to have bindings to the container as well, for example to pass the data along using its initial enclosure.
To accommodate this use case, new syntax is proposed, one that allows to bind at any step of the automatic dereference.
The result of an automatic dereference can be thought as list (a stack) of all dereferences. This list could be deconstructed:

std::shared_ptr<std::unique_ptr<Rect>> r;
inspect(r) {
 let sh_r,u_r,r = [__, __] => ... // sh_r is binding to shared_ptr
                                  // u_r is binding to unique_ptr (after shared_ptr deref)
                                  // r is binding to Rect (after both shared and unique_ptr deref)
}

It is possible to only bind to some objects by providing a name just for them.

let sh_r,, = [__, __] => ... // sh_r is binding to shared_ptr (only)

As you can see, this is a lightweight version of structured bindings, with each comma separating each “level” of dereference.
This way we not only have a correct default, binding to the data directly, but have a nice syntax when we need some or all of the “containers” as well.

Before

let sh_r = (*?)(let u_r = (*?) (let r = [__, __])) => ...

After

let sh_r,u_r,r = [__, __] => ...

Additional example from the paper:

struct Expr;
struct Neg {
  std::shared_ptr<Expr> expr;
};
struct Add {
  std::shared_ptr<Expr> lhs, rhs;
};
struct Mul {
  std::shared_ptr<Expr> lhs, rhs;
};
struct Expr : std::variant<int, Neg, Add, Mul> {
  using variant::variant;
};

Before

int eval(const Expr& expr) {
  return inspect (expr) {
    <int> i => i,
    <Neg> [(*?) e] => -eval(e),
    <Add> [(*?) l, (*?) r] => eval(l) + eval(r),
    // Optimize multiplication by 0.
    <Mul> [(*?) <int> 0, __] => 0,
    <Mul> [__, (*?) <int> 0] => 0,
    <Mul> [(*?) l, (*?) r] => eval(l) * eval(r)
  };
}

After

int eval(const Expr& expr) {
  return inspect (expr) {
    <int> i => i,
    <Neg> [e] => -eval(e),
    <Add> [l, r] => eval(l) + eval(r),
    // Optimize multiplication by 0.
    <Mul> [<int> 0, __] => 0,
    <Mul> [__, <int> 0] => 0,
    <Mul> [l, r] => eval(l) * eval(r)
  };
}

Notice the importance to have binding to data by default. We can have code as-if there was no indirection in the first place!

To have binding to the not-null pointer (and the data), the user will have to do <Neg> [ let pe, = e ]. pe binds to shared_ptr<Expr>. This is still less verbose then <Neg> [ let pe = (*?) e ]. To have bindings to potentially null pointer, the user will have to <Neg> [ let pe = __ ], which is, granted, more verbose then <Neg> [ pe ], but such a usage is an exception.

Conclusion

Presented here was a natural way to have PM over nullable objects, one that:


  1. Pattern Matching: http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2020/p1371r2.pdf↩︎