Document #: | xxx |
Date: | 2022-10-06 |
Project: | Programming Language C++ SG17 |
Reply-to: |
Mihail Naydenov <mihailnajdenov@gmail.com> |
This is a new Pattern Matching (PM) proposal with a more limited scope then previous efforts, aimed for C++26 release.
Current proposal aims to be conservative in its goals in order to make it in C++26.
This means:
The first point is discussed in PXXXX, which proposes a reduced scope for the initial release of PM, agnostic of any concrete implementation.
The second point is the motivation for this proposal - to have PM, based heavily on what is already available, both in the language and agreed upon in previous PM proposals. This means as less as possible new syntax and no new concepts beyond the PM itself.
One of the main challenges in PM has to do with identifiers. There is an unavoidable ambiguity what we mean by a variable named x
: Is it a newly introduced entity or a reference to an existing one? There is no right answer and each system decides on its own. For example in Python, x
is a new name and if the user whats to access an existing one it must contain a dot (essentially to be a path) - something.x
. In Rust also x
is a new name. Swift and C# introduce new names the same way this is done outside PM - with let
and var
respectively.
Variable introduction style might seem like minor, “esthetics” issue, but here it is not:
class s{}; s s;
.These two add up to the preexisting ambiguities inside PM, creating even more complicated context.
Let’s examine the number of possibilities for what an identifier could mean in the C++ pattern matching scenario. We mentioned some of them already:
x
can be a PM variable, a bindingx
can be a non-PM variable, one to compare againstx
can be a typeBefore going to the 4th one, here are our current solutions for the above 3.
The main PM proposal (p1371)1:
x
.case
- case x
.<x>
.Alternative Pattern also accepts a constant-expression, it is not clear if this means a constant variable as well. In that case we could have an ambiguity whatever
<x>
means atypename x
or a variablex
, used to select the alternative.
The “is as” proposal (p2392)2:
is
or as
.is
.is size
, do we mean a type or a variable we happen to have?As you can see, things are already not easy with these 3 cases, but there is one more:
- x
can be an existing variable, used for binding, an assignment.
Arguably, this case mainly concerns patterns outside PM. For example, we can imagine extending structured bindings to bind to existing variables:
Yes, there is std::tie
, however it’s a hidden, not obvious way of doing things. More importantly however, it is not composable with any patterns:
some a,b;
some-pattern std::tie(some-pattern a, b) = something; //< can't bind to preexisting variables from within patterns
Outside PM we will certainly need to bind to existing variables, but binding inside PM could also be beneficial. Current proposals do not address this usage directly. We can imagine, the simple structured bindings cases could work, however:
is
, used for “new binding”.As you can see, there are plenty of complications regarding identifiers in the PM context with many potential uses and very limited ways to express them.
If we step back and consider what syntax we would like to see as a natural way of doing type switching, we should look no further then the very first proposals for Pattern Matching like pattern-matching-November-20143 or the early p0095.4 They both advertise something along the lines of
to select (and deconstruct) a type Some
from a polymorphic source value. This makes sense - deconstructing is the opposite of constructing and often programming languages use syntax, very similar to either class definition or construction:
Rust class definition
C# class definition
Python class construction
Scala class definition
In other words, our initial intention to have deconstruction mirroring construction was more or less “right”. Something however happen in the process and we drifted away it. Using square instead of curly brackets is understandable, considering Structured Bindings are already in the language. However, we also lost the arguably natural way of expressing the desired type:
The as
, although looking nice, is foreign to C++. To change this fact, we must alter the language, adding as
outside PM also. On the other hand, angle brackets can hardly be perceived as desirable and pretty - considering the recursive nature of PM, they will bring extra visual noise.
Currently proposed inspect
is hypothetically “the least evil”. Still, almost certainly it will break some code somewhere as it is not an uncommon name for a function. To mitigate that, we’ll probably have to parse ahead and be smart about it.
As stated in the beginning, this proposal envisions much smaller scope for the initial release, compared to previous efforts.
Namely, only the following items are proposed:
if
guardFor motivation and further details about the scope, please refer to PXXXX.
Other then reduced scope, maximum reuse is proposed. This means, among other things, making PM a direct improvement of switch
. To get around parsing difficulties, we can make a small syntactical change, still using the switch
keyword, but with square brackets:
switch (a) //< old C-switch for `a`
...
switch [a] //< pattern match for `a`
...
switch [a, b] //< pattern match for both `a` and `b`
With square brackets after switch
, a PM expression is introduced, instead of a C-style switch
statement.
Inside the PM expression, two kinds of condition-result pairs are proposed:
case PTRN:
and default:
, which work the same way as with the old switch
, except there is no fallthrough. Also, any pattern (PTRN
) can be used, not just a constant-expression, including recursive ones. These kinds of condition-results don’t contribute to the result of the PM expression, but allow for use of break
, return
and continue
with the same behavior as the old switch
.PTRN =>
which is new and is followed by an expression, providing a result for the PM expression itself. The expression can be empty only if the result type is void
.This simple setup give us a lot of advantages with little effort:
switch
” (w/o fallthrough), just by replacing the brackets around an existing switch
.switch
”, one that can compare strings for example.return
. If we want special treatment for no-return, we can introduce an attribute, otherwise we just reuse case
and default
.Wildcard pattern
Wildcard pattern (__
) is proposed as identical to the one in the main proposal. It will match any value and will lead to execution of the expression or statement, right of =>
or case:
This is an overview, including future directions. For the concretely proposed items see next section.
In the previous section we made the observation, an identifier can be one of those 4 things:
It is clear, it is not possible to have a simple, single identifier without encountering multiple ambiguities. To account for that, we must “mark” identifiers for their specific use. In the main proposal this is done by using case
for “existing variable to compare against” as well as having angle brackets around type names. In the “is as” proposal, the location of the identifier in relation to is
or as
is the determining factor.
Regarding the latter, we can make an observation, is
is quite similar to an operator, we already have, the operator==
. This means, marking values for comparison with an ==
can be a reasonable approach as well:
int value = 5;
int number = ...;
switch [number] {
==value => //< matches value
// or !=value if we want!
...
};
One interesting side effect is that by doing so, we can also naturally introduce the other comparison operators:
switch [number] {
>=value => //< greater or equal
<=value => //< less or equal
< value => //< less then
> value => //< greater then
...
};
This looks promising, the code is self-explanatory and we did not have to introduce any new syntax or language features. Let that be our starting point - values to compare against must start with the equality operator tokens ==
.
Using
==
does not exclude intrudingis
in the future!
Moving forward. In the language we already have something similar to a “binding, creating a new variable”. This is the lambda capture:
[a=something] { // New variable `a` is introduced for the lambda with the same value as `something`.
...
}
We can reuse the syntax for PM as well:
int number = ...;
switch [number] {
n= __ => // new binding is introduced for the PM with same value as the matched ::number
// or just n= omitting the __
...
}
Here, much like with the values for comparison, we can easily go one step further, reusing even more syntax and semantics, and also have mutable and immutable bindings:
int number = ...;
void mutate(int&){...}
switch [number] {
n= __ => mutate(n); // FAILS, n is immutable by default!
&n= __ => mutate(n); // SUCCEEDS, n is requested to be mutable
...
};
Once we have x=
established to mean “new variable/binding” and &
to mean “mutability/reference”, naturally “reference to existing variable” can be introduced:
int number = ...;
int n;
switch [number] {
&n __ => mutate(n); // binding to existing variable with the same value as ::number
// or just &n omitting the __
...
}
Once again, this is all very, very close to the lambda captures, we already know:
lambda capture
Having covered values for comparison and various bindings, only type names remain. Because all identifiers so far were “marked” by additional tokens, we could finally have “unmarked” names for types:
All this together will allow for zero ambiguity identifiers usage:
struct size{...};
size size;
switch [...] {
size => // match ::size type
size= => // new `size` biding, hiding ::size
&size= => // new `size` biding, hiding ::size, mutable
&size => // assign to ::size variable
==size => // match sizes, same as ::size
>size => // match sizes, bigger then ::size
...
};
Because the aim is for baseline, not all of the above is proposed, in particular, binding to existing variables, as well as the greater-then and less-then comparisons are not proposed.
This leaves us with:
=
- immutable binding&
id=
- mutable binding==
id - variable for equality comparisonFurther, having to compare to a value using ==
might seem verbose and atypical. Similarly, having single identifier to mean a type name can be confusing:
struct size{...};
size size;
switch [...] {
size => //< There is a certain expectation, this is a value, not a type, we compare against!
[size, size] => //< Here as well
};
To account for that, proposed (optional) is to have the ==
be optional if it is the last pattern, which is often the case. To avoid ambiguity, the type name must never be the last pattern. In practice this means, to match on type, one must follow up it’s name with another pattern, like __
:
struct size{...};
size size;
switch [...] {
size __ => //< Size is definitely a type
size => //< Value, "as expected"
[size, size] => //< Values here as well
[==size, ==size] => //< Same as above, more verbose, but more clear?
size size => //< also valid, type is size, value equals the ::size variable
size (==size) => //< same as above, but more clear
};
One interesting benefit from not requiring ==
is that it would give us “safer switch” without code changes:
regular switch
The only textual difference between these two are the brackets after switch
and an semicolon at the end.
int x = 5;
switch [...] {
x => // match ::x
==x => // same as above, but explicit
x= => // new biding, hiding ::x
x=x => // new biding, match value ::x
x=(==x) => // same as above, but explicit
int x => // match int, match value ::x
px= (int x=x) => // bind px to a polymorphic type, then match the type to int, match value to ::x, bind result to a new `x`
int &x=x => // match int, match value x, bind result to a new `x` as mutable
...
};
Both current proposals, at some point, used star (*
) as a “Dereference Pattern”. The main proposal moved away from it under the guidance of the committee and it now uses (*?)
for “null-check-and-match” and (*!)
for “match-without-null-checking”. The motivation behind the change was possible confusion with the dereference operator. The “Is As” proposal, still uses the star.
This paper does not entirely agree, there could a confusion with the dereference operator - PM often uses expressions, which mean something else in normal code. Still, to move forward and accept decisions, already taken, the ?
token is proposed instead:
The ?
models the use of the conditional operator (x ? *x : ...
) in C/C++, as well as the “safe call” operator (x?.member
) in other languages (Kotlin, Swift). The behavior is “null-check-and-match”, the same as (*?)
from the main proposal. Non-checking pattern is not proposed.
Combining dereference with identifiers, we can create bindings at any level:
int* px = ...;
int x = 5;
switch [px] {
?x => // pointer is non-null, value matches ::x
?x= => // pointer is non-null, new x binding to int value
x=? => // pointer is non-null, new x binding to pointer to int
?x=x => // pointer is non-null, value matches ::x, new x binding to int value
x=?x => // pointer is non-null, value matches ::x, new x binding to pointer to int
x=?y= => // pointer is non-null, new x bindings to pointer to int, as well as new binding y to the value
...
}
The same pattern can be used for any dereferenceable type (that is also contextually convertible to bool
). Here optional
is combined with a pointer.
std::optional<int*> ppx = ...;
switch [ppx] {
?x= => // optional is non-null, new x binding to the pointer (which might be null)
??x= => // both pointer and optional are non-null, new x binding to the value
x=?? => // both pointer and optional are non-null, new x binding to the optional
?x=? => // both pointer and optional are non-null, new x binding to the pointer
x=?y=?z= => // both pointer and optional are non-null, new bindings for all levels (optional, pointer, value)
...
}
Proposed is simple, positional only deconstruction:
struct size{ int width; int height; };
size size = ...;
int w = 5;
int h = 9;
switch [size] {
[w=, h=] => // deconstruct size into w and h variables
[w, h] => // match size if width member is equal to ::w and height is ::h
[==w, ==h] => // same as above, more verbose, but more clear?
[&w=, h=] => // deconstruct size into w and h, w being mutable reference to the member
[w, h=] => // deconstruct size only into h, if width member is equals to ::w
[w=w, h=h] => // deconstruct into w and h variables, if both width is equal to ::w and height to ::h
// [w= ==w, h= ==h] // same as the last one, but needlessly verbose and ultimately weird
// [==w w=, ==h h=] //
};
As stated in PXXXX, no new customization point is proposed. If structured bindings can be used on the type, then it can be deconstructed as well. For the initial release designators are not proposed.
Literals and enumerations can be used to match against their value, exactly the same as variable identifiers.
struct Size{ int width; int height; };
enum Color { red, black};
struct Shape { Size size; Color color; };
Shape shape = ...;
int zero = 0;
switch [shape] {
[[0,0], Color::red] => // match shape if size is all zero and color is red
[[zero,zero], red] => // same as above
[[==0,zero], ==red] => // same as above
};
Please note, because we don’t have implicit variable introduction, all value comparisons are done uniformly, independent of whether we compare to a variable, enum or literal:
This Proposal
enum {red, black}; // integer constant
constexpr int zero = 0, one = 1; // modern integer constant
switch [...] {
red => // constant matched
zero => // same, constant matched
==zero => // constant matched, verbose alternative
==red => // same, constant matched
};
Uniform comparison b/w enums and variables.
In the main PM proposal, these kind of patterns are handled by the “Expression Pattern” and are defined as “matches value if a call to member
e.match(v)
or else a non-member ADL-onlymatch(e, v)
is contextually convertible to bool and evaluates to true where e is constant-expression”. This is not the case here, neither an extension point nor executing arbitrary expressions are proposed. Comparison is done via theoperator==
exclusively (else using==
will be simply wrong); the user is free to define a variable, outside PM, with the result of an arbitrary expression and use it instead for comparison. This also means some oldswitch
-s, those that use expression incase
-s, will not be directly portable to PM. Motivation: running arbitrary expressions will eat up syntax that we’ll need for other things, we want[something(), else(2), entirely<int>{}]
to be available even if not used yet.
Matching on type is done by simply naming the type (or in some cases the concept) in front of another pattern:
std::any val = ...;
int x = 5;
switch [val] {
float f= => // val is float, bind the result to f
int x => // val is int, equal to ::x
int (==x) => // same as above, more verbose, but more clear?
std::string s="hi" => // val is std::string, equal to the literal "hi", bound to s
};
struct Shape { ... };
struct Circle : Shape { int radius; };
struct Rectangle : Shape { int width, height; };
auto& shape = ...;
switch [shape] {
Circle __ => // shape is Circle
Rectangle [w, h] => // shape is Rectangle, deconstructed into w and h
};
The details how matching is done in the any
and polymorphic types cases are the same as in the main PM proposal (p1371). This proposal however differs how variant
-s are matched.
Matching variants
Currently active proposals match variant
-s the same way any
and polymorphic types are matched. This is not entirely correct, because variant
-s have a “has a” relationship with the types they can hold5, instead of the “is” relationship of any
and polymorphic types. For example, we get
from a variant
, but we cast an any
or a polymorphic type!
This difference will become even more pronounced with the introduction of language support of tagged unions. Then, the alternatives will be visibly inside the union as members.
Because of this, proposed is to access variants
by deconstruction first:
template<class T>
concept Number = ...;
std::variant<int, float, std::string> val = ...;
int x = 5;
switch [val] {
[float f=] => // val has a float, bind the result to `f`
[float &f=] => // val has a float, bind to `f` as reference
[int x] => // val has a int, equal to ::x
[int (==x)] => // same as above, more verbose, but more clear?
[std::string s="hi"] => // val has a std::string, equal to the literal "hi", bound to `s`
[Number num=] => // val has a number, either int or float, bind to `num`
[whatever=] => // val has any of the alternatives, we don't care which, bind to `whatever`
};
Please note the last example. By deconstructing first we eliminate the need for the “auto” selector as defined in the main proposal’s Alternative Pattern. We simply “enter” the object, after that any binding without a type will naturally mean “any type” (“auto”).
Neither
any
nor the polymorphic types can really take advantage of using “auto” style matching because they are an open set - the type can be anything and will be essentially impossible to implement “all types that we can cast to”.
In case of ambiguity with deconstructing non-variant types, we can always prefix the deconstruction with the containing type:
using Variant = std::variant<int, std::string>;
struct Circle { int radius; };
switch [...] { //< can either be Variant or Circle
// [whatever=] => // Error, ambiguity what the deconstruction means
// [r=] => //
Variant[whatever=] => // OK, variant, `whatever` is either int or std::string
Circle[r=] => // OK, Circle, r is the radius member
}
At this point it is perhaps evident why we might still need the ==
when comparing values. There are places where we must specify, we are comparing to a value, not matching to a type name:
struct Size{ int width; int height; };
Size size = {12, 12};
switch [...] {
// size[w=, h=] => // Error, no type named size
// Size size[w=, h=] => // Same, also an error (attempt for two type matches one after another)
==size[w=, h=] => // OK, match ::size
};
In other words, omitting the ==
is not allowed if the pattern is not the last one, likewise the type match must not be the last pattern:
struct Size { int width; int height; };
Size size = {12, 12};
struct Rectangle { int x; int y; Size size; };
Rectangle rect = ...;
switch [rect] {
// [__, __, size __] => // Error, no type named size
[__, __, ==size __] => // OK, explicit value match
[__, __, size] => // OK, last pattern, match value
// [__, __, Size] => // Error, no such value (type match must not be last)
[__, __, Size __] => // OK, not last pattern, match type
};
This is the place to note, matching to a concept will unavoidably be very different from other type matching as it will introduce template code in disguise:
template<class T>
concept Number = ...;
std::variant<int, float, std::string> val = ...;
switch [val] {
[Number num=] => // num is either int or float
};
In the above example num
can only be used by a template
code (a function or lambda), or if using case :
, within constexpr if
blocks as well. This rises considerable hardship - the code after =>
, for that particular match alone, must have multiple different instantiations, one for each type. Because of this obvious complication, and the desire to start small, matching on concept is prosed as optional for the initial release.
Lastly, matching variants by index (or enum
) is not proposed at this stage. This type of matching falls in the same category as deconstruction using designators, the index being a kind of designator. We don’t have language support for accessing members by index (struct.0
instead of struct.name
), which will make any syntax used in PM appear arbitrary and foreign to regular code. Proposed is to introduce language support for index-based access to members together with destructuring by index (with the same syntax) at a later date.
if
Guardif
guard is proposed as identical to the suggested Pattern Guard from the main proposal (p1371r3):
Result type
Proposed is the same behavior as in the main proposal with either deduced or explicit (switch [...] -> Type
) type. The only difference is that instead of the special !{...}
sections, proposed is the use of the existing case :
sections. They act the same way as in the old switch
and do not contribute to the result type of the PM expression.
Exceptions
Pattern Matching should not throw exceptions with failure being just a no-match case. If operator==
for example throws for some reason, there are no further requirements for the PM to avoid that exception propagating, but PM will not introduce new exceptions or internally use explicitly throwing APIs like std::get
.
Exhaustiveness
There is no proposed enforcement for Exhaustiveness. If no pattern matches, no expressions or statements are executed and:
void
, it is value-initialized and the execution of the enclosing function continues.void
, nothing happens, execution of the enclosing function continues.The above model is as if a normal C++ function, or an immediately invoked lambda, has an implicit return {};
statement at the end. Another way of looking at this is as if the switch
had one last __ => {};
section. This makes uses of PM expressions for initialization simple and concise as only the matched case needs to be stated. Because of this, proposed (optional) is to have surrounding curly braces be optional if there is just one section and no explicit result type:
val is an int, initialized to 0.
The user is free to add his/her own catch-all __
sections as either expressions (=>
), which must satisfy the result, or compound statements (case __:
/ default
). In the latter case, unless the PM result is void
, the statement must return
or throw
else std::terminate
will be called. This ensures the result is always initialized if the execution continues past the PM expression. A switch
without a =>
section and no explicit result has result type of void
(old-style switch
).
user-provided failure
void result
std::terminate called
Usefulness
There is no proposed enforcement for Usefulness. Compilers are free to issue a warning for patterns that will never execute.
std::variant<std::string, int, std::tuple<int, int>> v = ...
switch [v] {
[int x=] => true;
[int x=, __] => true;
[std::string __] => false;
[__] => false; // this can never be reached (single element either int or string, both handled)
[__, __] => false; // this can never be reached (duo elements always int and int)
};
The last two cases could get a compiler warning as they will never be reached.
class Animal { public: virtual void speak() const = 0; };
class Cat : public Animal {
public:
virtual void speak() const override { std::cout << "Meow from " << name << std::endl; }
std::string name;
};
class Crow : public Animal {
public:
virtual void speak() const override { std::cout << "Caaaw!" << std::endl; }
};
const char* fluffy = "Fluffy";
This Proposal
Main Proposal
This Proposal
Op parseOp(Parser& parser) {
return switch [parser.consumeToken()] {
'+' => Op::Add;
'-' => Op::Sub;
'*' => Op::Mul;
'/' => Op::Div;
case token=: {
std::cerr << "Unexpected: " << token;
std::terminate();
}
};
}
case
reuse
struct Expr;
struct Neg { std::shared_ptr<Expr> expr; };
struct Add { std::shared_ptr<Expr> lhs, rhs; };
struct Mul { std::shared_ptr<Expr> lhs, rhs; };
struct Expr : std::variant<int, Neg, Add, Mul> {
using variant::variant;
};
This Proposal
int eval(const Expr& expr) {
return switch [expr] {
[int i=] => i;
[Neg [?e=]] => -eval(e);
[Add [?l=, ?r=]] => eval(l) + eval(r);
// Optimize multiplication by 0.
[Mul [?[int 0], __]] => 0;
[Mul [__, ?[int 0]]] => 0;
[Mul [?l=, ?r=]] => eval(l) * eval(r);
};
}
(extra deconstruction because of variant)
Main Proposal
As Is Proposal
enum Color { Red, Black };
template <typename T>
struct Node {
void balance();
Color color;
shared_ptr<Node> lhs;
T value;
shared_ptr<Node> rhs;
};
This Proposal
template <typename T>
void Node<T>::balance() {
*this = switch [*this] {
// left-left case
[Black, ?[Red, ?[Red, a=, x=, b], y=, c=], z=, d=] => ...
// left-right case
[Black, ?[Red, a=, x=, ?[Red, b=, y=, c=]], z=, d=] => ...
// right-left case
[Black, a=, x=, ?[Red=, ?[Red=, b=, y=, c=], z=, d=]] => ...
// right-right case
[Black, a=, x=, ?[Red, b=, y=, ?[Red, c=, z=, d=]]] => ...
// do nothing
self= => self;
};
}
Main Proposal
template <typename T>
void Node<T>::balance() {
*this = inspect (*this) {
// left-left case
[case Black, (*?) [case Red, (*?) [case Red, a, x, b], y, c], z, d] => ...
// left-right case
[case Black, (*?) [case Red, a, x, (*?) [case Red, b, y, c]], z, d] => ...
// right-left case
[case Black, a, x, (*?) [case Red, (*?) [case Red, b, y, c], z, d]] => ...
// right-right case
[case Black, a, x, (*?) [case Red, b, y, (*?) [case Red, c, z, d]]] => ...
// do nothing
self => self;
};
}
Pattern Matching: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1371r3.pdf↩︎
Pattern matching using is and as: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2392r1.pdf↩︎
Pattern Matching for C++: https://www.stroustrup.com/pattern-matching-November-2014.pdf↩︎
Pattern Matching and Language Variants: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0095r1.html↩︎
std::variant: https://en.cppreference.com/w/cpp/utility/variant↩︎