Document #: | xxx |
Date: | 2022-02-24 |
Project: | Programming Language C++ SG17 |
Reply-to: |
Mihail Naydenov <mihailnajdenov@gmail.com> |
This paper investigates the challenges we face with identifiers inside Pattern Matching (PM). It also proposes a method for their introduction, one that is inline with current practices, without new language syntax.
One of the many challenges in PM has to do with identifiers. There is an unavoidable ambiguity in what we mean by a variable named x
: Is it a newly introduced entity or a reference to an existing one? There is no right answer and each system decides on its own. For example in Python, x
is a new name and if the user whats to access an existing one it must contain a dot (essentially to be a path) - something.x
. In Rust also x
is a new name. Swift and C# introduce new names the same way this is done outside PM - with let
and var
respectively.
Variable introduction style might seem like a minor, “esthetics” issue, but is really not, especially in C++, for two important reasons:
class s{}; s s;
.These two just add up to the preexisting ambiguities inside PM, creating an even more complicated context.
Let’s examine the number of possibilities for what an identifier could mean in the C++ pattern matching scenario. We mentioned some of them already:
x
can be a PM variable, a bindingx
can be a non-PM variable, one to compare againstx
can be a typeBefore continuing, here are our current solutions for the above 3.
The main PM proposal, p13711, as of its latest iteration at the time of writing:
x
.case
- case x
.<x>
.Alternative Pattern in p1371 also accepts a constant-expression, it is not clear if this means a constant variable as well. In that case we could have an ambiguity whatever
<x>
means atypename x
or a variablex
, used to select the alternative.
However, according to p26882, the above will be revisited in future versions and will be as follows:
let
keyword.The “is as” proposal (p2392)3 is as follows:
is
or as
.is
.is size
- do we mean a type or a with the same name?As you can see, things are already not easy with these 3 cases, but there is more:
- x
can be an existing, non-PM variable, used for binding, an assignment.
Arguably, this last case mainly concerns patterns outside PM. For example, we can imagine extending structured bindings to bind to existing variables:
Although, there is std::tie
, it’s a hidden, non-obvious way of doing things. More importantly, it is not composable with any patterns:
some_t a,b;
// some-pattern below is a hypothetical pattern-used-outside-PM.
some-pattern std::tie(some-pattern a, b) = something; //< can't bind to preexisting variables from within patterns
Outside PM we will certainly need to bind to existing variables, but binding to existing variables inside PM could also be beneficial. We might want to fill a preexisting data structure directly. Something along the lines of:
void useSize(size_t& size);
size_t useAndReturnSize() {
size_t size{};
inspect(some_rect) {
[__, let sz = {5,5}] => useSize(size = sz);
...
}
...
return size;
};
Here, ideally we would have just used the size
variable directly. sz
is just a crutch to carry us around the fact, we can’t bind directly to preexisting variables. Granted, this use is not that important exactly because the workaround is easy and as such, current proposals do not address this usage.
There is however, much more important case, missing from the discussions so far.
We talk about introducing a variable for PM, a binding, but this binding should have control of its mutability, much like any other variable. We should have both a binding for reading and a binding for writing.
That scenario is not handled directly by the current proposals and a simplified approach is taken, where all binding are mutable, unless the object we “inspect” is const. In that case all binding are const by the regular constness preservation rules. Needless to say, this is far from perfect for two reasons:
switch
and if-else
is recursive, which results in a lot of complexity, with many branches in an overall three structure. We will absolutely encounter the need to have almost all branches be const, but some be mutable simply because not all paths will have the same requirements and usage. An “all or nothing” approach will be suboptimal, hurting const correctness overall - if you want that one variable, in that one arm to be mutable, all of them will have to be, the root can no longer be const.As you can see, there are plenty of complications regarding identifiers in the PM context with many potential uses and very limited ways to express them. Here is the complete list:
x
can be a PM variable, used for observationx
can be a PM variable, used for modificationx
can be a non-PM variable, one to compare againstx
can be a non-PM variable, one to write tox
can be a typeOf course, not all uses are equally important, but all must be taken into account as part of the overall requirements and view into “the bigger picture”. Otherwise we will write ourselves into a corner, as the syntax options are extremely limited.
Identifiers for type selector need a special mention. If we step back and consider what syntax we would like to see as a natural way of doing type switching, we should look no further then the very first proposals for Pattern Matching like pattern-matching-November-20144 or the early p0095.5 They both advertise something like:
to select (and deconstruct) a type Some
from a polymorphic source value. This makes sense - deconstructing is the opposite of constructing and often programming languages use a syntax, very similar to either class definition or construction:
Rust class definition
C# class definition
Python class construction
Scala class definition
In other words, our initial intention to have deconstruction mirroring construction was more or less “right”. Something however happen in the process and we drifted away of it. Using square instead of curly brackets for deconstruction is understandable, considering Structured Bindings are already in the language. However, we also lost the natural way of expressing the desired type:
The as
, although looking nice, is foreign to C++. To change this fact, we must alter the language, adding as
outside PM also. On the other hand, angle brackets can hardly be perceived as pretty or natural and, considering the recursive nature of PM, will bring considerable visual noise.
This is an overview, including future directions. For the concretely proposed items see next section.
In the previous section we made the observation, an identifier can be one of those:
It is clear, it is not possible to have a simple, single identifier without encountering multiple ambiguities. To account for that, we must “mark” identifiers for their specific use. In the main proposal this is done by either using case
for “existing variable to compare against” (p1371) or with let
(p2688) to mark new identifiers. Angle brackets are used around type names. In the “is as” proposal, the location of the identifier in relation to is
or as
is the determining factor.
Regarding the latter, we can make the observation, is
is quite similar to an operator, we already have, the operator==
. This means, marking values for comparison with an ==
can be a reasonable approach as well, one that does not need introducing new language features:
int value = 5;
int number = ...;
inspect (number) {
==value => //< matches `value`
// or !=value if we want!
...
};
This looks promising, the code is self-explanatory, free from ambiguity, though arguably a bit verbose. Later, in the proposal part, this issue is addressed as well.
Please note,
inspect
, and much of the syntax, is just a placeholder, reusing it from the main PM proposal (p1371). The authors seem to be moving away from it in p2688, but it can still serve us here.
Also, having==
does not exclude intrudingis
in the future.is
will be just another operator, both in and outside PM.
One very interesting side effect is that by using ==
(or possibly !=
), we can also naturally introduce the other comparison operators:
inspect (number) {
>=value => //< greater or equal
<=value => //< less or equal
< value => //< less then
> value => //< greater then
...
};
This is a really nice bonus feature, already available in C# PM for example.
Moving forward. We saw how we can introduce existing identifiers for comparison naturally, now let’s get to the much harder cases - introducing new variables. In C++, much like C before that, there is no special keyword or any other syntactical marking. Identifiers for variables are introduced by a preceding identifier of a type:
id_of_var
is new variable, indicated only by the fact, it has a preceding type id_of_type
.
This is a though place to be as this kind of introduction is completely unusable for PM.
Facing that hard case, current proposals introduce new language. This is either special mini language, in the main PM proposal, where having new variables are introduced with the let
keyword. Or, in the “as is” proposal, by having new language feature, the is
and as
operators, which in PM can create new variables among other things. Having to introduce a new language is high price to pay and it leads to inconsistencies - in and outside PM the variables are introduced differently. Faced with that new issue, we must either address it by changing the rest of the language (the “as is” proposal) or ignored it (main proposal).
This is not the case with languages, which already have a keyword for variable introduction! Swift using let
inside PM is fundamentally different then C++ doing it - let
is already the way to introduce variables. Same for C# and its var
keyword.
This negative scenario is not the only one possible however.
Interestingly, having to precede the identifier with a type has one exception - the lambda capture. There we don’t have to supply the type before a new variable:
[a=something] { // New variable `a` is introduced for the lambda with the same value as `something`.
...
}
This is an interesting corner case. Is used to make the syntax concise and it works only because there are no uninitialized captures.
Coincidentally, both of those are true for the PM as well - we need a concise syntax and there are no uninitialized bindings!
Reusing the syntax seems like a very sensible option:
int number = ...;
inspect (number) {
n=__ => // new binding `n` is introduced for the PM with same value as the matched ::number
// or just n= omitting the __
...
};
Similarly how we create a lambda-local variable, always initialized to some preexisting value, we can create a PM-local variable, again always initialized to a preexisting value. This time however the value is coming implicitly from the object we “inspect”. In the first, lambda case, we assign manually, in the second, the PM system will assign for us.
In the above example, we might continue matching, adding an expected value:
int number = ...;
inspect (number) {
n=5 => // new binding `n` is introduced for the PM, iff ::number is 5
...
};
Here we create a binding n
, equal to 5
. Of course, it is not we that “assign” 5
to the binding, in contrast to lambda. It is the PM system that will create the binding, with a value, equal to the number
variable, but only if number
is 5
. The end result is the same and what is written will not be misleading - n
is (equal to) 5
.
Moving to lambda-like identifiers introduction has a major benefit - we have syntax ready for all other identifiers!
Mutable and immutable bindings:
int number = ...;
void mutate(int&){...}
inspect (number) {
n=__ => mutate(n); // FAILS, n is immutable by default!
&n=__ => mutate(n); // SUCCEEDS, n is requested to be mutable
...
};
Binding to preexisting variables:
int number = ...;
int n;
inspect (number) {
&n __ => mutate(n); // binding to existing variable with the same value as ::number
// or just &n omitting the __
...
}
Compared to lambda side by side, similaraties are there:
:::{.columns} :::{.column width=“50%”}
lambda capture
::: :::{.column width=“50%”}
PM bindings
int x,y,z;
inspect (...) {
x= ... => // new x
&y= ... => // new y, but reference
&z ... => // existing z reference
};
::: :::
Arguably, assignment to existing variable could look confusing when combined w/ other patterns. For example &z 5
will be the syntax to assign to a preexisting z
if value is 5
. We will have assignment w/o =
, which is far from great. However, this use-case is rare, it is not even proposed in currently active proposals. As such, this paper is willing to accept this downside for the other benefits this approach brings.
And the benefits don’t stop there. One additional significant gain is the fact, we can name types directly!
Because all identifiers so far were “marked” by additional tokens, we could finally have “unmarked” names for types:
Having natural type selection is a fundamental benefit, resurrecting the “dream syntax” from a decade ago, one w/o special tokens like <>
or additional language features like as
.
Putting it all together, we have a complete set with zero ambiguity:
struct size{...};
size size;
inspect (...) {
size => // match the ::size type
size= => // new `size` binding, hiding ::size
&size= => // new `size` binding, hiding ::size, mutable
&size => // assign to the ::size variable
==size => // match sizes, same as ::size
<=size => // match sizes, less-then-or-equal to ::size
>=size => // match sizes, greater-then-or-equal to ::size
> size => // match sizes, greater then ::size
< size => // match sizes, less then ::size
...
};
The above study is not proposed in its entirety or detail. In particular, binding to existing variables, as well as the greater-then and less-then comparisons are not proposed. This proposal considers these to be future direction or at least future proofing for not writing ourselves into a corner.
With that out of the way, proposed identifiers are:
=
- immutable binding&
id=
- mutable binding==
id - variable for equality comparisonProposed is also a slightly different, more user-friendly version of the above specification.
In particular, having to compare to a value using ==
can get verbose and might seem atypical. Similarly, but to a lesser degree, having single identifier to mean a type name might be confusing:
struct size{...};
size size;
inspect (...){
size => //< There is a certain expectation, this is a value, not a type, we compare against!
[size, size] => //< Here as well
};
To account for that, proposed is to have the ==
be optional if it is the last pattern, which is often the case. To avoid ambiguity, the type name must never be the last pattern. In practice this means, to match on type, one must follow up it’s name with another pattern, like __
:
struct size{...};
size size;
inspect (...){
size __ => //< Size is definitely a type
size => //< Value, "as expected"
[size, size] => //< Values here as well*
[==size, ==size] => //< Same as above, more verbose, but more clear?
size size => //< also valid, type is size, value equals the ::size variable
size (==size) => //< same as above, but more clear
};
// *assuming some hypothetical value, deconstructed into 2 sizes
There are cases however, where ==
is still required:
struct Size{...};
Size size;
inspect (...){
// size[w=, h=] => // Error, no type named size
// Size size[w=, h=] => // Same, also an error (attempt for two type matches one after another)
==size[w=, h=] => // OK, match ::size
};
Omitting the ==
here is not allowed because the pattern is not the last one. We continue to deconstruct, binding the width and height to w
and h
respectively. In that case an attempt is made to match on a type, named size
, not a value. To request match by value instead, ==
must be is used.
Likewise the type match must not be the last pattern:
struct Size{...};
Size size;
struct Rectangle { int x; int y; Size size; };
Rectangle rect = ...;
inspect (rect) {
// [__, __, size __] => // Error, no type named size
[__, __, ==size __] => // OK, explicit value match
[__, __, size] => // OK, last pattern, match value
// [__, __, Size] => // Error, no such value (type match must not be last)
[__, __, Size __] => // OK, not last pattern, match type
[__, __, Size [w=, h=]] => // OK, not last pattern, match type
};
Advanced example
int x = 5;
inspect (...){
x => // match ::x
==x => // same as above, but explicit
x= => // new binding, hiding ::x
x=x => // new binding, match value ::x
x=(==x) => // same as above, but explicit
int x => // match int, match value ::x
px= (int x=x) => // bind px to a polymorphic type, then match the type to int, match value to ::x, bind result to a new `x`
int &x=x => // match int, match value x, bind result to a new `x` as mutable
...
};
Presented here was an investigation of identifiers usage inside PM, with the hope all limitations and use-cases are accounted for, at least in principal.
Along that, a system of identifiers is presented, which accounts for all use cases, does not require new language syntax and is based on existing practices. A small, most useful subset is proposed as a way to handle identifiers in a practical PM implementation.
Last but not least, suggested is that, by using the described system, it will be possible to have type matching in a natural way by just naming the type - int x =>
. This is the preferred way in most PM frameworks, both in C++ (all early proposals) and other languages.
This Proposal
struct Point {...};
Point p;
int x = 10;
int y = 20;
inspect (...) {
Point [==x, ==y] => //< Point type, at pos (::x, ::y), explicit
Point [x, y] => //< Point type, at pos (::x, ::y)
Point [&x=, &y=] => //< Point type, bind x, y for writing
Point [x=, y] => //< Point type, at ::y pos, bind x pos to `x`
Point [x, y=] => //< Point type, at ::x pos, bind x pos to `y`
Point [x=x, y=y] => //< Point type, at pos (::x, ::y), bind x, y
};
Main Proposal
class Animal { public: virtual void speak() const = 0; };
class Cat : public Animal {
public:
virtual void speak() const override { std::cout << "Meow from " << name << std::endl; }
std::string name;
};
class Crow : public Animal {
public:
virtual void speak() const override { std::cout << "Caaaw!" << std::endl; }
};
const char* fluffy = "Fluffy";
This Proposal
Main Proposal
The following example extremity stems for the fact, both value matching and binding are assumed to be commutative. If this is not the case, then the patterns must appear in some predefined order. Composability is out of scope for this paper
struct Size{...};
Size size;
inspect (...){
Size size [w=, h=] => //< type is ::Size, value equals ::size, w and h are introduced
Size [w=, h=] size => //< same as above
Size sz= [w=, h=] size => //< same as above, but we also bind the size value to `sz`
Size size [w=, h=] sz= => //< same as above
Size sz= [w=, h=] ==size => //< same as above (more clear)
sz= [w=, h=] ==size => //< same as above, but valid for any type, not just ::Size
==size [w=, h=] sz= => //< same as above
};
Pattern Matching: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1371r3.pdf↩︎
Pattern Matching Discussion for Kona 2022: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2688r0.pdf↩︎
Pattern matching using is and as: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2392r1.pdf↩︎
Pattern Matching for C++: https://www.stroustrup.com/pattern-matching-November-2014.pdf↩︎
Pattern Matching and Language Variants: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0095r1.html↩︎