Narrow version of abstract art for this blogpost

Pattern matching in Rust and other imperative languages

Published on Thu, Mar 18, 2021 by doma team, working from Bristol, UK.

TL;DR

  • Rust is an imperative language that has the most pattern-related language facilities
    • Has both shallow destructuring and deep destructuring
    • if let matching form can be utilised to alleviate Rust's lack multiple-head functions
  • Python has the weakest support for pattern-related facilities but it's about to change
    • Language support for pattern-matching included in alpha (Edit thanks to reddit)).
    • Has packing/unpacking operators
  • JavaScript has a lot of pattern-related language features
    • Position-based destructuring for arrays and key-based for objects
    • Rest parameters, supporting destructuring
    • Shallow-copy spread operator
    • With support from Microsoft, Facebook and NPM, proper pattern-matching in JS is inevitable
  • C++ has powerful libraries for pattern matching. Language support is likely in C++23

All the time, ideas and approaches sift into the world of conventional programming languages world from the programming language theory research and functional programming world. Even Excel has lambdas now!

In this post, we shall cover pattern matching in various imperative programming languages. We shall help you adopt pattern matching techniques to boost the expressiveness and conciseness of your code.

Example of pattern matching techniques being more optimal programming method from the latest C++ pattern matching proposal An example from a C++ evolution proposal.

Pattern matching in Rust

Rust has the most advanced and well-designed pattern system among all imperative languages. Part of it, of course, can be attributed to the fact that developers of Rust had the luxury of building a language from the ground up. But most significantly, it stems from the rigour and culture of design and development.

Pattern matching facilities in Rust language are almost as rich as in its older functional brother Haskell. To learn about them along with us, first, consider the following task (inspired by a real-life use-case):

Explore a non-strictly-structured JSON object where keys are species and values are sets of animals of these species.

If an animal's coat is fur or feathers, it's Cute, otherwise it's Weird. If a species is "aye-aye", it's Endangered. There may be new criteria discovered later on that change categorisation of a particular animal or species.

Categorise animals with distinct names found in the given data set!

So let's start with encoding the categories:

#[derive(Hash, Debug, PartialEq, Eq, PartialOrd, Ord)] /* A */
pub enum Category {
  Cute,
  Weird,
  Endangered,
}

(A) makes sure that Rust will order values from top to bottom, so that Cute < Weird < Endangered. This ordering will be important later on.

Now to encode the rules from the task. Since our JSON is unstructured, we can't rely on any property existing, so we can't safely unwrap or reliably coerce JSON to some data Rust data structure:

fn cat_species(v: &str) -> Category {
  match v {
    "aye-aye" => Category::Endangered, /* A */
    _ => Category::Cute, /* B */
  }
}

Our first match! How exciting! This match is equivalent to switching over contents of variable v, of course. However, it offers more flexibility later on. With the power of destructuring, we can match complex structures, not just single variables.

(A) shows how to match a literal value, (B) shows the "catch-all" clause. This pattern match reads species named "aye-aye" is endangered, other species are cute.

Now let's have a look at how to write something more interesting:

fn cat_animal_first_attempt(v: &Value) -> Category {
  match v["coat"].as_str() {
    Some("fur") | Some("feathers") => Category::Cute,
    _ => Category::Weird,
  }
}

The rule of cuteness is satisfied, no unwrapping used. There are also no explicit checks if the value has Some contents or it has None! This listing confidently states: animals with a fur coat or with a feather coat are cute, others are weird.

But is this implementation good enough? One can check by considering a rule getting added, just as requirements warned us:

Animals that have the albino mutation are Endangered. Otherwise, previous rules apply.

fn cat_animal_first_attempt_1(v: &Value) -> Category {
  let cat = match v["coat"].as_str() { /* A */
    Some("fur") | Some("feathers") => Category::Cute, /* B */
    _ => Category::Weird,
  }
  match v["mutation"].as_str() {
    Some("albino") => Category::Endangered,
    _ => cat
  }
}

The snippet became bulky and boilerplate-y... We now have to thread some variable like in (A). We have to remember not to short-circuit computation in (B) by adding a return by accident. In case an additional rule pops up, we will need to decide between mutable cat or versioned.

So is this it? Pattern matching collapses the moment we need to capture some heterogeneous set of matches? Not quite. Let us introduce if let statement, made just for this sort of challenge:

fn cat_animal(v: &Value) -> Category {
  if let Some("albino") = v["mutation"].as_str() {
    Category::Endangered
  } else if let Some("fur")
              | Some("feathers")
              = v["coat"].as_str() {
    Category::Cute
  } else {
    Category::Weird
  }
}

Now that's more like it. But wait, what does it mean? As with other pattern matches, left hand side is a pattern (for instance, Some("albino")) and right hand side is value (for instance, v["mutation"].as_str()). A branch under if will get executed when and only when the LHS pattern shall match the RHS value.

Pattern matching with if let syntax makes us start with the most specific clause and fall through to less specific clauses in an unambiguous order, taking away excessive liberty and thus making the code less error-prone.

Putting it all together

pub fn categorise(
  data: HashMap<String, Vec<Value>>,
) -> HashMap<Category, Vec<String>> {
  let mut retval = HashMap::new();
  for (species, animals) in data {
    for animal in animals {
    
      if let Some(name) = (animal["name"].as_str()) { /* A */
        retval
          .entry(max(cat_species(species.as_str()),
                     cat_animal(&animal))) /* B */
          .or_insert(Vec::new()) /* C */
          .push(name.to_string())
      }
      
    }
  }
  retval
}

Now that we have categorisation functions, we can proceed to categorise our data set. If (A) if let match fails (current animal has no name supplied), we'll move to the next iteration. Not all the patterns have to have the catch-all arm.

Otherwise, the variable name will store the current animal's name and we will chain some functions from a handy HashMap API. In (B) we use the Ord instance of Category enum to determine the highest priority category between species-based categorisation and per-animal categorisation with std::cmp::max function.

Then HashMap's entry returns the reference to the value under the category. If there is None, or_insert in (C) inserts an empty vector and returns a reference to it. Finally, we can push the name of the current animal to this vector, and it will appear in our mapping!

We hope that this guide provides a reasonable introduction to pattern matching in Rust. See the full code of the example module on sourcehut.

Let's finish the post with some information about pattern-related features of other popular imperative languages.

Patterns in modern JavaScript

const foldAndDump = (path, xs, ...cutoffs) => {
  // snip
  for (c of cutoffs) {
    //snap
  }
}

An old feature of ECMAScript, the JS standard called "rest parameters" ...cutoffs will match arguments of a function beyond the second into an array called cutoffs.

var rs = [];
for (let [printing, info] of
     Object.entries(allPrintingsJson['data']))
{
    rs.push({ ...info, "_pv_set": printing });
}

When the ellipsis isn't in the argument list, it means that we're dealing with a newer feature called "spread syntax". ...info means "include info object as is". Analogously, spread syntax can spread an enumerable object across arguments of a function call:

const xs = [1,2,3];
console.log(sum(...xs));

Finally, there is unpacking, which is a pretty standard feature by now:

> [a,b] = [1,2]
[ 1, 2 ]
> {x,y} = {y: a, x: b}
{ y: 1, x: 2 }
> {k,l} = {y: a, x: b}
{ y: 1, x: 2 }
> [a,b,x,y,k,l]
[ 1, 2, 2, 1, undefined, undefined ]

Packing and unpacking in Python

In modern Python, any iterable is unpackable:

>>> a, *b, c = {'hello': 'world', 4: 2, 'rest': True, False: False}
>>> a, b, c
('hello', [4, 'rest'], False)

* is analogous to JS's ellipsis (...) operator. It can collect some "the rest of the values", but it can also work as a spread for iterables:

>>> print(*[1, 2, 3])
1 2 3

Conversely, in spirit of Python, there's a special case operator called "dictionary unpacking operator". It works very similar to spread operator:

>>> print({'x': True, **{'y': False}, **{'x': False, 'z': True}})
{'x': False, 'y': False, 'z': True}

Rightmost spread precedes.

Pack your bags: we're going pattern matching

Every single language that is in active development is looking to adopt more and more features from functional languages, and pattern matching is no difference.

We'll conclude this post with a list of languages that will adopt proper pattern matching, ranked by degree of our certainty in adoption.

Pattern matching in Python

  • There were different proposals throughout the history of Python, but PEP 634 got implemented
  • Alpha version of Python with "structural pattern matching" is available since March 1st

Pattern matching in C++

Pattern matching in JavaScript

  • Tied for the first place in "the most likely to adopt proper pattern matching", JavaScript's standard called "ECMAScript", has this proposal backed by Microsoft, Facebook and NPM.
  • The proposal is thoroughly reviewed and was moved to "stage 1", which puts the theoretical release of this feature in the 2023-2025 range.
  • You can check our maths by inspecting git logs in completed proposals repository.

The idea of pattern matching is to have a code execution branch based on patterns, instead of conditions. Instead of trying to encode properties of values necessary for a code branch to get executed, programmers who use pattern-matching encode how should values look like for it to happen. Thus, in imperative languages, pattern matching promises more expressive and declarative code compared to predicate statements such as if and case, bar some corner cases.

It might be a subtle difference, but once you get it, you add a very powerful way of expression to your arsenal.

We find that understanding these concepts is akin to the understanding of declarative vs imperative programming paradigms. To those interested in the philosophy of the matter, we suggest finding a cosy evening to curl up with a cup of steaming drink and watch Kevlin Henney's "declarative thinking, declarative practice" talk:

Kevlin Henney: Declarative Thinking, Declarative Practice. ACCU 2016. Non-tracking YouTube embed.