Pattern Matching In Clojure

Updated: Note that this is available as a clojars module.

Clojure code density seems to be pretty good. There are a fair number of convenient shortforms in the language; for example, associative datatypes all act as a function — so given a hash map you can reference it with (my-hashmap :key). The base language itself is probably about as expressive as Python (or a bit better), but you have the added advantage of being able to use macros as needed to really get the code density up.

Nonetheless, I really wanted something like Ocaml’s / Haskell’s pattern matching; it makes some code wonderfully concise.

Accordingly, I hacked something up, based on Clojure’s built-in destructuring. Some examples:

Literal values match against the same value, while _ matches against any non-nil value (and nil matches against any nil one). Additionally, :when clauses can be used for conditional checks.

; simple recursive evaluator
(defn arithmetic [lst]
(match lst
v  :when (number? v)  v
[ _ "error" _]     "error"
[ _ _ "error"]     "error"
[ "add" a b ]      (+ (arithmetic a) (arithmetic b))
[ "sub" a b ]      (- (arithmetic a) (arithmetic b))
[ "mul" a b ]      (* (arithmetic a) (arithmetic b))
[ "div" a b ]      (/ (arithmetic a) (arithmetic b))
[ "squared" a ]    (arithmetic ["mul" (arithmetic a) (arithmetic a)])
_                  "error" ))

Both collections and single values can be used:

;; return signum fr a number
(defn signum [x]
(match x
0 0
n :when (< n 0) -1
_ 1))

The pattern matching is stricter than the typical destructure; whereas [ a b ] will destructure against a list of any number of elements, [ a b ] will pattern match only against a list of two elements.

(match x
[]    "empty"
[_]   "one element"
[a a] "two identical elements"
[_ _] "two elements"
_     "three or more")

If the same variable occurs in multiple locations in the parameter list, it will be checked for equality. The & tail form can be used to specify the rest of the list.

;; count identical elements in the same location in two lists:
(defn count= [ lst1 lst2 ]
(loop [ a lst1 b lst2 count 0 ]
(match [a b]
[[e & at] [e & bt]]  (recur at bt (inc count))
[[_ & at] [_ & bt]]  (recur at bt count)
_                    count)))

Note that this is slightly more flexible than Haskell / ML, in that a variable of the same name can be multiple places in the pattern.

Defining

You can use the defnp macro to define a function that is pattern matched; it defines a function that takes one argument and has an implicit match statement. For example, the signum function can be written:

    (defnp signum
0 0
n :when (< n 0) -1
_ 1)

(Thanks to Tom Faulhaber for suggesting this)

Gotchas

The Clojure destructuring will cause an exception if you try to destructure a collection type with a value.

(let [[a b] 10] a)
java.lang.UnsupportedOperationException: nth not supported on this type: Integer (NO_SOURCE_FILE:0)

… so be sure to check such cases early in your match statement, if they are possible.

How It Works

The pattern matcher uses the built-in Clojure destructuring as the main mechanism, but adorns it so that the pattern can be verified. For example, the code:

(match x [a a] "two identical" )

turns into essentially the following:

(let [ [ a g0001 & g0002 ] x ]
(if (and (not (nil? a)) (= g0001 a) (nil? g0002)) "two identical" nil))

That is, the destructuring is done, but then the two variables are checked to make sure that they are equal, and the list is checked to make sure it is only two elements long.

Discussion

Comments are moderated whenever I remember that I have a blog.

Tom Faulhaber | August 13, 2009
Wow, this is cool! I've been meaning to write this myself for a while but never got to it. Too small ideas/suggestions: 1) When writing Clojure, Rich wanted to reduce the number of parens (relative to Common Lisp), so (let ((var1 val1) (var2 val2)) ...) became (let [var1 val1 var2 val2] ...) and (cond ((pred1 action1) (pred2 action2))) became (cond (pred1 action1 pred2 action2)) in a similar vein, it would seem to make sense for (match var (pat1 action1) (pat2 action2)) to be simplified to (match var pat1 action1 pat2 action2) though you'll need to figure out exactly how best to express :when syntax. 2) This just screams out for a defnp like clojure.contrib.defnk: (defnp signum 0 0 n :when (< n 0) -1 _ 1)
tim | August 13, 2009
@Tom: 1. Yah, I went back and forth on this too. I eventually went with (pattern action) because a) it matches variable-arity function definitions and b) it allows for an implicit do-body for the actions (which I just realized doesn't work right now, but I'll fix that tonight). That said, I'm agnostic on the issue, and could be convinced one way or the other -- would reducing the parens in the default case make up for having to have an explicit (do ...) for the more complicated actions? 2. Oh, that's a great idea -- I'll take a look at that.
Tom Faulhaber | August 13, 2009
@Tim: On 1, I think that my suggestion has the advantage of consistency with other Clojure forms. It seems that everywhere in Clojure that Rich had the choice he followed the "single-layer model" rather than the "double-layer model" (which is usually the choice in CL). condp is another example here. It would be nice if the match form (and the defnp form) felt as clojure-y as possible. The loss of the implicit do would be a small price to pay for that. Tom
fogus | August 13, 2009
Excellent work. This has been long in coming and I applaud you for the effort. I know what I'll be playing with this afternoon. :) -m
tim | August 13, 2009
@Tom: Yah, I think you're right. I tried it out locally and it felt cleaner and more clojurey (if that's a word). Latest HEAD has the new stripped down syntax. Thanks! @fogus: Thanks very much -- enjoy. :-)
Tom Faulhaber | August 13, 2009
@tim; After thinking about it more, my opinion is fuzzier. :-) But I'd leave it with the stripped down syntax for now, since you've changed it and see how it feels after some more use. The interesting question here is how often do things end up looking like conds with simple actions and how often do they look like funs with multiple arities. I was thinking the former (partially cause of the examples and some half-formed haskell in my head), but after musing on it, I'm not as sure.