Tail Call Optimization
Functional programming languages introduce a different set of compilation problems than do imperative languages. They’re simpler in many ways, but when it comes to accommodating the functional style of problem solving, the compiler author has a duty to the programmer that cannot be ignored.
So where my Clojure people at? Fire up the Leiningen REPL, type the following code into it, and then tell me what happens. Don’t worry, I’ll wait. 1
(defn to-zero [x]
  (cond
    (> x 1000) (to-zero (- x 1))
    (> x 0) (to-zero (- x 1))
    :else 0))

(to-zero 9999)
If you said it spits out something like the following, then you’d be right:
Execution error (StackOverflowError) at user/to-zero (REPL:3).
So What’s Happening Here?
Clojure generates JVM bytecode, and the JVM was originally designed to support an imperative programming language (Java – you might remember it). Imperative programming languages tend to rely on loops to iterate over their data. These boil down to testing a condition, performing some action, and then jumping back to the top of the loop to do it all over again.
Functional languages, on the other hand, solve this kind of problem with recursion, which means a functional programming language needs to avoid pushing a new stack frame every time a function calls itself. Otherwise the stack would overflow (Clojure just demonstrated this). And so most functional languages support an optimization called “tail call optimization.”
Although it’s a functional language, Clojure does not support tail call optimization. 2 Instead, what it ends up doing here is pushing a new stack frame for every call, quickly exhausting the JVM’s stack size limit. And then “boom!”
Huh? Wha?
Yeah, so let me explain.
Tail call optimization is performed when the compiler sees that a function is making a function call and immediately returning the result of that call. When this happens you’re in a very fortunate position. It means you can generate code that doesn’t actually perform the function call (pushing a new stack frame). Instead, you generate code that reuses the current stack frame, replacing its arguments and jumping to the beginning of the called function.
No stack overflows, no call overhead, just the ability to infinitely recurse, the details of which you don’t have to sweat. Pretty cool, huh?
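To make that concrete, here’s roughly what you’d otherwise have to write by hand in Clojure using its explicit recur form (the workaround mentioned in the second footnote), with the two redundant branches collapsed into a single if. The recursive “call” just rebinds the argument and jumps back to the top of the function, reusing the current stack frame – which is exactly the jump a tail-call-optimizing compiler arranges for you automatically.

(defn to-zero [x]
  (if (> x 0)
    (recur (- x 1))
    0))

(to-zero 9999)
;; => 0, and no StackOverflowError this time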
And Why Should I Care?
Ale performs tail call optimization for functions that it compiles. So when you paste the above code into the Ale REPL, you’ll get the answer you’re looking for, which is zero. You can even test it with much larger numbers. Sure, it’ll take longer to produce zero, but it will never explode violently, and that’s good.
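For instance, something like this (an arbitrarily larger count, purely for illustration) should still bottom out at zero, just more slowly:

(to-zero 1000000)
; => 0, eventually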
Tail call optimization makes for significantly cleaner code, allowing one to simply express an algorithm rather than worry about the internal details of how the language computes the result. And that is also good.
Caveat Emptor
All of this sounds great, right? But it’s important to know what can and can’t be optimized. To be more specific, what can be optimized are calls to compiled functions that are in the tail position, as previously described. What can’t be optimized is everything else. For example, the following Fibonacci calculator seems like it would be a great candidate for optimization. Let’s try it out.
(define (fib i)
  (cond
    [(= i 0) 0]
    [(= i 1) 1]
    [(= i 2) 1]
    [:else (+ (fib (- i 2)) (fib (- i 1)))]))

(fib 10000)
You’ll find that attempting to calculate the 10,000th Fibonacci number with this function may never return. Sure, it might not blow out the stack, but it’s incredibly slow and CPU intensive. That’s because the recursive calls to fib aren’t in the tail position; their results are further processed (by the + function) before being returned, so they can’t be optimized, and the doubled recursion explodes into an enormous number of calls. You can solve that problem like this:
(define (fib i)
  (cond
    [(= i 0) 0]
    [(= i 1) 1]
    [(= i 2) 1]
    [:else
     ((lambda-rec fib-inner (max cur p1 p2)
        (if (= max cur)
          (+ p1 p2)
          (fib-inner max (inc cur) p2 (+ p1 p2))))
      i 3 1 1)]))

(fib 10000)
This version of the function passes accumulating state in the form of the p1 and p2 parameters, allowing fib-inner to be in the tail position. It’s uglier than the previous version, but it’s significantly faster and stack-courteous. On my Lenovo T430, it can return the 10,000th Fibonacci number in about 14 milliseconds – which ain’t bad.
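To make the accumulator idea a bit more concrete, here’s a hand trace (not REPL output) of the inner loop for a small input like (fib 6). Each step simply replaces the previous one in the same stack frame rather than stacking on top of it:

(fib-inner 6 3 1 1)   ; p1 = fib(1), p2 = fib(2)
(fib-inner 6 4 1 2)   ; p1 = fib(2), p2 = fib(3)
(fib-inner 6 5 2 3)   ; p1 = fib(3), p2 = fib(4)
(fib-inner 6 6 3 5)   ; cur = max, so return (+ 3 5) => 8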
1. I know this is a very bogus example. Who’d need a function that always counts down to zero?
2. Clojure supports loop/recur, which is an explicit way to perform this particular type of recursion, but it’s hideously ugly, IMHO. Clojure and Ale also support lazy sequences. These do the job, but come with a bit of overhead, and they’re even more gnarly to work with.