# From Church Numerals to Y Combinators

## Introduction: How to Write Functions

We'll begin our introduction to the lambda calculus by considering the question of how we might write down a function. There are standard ways to notate numbers, $$\def\.{\kern0pt{.}} 42 \\ 0x2a$$

sets, $$\{0,1,2\} \\ \{x | x > 0, x \in R\}$$

etc, but there doesn't seem to be a way to write down functions. How should we write down a function, for example, that takes a number and triples it?

"Easy!" you say. "Just call that function $f$ and write $$f(x) = 3x$$

to define it."

Well, this approach suffers the drawback that you must give the function a name. I want a way to write down anonymous functions - functions that do not have a name. I want to be able to say $$f = something$$

This is indispensable in writing higher-order functions, which are functions that take or return other functions. In the lambda calculus, we would write $f$ as $$\lambda x.3x$$

where, in general, a function that takes an argument and returns something is written as $$\lambda [argument]\.[return\ value]$$

By the way, function application will be denoted simply by writing the function next to its operand, like so $$fx$$

instead of using parentheses as in $f(x)$ because, as you will see later, there are more important things for parentheses to do.
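Most programming languages with first-class functions have a direct counterpart to this notation. Here is a minimal sketch in plain JavaScript; the names `triple` and `applyTwice` are my own illustrations and are not part of the interpreter used in this tour.

```javascript
// The anonymous function  λx.3x  written as a JavaScript arrow function.
// It is bound to a name here only so that it can be reused below.
const triple = x => 3 * x;

// A higher-order function: it takes a function g and returns a new function.
const applyTwice = g => x => g(g(x));

console.log(triple(14));             // 42
console.log((x => 3 * x)(14));       // 42, no name needed
console.log(applyTwice(triple)(2));  // 18
```

Note that JavaScript still needs parentheses for application, so $fx$ becomes `f(x)`.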

## Simple Beginnings

Let's now create a universe for our functions to live in. Mathematically inclined readers might now point out that this universe must be incomplete - after all, functions must act on something, so surely we must include something else in the universe for them to act on, for example integers.

To get around this, we shall let functions act on other functions; our functions shall accept other functions as arguments and return functions. This way, our universe will only contain functions. It might seem at first that there aren't any interesting functions to write down, but let's see how far we can get. First, a function that does absolutely nothing, the identity function: $$I = \lambda t\.t$$

"You got lucky," I hear you say. "Your universe can only contain silly functions like these - I don't think you can come up with many more." Well, how about this? $$C_I = \lambda s\.(\lambda t\.t)$$

No matter what you pass to it, this function $C_I$ will always return the identity function. Here we see clearly why lambda notation is superior; without it, we would have to define $C_I$ like this: $$C_I s = I \text{ for all }s\\ \text{where }It = t \text{ for all }t$$

I hope you got the hint that $C_I$ is so named because it is the constant function of $I$; we can generalise to a whole class of constant functions of $r$ $$C_r = \lambda s\.r$$

Heck, let's create a function that creates these $C_r$'s $$K = \lambda r\.(\lambda s\.r)$$

with the property that $Kr = C_r$. The rabbit hole goes deeper and deeper.

  var I = function(t){ return t };
var C_I = function(s,t) { return t };
var C_r = function(s) {return r};
var K = function(r,s){ return r };
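The listing above is written for the interpreter, which appears to treat a multi-argument `function(r,s)` as a curried function. Plain JavaScript does not do that, so a version meant to run outside the interpreter has to nest the functions by hand. The sketch below makes that assumption explicit and uses an ordinary number as a stand-in for $r$, purely so that there is something to print.

```javascript
// I = λt.t
const I = t => t;
// K = λr.(λs.r): given r, build the constant function C_r
const K = r => s => r;

// C_r for one particular r; the number 42 is only a printable stand-in
const C_r = K(42);

console.log(C_r("anything")); // 42
console.log(C_r(I));          // 42, the argument is always ignored
```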


### Exercises

1. Using the interpreter, confirm that K I is equivalent to C_I
2. Write down the function $C_{C_I}$, the constant function of $C_I$
3. Without using the interpreter, evaluate (K K) (K K).

## Multiple Arguments

So far, all our functions have only accepted one argument - what happens if we want them to take two or more arguments? For example, think of the two-argument function $\zeta$ that applies its second argument to the first argument. $$\zeta(x,y) = yx$$

The way to write this is to make the function take only the first argument $x$ and return a partially evaluated function, which then takes the second argument and processes that, then returns the final result. $$\zeta = \lambda x\.(\lambda y\.yx)$$

As you can see, when we feed $\zeta$ the first argument... $$\zeta x = \lambda y\.yx$$ we get a partially applied function. When we feed that the second argument... $$(\zeta x)y = yx$$

We get back the finished computation. This strategy is called currying and should be familiar to Haskell and ML programmers, among others.
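Currying can be sketched in plain JavaScript with nested one-argument functions. The helper names `withFive` and `double` are hypothetical, and native numbers are used only to make the result visible.

```javascript
// ζ = λx.(λy.yx), written as nested one-argument functions
const zeta = x => y => y(x);

// Feeding only the first argument yields a partially applied function
const withFive = zeta(5);

// Feeding the second argument finishes the computation: y(5)
const double = n => 2 * n;
console.log(withFive(double));     // 10
console.log(zeta(5)(n => n + 1));  // 6
```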

## A Note on Notation

You can see why we reserved brackets earlier on; without them, an expression like $KII$ is ambiguous because it could be read as either $(KI)I$ or $K(II)$. \begin{align} (KI)I &=C_I I \\ &= I \\ &= \lambda t\.t \\ K(II) &= KI \\ &= \lambda r\.I \\ &= \lambda r\.(\lambda t\.t) \end{align}

Since the first one is much more common, we will often drop the brackets. Hence our convention is that when omitted, function application starts from the leftmost pair; $KII$ is read as $(KI)I$.
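The two readings really do behave differently. Here is a small sketch in plain JavaScript, with hand-curried `K` and `I` and numbers as sample inputs (both are assumptions made for illustration):

```javascript
const I = t => t;
const K = r => s => r;

// KII is read as (KI)I: K(I) ignores its argument and returns I
const left = K(I)(I);
// K(II) is different: II reduces to I, so this is K(I), a constant function
const right = K(I(I));

console.log(left(7));         // 7: left is the identity
console.log(right(7) === I);  // true: right ignores 7 and returns I
console.log(right(7)(9));     // 9
```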

In a similar vein, let's adopt a convention and simplify this $$\zeta = \lambda x\.(\lambda y\.yx)$$

to this $$\zeta = \lambda xy\.yx$$

and say that multiple arguments appearing between a $\lambda$ and a dot represent a curried function. Let's have a few more examples to make it clear. We'll introduce the projection operators $\pi_1$ and $\pi_2$, which are functions of two arguments that return either the first or the second argument. $$\pi_1 = \lambda xy\.x \\ \pi_2 = \lambda xy\.y$$

Notice how clearly we can see the definition now. Also notice that $\pi_1 = K$ and $\pi_2 = KI$.

  var zeta = function(x,y){ return y(x) };
var pi_1 = function(x,y){return x};
var pi_2 = function(x,y){return y};
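The claim that $\pi_1 = K$ and $\pi_2 = KI$ can be spot-checked outside the interpreter. The sketch below uses hand-curried plain JavaScript and two sample strings; agreeing on sample inputs is of course only a sanity check, not a proof.

```javascript
const I = t => t;
const K = r => s => r;

// π1 = λxy.x and π2 = λxy.y as curried functions
const pi1 = x => y => x;
const pi2 = x => y => y;

// Compare each projection with its K/I counterpart on sample inputs
console.log(pi1("a")("b"), K("a")("b"));     // a a
console.log(pi2("a")("b"), K(I)("a")("b"));  // b b
```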


### Exercises

1. Without using the interpreter, evaluate zeta pi_1 pi_2 zeta.
2. Let $\pi_{m/n}$ be the $m^{th}$ projection operator on $n$ variables; in other words, it takes $n$ arguments and returns the $m^{th}$ one. For instance, $\pi_{2/5} = \lambda abcde\. b$. Express $\pi_{3/3}$ in terms of $K$ and $I$.
3. In general, what must $m$ and $n$ satisfy so that $\pi_{m/n}$ is expressible in terms of $K$ and $I$?

## Interactive?

I promised in the subtitle that this would be an interactive tour. Click on the thin grey line above for the interactive part. Inside you will find the console to an interpreter as well as some pre-defined terms. Note that the interpreter considers KI to be one identifier and K I to be $K$ applied to $I$. Use \ to type $\lambda$.

By the way, there's another expandable section after the section Simple Beginnings. I suggest that you start with that one first.

## One, Two, Three...

There are many ways to represent numbers via functions, to say that this function represents one, that one represents two, etc. As an exercise, try thinking of one of them.

We'll focus on a classic encoding devised by Alonzo Church. The Church encoding of a number $n$ is a function that maps functions to their $n$-fold compositions. \begin{align} nf &= f^n \\ &= f \circ f \circ \ldots \circ f \end{align}

To use an example from trigonometry, \begin{align} 3 \sin &= \sin \circ \sin \circ \sin \\ (3 \sin) x &= \sin(\sin(\sin(x))) \end{align}

To derive the explicit form for $n$, we first write $f^n$ explicitly as $\lambda x\. f(f(f\ldots f(x)\ldots))$. \begin{align} nf &= f^n \\ &= \lambda x \.f^n x \\ &= \lambda x \.f(f(f\ldots f(x)\ldots)) \end{align}

abstracting out the $f$, $$n = \lambda fx \.f(f(f\ldots f(x)\ldots))$$

and here are some examples to illustrate. \begin{align} 0 &= \lambda fx\.x \\ 1 &= \lambda fx\.fx \\ 2 &= \lambda fx\.f(fx) \\ 3 &= \lambda fx\.f(f(fx)) \end{align}
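To see these numerals do something, we can hand them an ordinary "add one" function and the number 0. The sketch below is plain JavaScript with hand-curried functions; `toInt` and `shout` are hypothetical helpers that are not part of the interpreter.

```javascript
// Church numerals as curried JavaScript functions
const zero  = f => x => x;
const one   = f => x => f(x);
const two   = f => x => f(f(x));
const three = f => x => f(f(f(x)));

// Hypothetical helper: count how many times n applies its function
// by using "add one" as f and 0 as x.
const toInt = n => n(k => k + 1)(0);

console.log(toInt(zero), toInt(one), toInt(two), toInt(three)); // 0 1 2 3

// n f is the n-fold composition of f
const shout = s => s + "!";
console.log(three(shout)("hi")); // hi!!!
```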

Let's derive the successor function that adds one to a number. \begin{align} Sn &= \lambda fx\.f^{(n+1)}x \\ &= \lambda fx\.f(f^nx) \\ &= \lambda fx\.f(nfx) \end{align}

Hence, $$S = \lambda nfx\.f(nfx)$$

This relies on the fact that $f \circ f^n = f^{n+1}$. Using the general form $f^m \circ f^n = f^{m+n}$, we get a function $\Sigma$ that sums two numbers. $$\Sigma = \lambda mnfx.mf(nfx)$$

Let's do another derivation for practice – the operator $\Pi$ multiplies two numbers. We'll use the property $(f^m)^n = f^{mn}$ \begin{align} \Pi mn &= \lambda f\.f^{mn} \\ &= \lambda f\.(f^n)^m \\ &= \lambda f\.m(f^n) \\ &= \lambda f\.m(nf) \\ \Pi &= \lambda mnf\.m(nf) \end{align}

  var _0 = function(s,z){ return z };
var _1 = function(s,z){ return s(z) };
var _2 = function(s,z){ return s(s(z)) };
var _3 = function(s,z){ return s(s(s(z))) };

var successor = function(n,f,x) {return f(n(f)(x))};
var add = function(m,n,f,x){ return m(f)(n(f)(x)) };

var multiply = function(m,n,f) { return m(n(f)) };
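The three operators can be checked against ordinary arithmetic. This sketch transcribes $S$, $\Sigma$ and $\Pi$ into hand-curried plain JavaScript; the names `succ`, `mul` and `toInt` are my own, and `toInt` is a hypothetical helper that converts a numeral back to a native number.

```javascript
const zero = f => x => x;
const succ = n => f => x => f(n(f)(x));          // S = λnfx.f(nfx)
const add  = m => n => f => x => m(f)(n(f)(x));  // Σ = λmnfx.mf(nfx)
const mul  = m => n => f => m(n(f));             // Π = λmnf.m(nf)

// Hypothetical helper: apply "add one" n times to 0
const toInt = n => n(k => k + 1)(0);

const two = succ(succ(zero));
const three = succ(two);

console.log(toInt(three));            // 3
console.log(toInt(add(two)(three)));  // 5
console.log(toInt(mul(two)(three)));  // 6
```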



### Exercises

1. Try defining the exponentiation operator; test it by ensuring that exp _3 _2 = _9. Hint: The operator is only 6 symbols long!
2. Assume we have an operator pred such that pred _0 = _0 and pred _n = _(n-1). Give a short implementation of subtraction using pred. Challenge: Implement pred.
3. In Peano arithmetic, a natural number is defined inductively as either zero or a successor to a natural number. Is this related to our Church encoding?
4. Another way to embed natural numbers might be to represent $n$ as the $n$-deep constant function of $I$; for instance, $3$ would be represented by K (K (K I)). Derive arithmetic operators that work on this representation.
5. Is it possible to find an encoding for all integers? Rational numbers? Reals?

## Data Structures

Linked lists, stacks, queues, heaps, B-trees – our universe is still missing such data structures or even a way to express them. Let's implement one of the building blocks of data structures, the pair. A pair is exactly what the name suggests it is – a container containing two elements in order, the head and the tail. Programmers using Lisp-descended languages will know of them as cons cells.

Let's be more formal - we want a function $P$ that constructs a pair. Once the pair is constructed, functions $H$ and $T$ may be used to access the elements of the pair. Consistency requires that $$H(Pxy) = x \\ T(Pxy) = y$$

Here's a clever way of implementing $P$: $$Pxy = \lambda t\.txy$$

This gives us a function that accepts a function $t$ and returns $txy$. We are almost done; to get $x$, for example, we can just pass in $\pi_1$, the first projection operator \begin{align} (\lambda t\.txy)\pi_1 = \pi_1 xy = x \end{align}

Written out fully, \begin{align} P &= \lambda xyt\.txy \\ H &= \lambda p. p(\lambda xy\.x) \\ T &= \lambda p. p(\lambda xy\.y) \end{align}

  var pair = function(P,Q,x){ return x(P)(Q) };
var fst = function(p){ return p(function(x,y){ return x }) };
var snd = function(p){ return p(function(x,y){ return y }) };
var p1 = pair(pair(a)(b))(pair(c)(d));
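Here is the same construction as a sketch in hand-curried plain JavaScript, for readers who want to try it outside the interpreter. The strings and numbers stored in the pairs are only illustrative stand-ins for lambda terms.

```javascript
const pair = x => y => t => t(x)(y);  // P = λxyt.txy
const fst  = p => p(x => y => x);     // H = λp.p(λxy.x)
const snd  = p => p(x => y => y);     // T = λp.p(λxy.y)

const p = pair("head")("tail");
console.log(fst(p)); // head
console.log(snd(p)); // tail

// Pairs nest, which is how larger structures are built
const nested = pair(pair(1)(2))(pair(3)(4));
console.log(snd(fst(nested))); // 2
```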


### Exercises

1. Verify that fst (fst p1) = a
2. Scheme and Lisp programmers have conventionally represented lists as right-nested pairs; for example, (5 (3 (42 (4 'nil)))) represents the list [5, 3, 42, 4], where 'nil is a special symbol. Let $\rho_n$ return the $n^{th}$ element of the list, counting from zero; for instance, $\rho_2 [5, 3, 42, 4] = 42$. Write a function that takes a number $n$ and returns $\rho_n$.
3. Another, more mathematically elegant, encoding represents a list as an accessor function; the list $[a, b, c]$ would be a function $f$ such that $f(0) = a$, $f(1) = b$ and $f(2) = c$. Write down the list of all even natural numbers $[0,2,4,6,\ldots]$.
4. Yet another representation would represent $[x, y, z]$ as $\lambda cn\.cx(cy(czn))$. What is [_5, _3, _42, _4] add _0, where add is the operator that adds Church numerals? Write the function cons that adds a number to the head of the list.
5. The Church encoding of $n$ requires $O(n)$ symbols to write down. Paul Graham would encode $n$ as a list of elements that has length $n$, which still requires $O(n)$ symbols. Find a way to represent $n$ using only $O(\log n)$ symbols.

## Booleans and Control Flow

There are many ways to embed booleans into the lambda calculus. We could even declare that the Church numeral $1$ represents $\text{True}$ and $0$ represents $\text{False}$. However, since booleans are normally used for control flow or conditional evaluation ("if statements"), we'll choose a representation that is most convenient for expressing such control flow. Hence if $b$ is a boolean we declare it to have the property that $$bxy$$

evaluates to $x$ if $b$ is true and $y$ otherwise. Then \begin{align} T &= \lambda ab.a \\ F &= \lambda ab.b \end{align}

We can also define the common logical manipulators \begin{align} NOT &= \lambda pab. pba\\ AND &= \lambda pq. pqF \\ OR &= \lambda pq. pTq \end{align}

For example, (OR p q) = (p T q), which can be read as "T if p else q", a correct definition of $OR$.
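A truth table is a quick way to convince ourselves that these definitions behave like the usual connectives. The sketch below is hand-curried plain JavaScript; `toBool` is a hypothetical helper that converts a Church boolean into a native one.

```javascript
const T = a => b => a;
const F = a => b => b;

const NOT = p => a => b => p(b)(a);  // λpab.pba
const AND = p => q => p(q)(F);       // λpq.pqF
const OR  = p => q => p(T)(q);       // λpq.pTq

// Hypothetical helper: turn a Church boolean into a native one
const toBool = b => b(true)(false);

// Print the full truth table for AND and OR
for (const [pn, p] of [["T", T], ["F", F]]) {
  for (const [qn, q] of [["T", T], ["F", F]]) {
    console.log(pn, qn, "AND:", toBool(AND(p)(q)), "OR:", toBool(OR(p)(q)));
  }
}
console.log("NOT T:", toBool(NOT(T)), "NOT F:", toBool(NOT(F)));
// NOT T: false NOT F: true
```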

Let's try writing a conditional expression that operates on Church numerals and tells us if its argument is zero. The trick is to make use of the fact that $$C_r \circ C_r = C_r$$

that is, constant functions don't change when they are composed with themselves. Similarly $C_r \circ C_r \circ C_r$ is still $C_r$, and so on. Hence, the expression $$n C_F$$

will evaluate to $C_F$ when $n > 0$. However, it will evaluate to $I$ when $n=0$; a zero-fold composition is the special case! Then, we simply need to pass $T$ to it $$Zn = n(\lambda x\. F) T$$

and we will get $T$ if $n=0$ and $F$ otherwise. Written out fully, $$Z = \lambda n\. n(\lambda xab\. b) (\lambda ab\. a)$$
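The zero test can be tried out in hand-curried plain JavaScript. In this sketch `isZero` transcribes $Zn = n(\lambda x. F)T$, and `toBool` is a hypothetical helper for printing the result.

```javascript
const T = a => b => a;
const F = a => b => b;

const zero = f => x => x;
const two  = f => x => f(f(x));

// Z = λn.n(λx.F)T: apply the constant function C_F n times to T
const isZero = n => n(x => F)(T);

// Hypothetical helper: turn a Church boolean into a native one
const toBool = b => b(true)(false);

console.log(toBool(isZero(zero))); // true
console.log(toBool(isZero(two)));  // false
```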

  var T = function(x,y){ return x };
var F = function(x,y){ return y };

var not = function(p,a,b){return p(b)(a)};
var and = function(p,q){return p(q)(function(x,y){return y})};
var or = function(p,q){return p( function(x,y){ return x } )(q) };


### Exercises

1. We can very elegantly represent sets by their membership function, i.e., $S$ is represented by $\lambda x\.[x \in S]$, where $[x \in S]$ is $T$ if $x \in S$ and $F$ otherwise. Give a definition of the empty set.
2. Implement set union, intersection and complement. Hint: Why have we not put this question in the section about data structures?
3. Is $\lambda x\.T$ a representation of a set?
4. Alyssa P. Hacker immediately notices that she can encode integers as a pair $(s, n)$ where $s$ is a boolean representing whether the number is signed, and $n$ a natural number. However, she runs into problems when trying to implement multiplication. Ben Bitdiddle suggests a better way: let the pair $(n,k)$ represent $n-k$. Why is this better?

## Recursion?

Let's try to define a factorial function that computes $n! = n(n-1)(n-2)\cdots 2\cdot 1$.

fac =
  (lambda (n)
    (if (= n 0)
        1
        (* n (fac (- n 1)))))


There's a problem with this - lambda calculus does not allow recursive definitions. Remember, we want to define fac as an anonymous function, that is, without referring to fac.

Hence, the expression fac is not allowed within the definition. Let's replace it with something legal.

F =
  (lambda (former-fac)
    (lambda (n)
      (if (= n 0)
          1
          (* n (former-fac (- n 1))))))


You will notice that we have simply replaced fac by former-fac in the body and enclosed the whole thing in a (lambda (former-fac) .. ). We call the new function F.

Let's imagine we already have a working copy of fac, perhaps from Plato's heaven. What is (F fac)? It is a function of one variable:

(F fac) =
  (lambda (n)
    (if (= n 0)
        1
        (* n (fac (- n 1)))))


We see that (F fac) = fac; this means that fac is a fixed point of F. We are very close now, because F was defined without recursion. Now all we need is a function that finds a fixed point of its argument, a function $Y$ satisfying $$F(YF) = YF$$

The above equation expresses the fact that $YF$ is a fixed point of $F$. How can we define $Y$?
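The claim that fac is a fixed point of F can be spot-checked before we go looking for $Y$. The sketch below is plain JavaScript and makes two simplifying assumptions: native numbers stand in for Church numerals, and the "working copy from Plato's heaven" is an ordinary named recursive function, which JavaScript allows even though the lambda calculus does not.

```javascript
// F from the text: it never refers to itself, only to its parameter
const F = formerFac => n => (n === 0 ? 1 : n * formerFac(n - 1));

// A working factorial obtained "for free" through named recursion
function fac(n) { return n === 0 ? 1 : n * fac(n - 1); }

// F(fac) agrees with fac on every input we try, as a fixed point should
for (const n of [0, 1, 5, 10]) {
  console.log(n, fac(n), F(fac)(n));
}
```

Each printed row shows the same value twice; for example, the row for 5 reads `5 120 120`.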

## Y Combinator

The solution to this is one of the most beautiful results of the Lambda Calculus. First, we must take a little detour.

Consider \begin{align} \omega = (\lambda x\. xx) (\lambda x\. xx) \end{align}

Realize that $\omega$ expands to itself; that is, substituting the second term, $\lambda x\. xx$, into the $x$ of the first term simply makes the whole expression evaluate to $\omega$ again. With a small modification, \begin{align} \omega' &= (\lambda x\. F(xx)) (\lambda x\. F(xx)) \\ &= F ((\lambda x\. F(xx)) (\lambda x\. F(xx))) \\ &= F(\omega') \end{align}

so $\omega'$ is a fixed point of $F$. Abstracting the $F$ out, we conclude that this is the Y Combinator. \begin{align} Y = \lambda F.(\lambda x. F(xx)) (\lambda x. F(xx)) \end{align}

  var Y = function(f){ return (function(x){ return f((x)(x)) })(function(x){ return f((x)(x)) }) };

var _0 = function(s,z){ return z };
var _1 = function(s,z){ return s(z) };
var _2 = function(s,z){ return s(s(z)) };
var _3 = function(s,z){ return s(s(s(z))) };

var t = function(x,y){ return x }; // true
var f = function(x,y){ return y }; // false
var if0 = function(n){ return n(function(x){ return f })(t) }; // if n equals to _0

var mul = function(m,n,s,z){ return n(m(s))(z) };
var pred = function(n,s,z){ return n(function(f,g){ return g(f(s)) })(function(x){ return z })(function(x){ return x }) };

var F = function(r,n){ return if0(n)(_1)(mul(n)(r(pred(n)))) };
var fact = Y(F);
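Outside the interpreter, plain JavaScript evaluates arguments eagerly, so a direct transcription of $Y$ would keep expanding `x(x)` before `f` is ever called and never finish. A common workaround, sketched here, is the eta-expanded variant usually called the Z combinator. Note that this swaps in a different combinator from the $Y$ derived above, and that it uses native numbers rather than Church numerals to keep the example short.

```javascript
// Z = λf.(λx.f(λv.xxv))(λx.f(λv.xxv))
// The inner λv delays the self-application x(x) until it is actually needed.
const Z = f => (x => f(v => x(x)(v)))(x => f(v => x(x)(v)));

// The non-recursive F from the Recursion section, on native numbers
const F = formerFac => n => (n === 0 ? 1 : n * formerFac(n - 1));

// fact is built without any function referring to its own name
const fact = Z(F);
console.log(fact(5)); // 120
```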


### Exercises

1. What do you think $Y_k$ is? Explain why it is what you say it is. \begin{align} Y_k &= (L L L L L L L L L L L L L L L L L L L L L L L L L L) \\ L &= \lambda abcdefghijklmnopqstuvwxyzr\.(r(thisisafixedpointcombinator)) \end{align}

Note: each letter $a$, $b$, $c$, etc. corresponds to a single variable.

## More

If you enjoyed this, you may enjoy my work-in-progress Interactive Structure and Interpretation of Computer Programs.

## Acknowledgement

1. The lambda calculus interpreter and terminal emulator used belongs to INA Lintaro and is available on GitHub. The expandable section below allows you to set the options for the interpreter.
2. Alyssa P. Hacker and Ben Bitdiddle come from the pages of the wonderful book Structure and Interpretation of Computer Programs.