# Arithmetic, Self-Reference and Truth

There is a remarkable theorem in mathematical logic that “Truth is not definable”. This is known as Tarski’s Undefinability Theorem. This result (along with Gödel’s Incompleteness Theorems) has fascinated me ever since I learned about it. This theorem (in a sense) shows us the limits of the ability to study truth within a given formal system.

# Formulas and Truth

In a previous post, I briefly discussed first order logic and arithmetic. To review: formulas are built up as follows:

• equations of arithmetic terms (involving addition and multiplication) are formulas (example: $4x^2 + 2xy - y^3 = xyz$)
• inequalities of arithmetic terms are formulas (example: $x < xy$)
• conjunctions, disjunctions, and negations of (smaller) formulas are formulas
• quantifications of formulas are formulas (examples: $\exists x (x < y)$ and $\forall x (x + 0 = x)$). Recall that $\exists$ (the existential quantifier) means “there exists” and $\forall$ (the universal quantifier) means “for all”. In the context of natural numbers, this “there exists x…” means “there is some natural number x…”, and “for all y…” means “for every natural number y…”

It does not make sense to talk about whether or not a particular formula is “true”. For example: is $x = y$ true? It’s certainly either true or false when we plug in specific numbers for $x$ and $y$. But until we do so, it does not have a truth value.

The formula $\exists x \forall y (x < y + 1)$ does have a truth value. It is true: there really is an $x$, namely $x = 0$, so that the formula $\forall y (0 < y + 1)$ is true. The main difference between the formulas $x = y$ and $\exists x \forall y (x < y + 1)$ is that the former formula has free variables (x and y), while the variables in the latter formula are bound by quantifiers. Formulas are usually denoted with Greek letters like $\phi, \psi, \theta$, etc.; if a formula has free variables, we usually specify that: $\phi(x)$ would be a formula with a single free variable, $\psi(x, y)$ would be a formula with two free variables, etc. If $\phi(x)$ is a formula, then $\phi(2)$ refers to the formula obtained from $\phi(x)$ by plugging in the number 2 for x, wherever x appears in the formula.

Formulas without free variables are called statements — statements have truth values. Truth values for statements are defined inductively, similar to how formulas are defined.

Given a formula $\phi(x)$, we obtain the set that it defines: $X = \{ n : \phi(n) \}$, the set of all those numbers $n$ such that $\phi(n)$ is true. Some sets of numbers are easily seen to be definable: the set of even numbers, for example, is the set $\{ n : \exists x (x + x = n) \}$. Other definable sets include the set of odd numbers, the set of prime numbers, the set of perfect squares, …

# Arithmetization

The surprising thing about arithmetic is actually just how much is definable. While it may appear, at first, that all definable sets are based on number-theoretic properties (e.g. even numbers, odd numbers, numbers that are divisible by 23, prime numbers, composite numbers, etc), the structure of definable sets is much richer than that.

For example: finite sequences of numbers can be represented by a single number using coding. The idea of this is very simple: every number has a unique prime factorization, and so, letting $p_0, p_1, p_2, \ldots$ be the (infinite) sequence of prime numbers (in order, so $p_0 = 2, p_1 = 3, \ldots$), we let the number $n = p_0^{x_0} \times p_1^{x_1} \times \ldots \times p_k^{x_k}$ code the sequence $(x_0, x_1, \ldots, x_k)$. (The details are a bit more complicated than this, but they are not incredibly important.) Then, there is a formula $\phi(n, i, x)$ which is true if and only if $x$ is the $i^{\text{th}}$ number in the sequence coded by $n$.

This ability to code finite sequences is extremely powerful. For the purposes of this post, I will focus on how we can use this to represent first-order logic within arithmetic itself. Recall: formulas are certain finite sequences of the following symbols: $0, 1, +, \times, =, <, \wedge, \vee, \exists, \forall, (, )$, along with a countably infinite set of variables $x_0, x_1, x_2, \ldots$. We can code these symbols as numbers in the following way: let 0 represent 0, 1 represent 1, 2 represent $+$, 3 represent $\times$, etc. Then, since a formula is a finite sequence of these symbols, the Gödel code of a formula is the number which represents that finite sequence. In fact, there is an effective process to go between a formula $\phi(x)$ and its Gödel code $\lceil \phi(x) \rceil$. Not every number codes a formula, but the set of all numbers which code some formula is in fact definable. Further, the set of all numbers which code a statement is also definable.

That is: there is a formula $G(x)$ such that $G(n)$ is true if and only if the sequence that is coded by $n$ represents an actual formula. For example, using the coding I started to outline above, the sequence $(1, 2, 0)$ represents $1 + 0$, which is not a formula (it is an expression), while the sequence $(0, 4, 1)$ represents the formula $0 = 1$.

So to sum up: there is a single formula which recognizes whether or not any finite sequence of symbols constitutes an actual formula. This means that arithmetic provides a viable background theory in which to study logical arguments. That is: we can study logic, itself, within arithmetic, by studying codes of formulas. So we can, for example, study logical proofs, since proofs are finite sequences of statements satisfying some (definable) conditions (assuming that the set of codes of axioms you are using is definable, which it is for the axioms traditionally used in arithmetic). But interestingly, this also means we can do “metalogic” inside arithmetic — that is, we can formalize statements about logical statements.

# Self-reference

We are getting eerily close to the famous “Liar’s Paradox”. Consider the assertion (in natural language) “This sentence is false.” If we accept it as true, then it must actually be false, and vice versa — a contradiction either way, and so there is no coherent way to assign a truth value to that assertion. Of course in classical logic, a well formed statement must have a truth value (in the natural numbers), and so that must mean that this kind of self-reference isn’t allowed in logic, right?

Not exactly. Given any formula $\theta(x)$, there is a “self-referential” statement $\sigma$ which is true if and only if $\theta(\lceil \sigma \rceil)$ is true (this result is known alternatively as the “Diagonal Lemma” or “Fixed Point Lemma”). That is, $\sigma$ asserts that $\sigma$ has the property defined by $\theta(x)$. So for example, if $\theta(x)$ is the formula that says “There is no proof from the axioms of arithmetic of the formula coded by $x$“, then $\sigma$ would be true if and only if there is no proof of it! In other words, $\sigma$ is, essentially, the assertion “This sentence is not provable.” (This result is well-known as one of Gödel’s incompleteness theorems.)
So if self-reference isn’t the problem with formalizing the Liar’s Paradox, what exactly is that problem? “This sentence is not provable” can be formalized– but “This sentence is false” cannot? The issue must be with the notion of truth itself. This is precisely what Tarski’s Undefinability Theorem states: there is no formula $\tau(x)$ such that, for any statement $\sigma$, $\sigma$ is true if and only if $\tau(\lceil \sigma \rceil)$ is true.
The proof is simple, given the “Diagonal Lemma”: suppose $\tau(x)$ defines truth. That is, it expresses “The statement coded by x is true”. Then its negation should express “The statement coded by x is false.” Apply the Diagonal Lemma and obtain a self-referential statement $\sigma$, which would express “This statement is false.” Then $\sigma$ is true if and only if it is false, which is a contradiction. Therefore there can be no formula $\tau(x)$!