## The generic matrix and the Cayley–Hamilton theorem

Sometime during a first course in linear algebra we all learn the famous Cayley–Hamilton theorem, which states the following:

Theorem: Let $A \in M_n (F)$ be an $n \times n$ matrix over a field $F$, and denote by $p_A(t)=\det(tI-A)$ its characteristic polynomial. Then $p_A(A)=0$.

The “easy” proof of this theorem is simply to note that $p_A(A)=\det(AI-A)=\det(0)=0$. Of course, this is not a real proof, since in the definition of $p_A(t)$ as a determinant, the symbol $t$ is a placeholder for a scalar and not a matrix. Nevertheless, the theorem is true and has several proofs. For example, if you believe in the Jordan form, or that every matrix can be approximated by diagonalizable matrices, then this theorem is not that far fetched, since it is easy to verify for matrices in Jordan form and for diagonalizable matrices.

Our goal here is to show that the most “generic” matrix satisfies this theorem, so that all other matrices have no real choice, and must follow in the footsteps of their generic master ruler and do the same.

We start, as always, with a definition.

Definition (The generic matrix): Let $\{ x_{i,j}; \; 1\leq i,j \leq n \}$  be algebraically independent indeterminates over $\mathbb{Z}$. The generic matrix is defined to be

$X=\left(\begin{array}{ccc} x_{1,1} & \cdots & x_{1,n}\\ \vdots & & \vdots\\ x_{n,1} & \cdots & x_{n,n} \end{array}\right)\in M_{n}(\mathbb{Z}\left[x_{i,j}\right]).$

What makes this matrix into a “generic” matrix is the following argument. Suppose that $F$ is any field, and $A=(a_{i,j})_{i,j=1}^n$ is any $n \times n$ matrix over $F$. Any polynomial $f(x_{1,1},x_{1,2},...,x_{n,n})\in \mathbb{Z}\left[x_{i,j}\right]$ defines an element in $F$, simply by assigning $x_{i,j}$ to be $a_{i,j}$. Let us denote this map by $\varphi_A :\mathbb{Z}\left[x_{i,j}\right] \to F$. Clearly, this is a homomorphism of rings, which we can extend to a homomorphism of the matrix rings $\varphi_A :M_n(\mathbb{Z}\left[x_{i,j}\right]) \to M_n(F)$. We now have the almost too trivial observation that $\varphi_A(X) = A$ – or in other words, any matrix $A$ is an image of the matrix $X$ (under some carefully chosen homomorphism). We note that the same process will work if we take any commutative ring $R$ instead of a field $F$.
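To make the specialization $\varphi_A$ concrete, here is a minimal sympy sketch for $n=2$ (the helper `phi` and the sample matrix `A` are my illustrative choices, not part of the text):

```python
import sympy as sp

n = 2
# The generic matrix X with algebraically independent entries x_{i,j}.
X = sp.Matrix(n, n, lambda i, j: sp.Symbol(f"x{i}{j}"))

# Any concrete matrix A (here over the integers, but any commutative ring works).
A = sp.Matrix([[1, 2], [3, 4]])

# The specialization phi_A: substitute a_{i,j} for x_{i,j}, entrywise.
subs = {sp.Symbol(f"x{i}{j}"): A[i, j] for i in range(n) for j in range(n)}
phi = lambda M: M.subs(subs)

print(phi(X) == A)  # True: phi_A(X) = A
```

Substitution of scalars for indeterminates is exactly a ring homomorphism $\mathbb{Z}[x_{i,j}]\to F$ applied entrywise, which is the matrix-ring extension described above.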

The previous argument might seem silly at first, since everything was defined so that $A$ would be an image of $X$, so we should not be surprised when this actually happens. True as that is, let us show that it tells us a lot about the matrix $X$. But first, to shorten the notation, let's give a name to the phenomenon above.

Definition (specialization): Let $R,S$ be two unital integral domains (namely, commutative rings without zero divisors), and let $A,B$ be two $n \times n$ matrices over $R$ and $S$ respectively. We say that $A$ specializes to $B$ if there is some unital homomorphism $\varphi:R \to S$ such that its extension to $M_n(R)$ (which we also denote by $\varphi$) satisfies $\varphi(A)=B$.

With this notation, we see that any $n\times n$ matrix over any field (or unital commutative ring) is a specialization of $X$. The main idea now is to show that for certain properties, if $B$ satisfies them then so does $A$, while for other properties the opposite direction holds.

For a matrix $A$, denote by $p_A(t)$ its characteristic polynomial. Assume that $A$ specializes to $B$ through the homomorphism $\varphi:R\to S$. We also denote by $\varphi$ its natural extensions $\varphi:M_n(R) \to M_n(S)$ and $\varphi:R[t]\to S[t]$.

Lemma 1: We have $\varphi(p_A(t))=p_B(t)$.

Proof: Since $\varphi$ is a homomorphism of rings, for any square matrix $C$ over $R$ (or over $R[t]$) we have $\varphi(\det(C))=\det(\varphi(C))$, because the determinant is a polynomial expression in the entries. We thus have that

$\varphi(p_A(t))=\varphi(\det(tI-A))=\det(tI-\varphi(A))=\det(tI-B)=p_B(t)$.       $\square$
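Lemma 1 can be checked directly in sympy for the generic matrix and a concrete specialization (the sample matrix `A` below is an arbitrary choice of mine):

```python
import sympy as sp

n = 2
t = sp.Symbol("t")
X = sp.Matrix(n, n, lambda i, j: sp.Symbol(f"x{i}{j}"))
A = sp.Matrix([[0, 1], [5, 2]])
subs = {sp.Symbol(f"x{i}{j}"): A[i, j] for i in range(n) for j in range(n)}

# Characteristic polynomials p_X(t) = det(tI - X) and p_A(t) = det(tI - A).
p_X = (t * sp.eye(n) - X).det()
p_A = (t * sp.eye(n) - A).det()

# Lemma 1 says phi(p_X) = p_A: substituting the entries of A into p_X
# should give exactly p_A.
print(sp.expand(p_X.subs(subs) - p_A))  # 0
```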

Lemma 2: If $p_B(t)$ is irreducible, then $p_A(t)$ is irreducible.

Proof: Suppose that $p_A=f\cdot g$ is a nontrivial decomposition (namely $\deg(f),\deg(g) \geq 1$), which implies that $p_B=\varphi(f)\cdot \varphi(g)$. We claim that this is a nontrivial decomposition as well. Indeed, $p_A$ is monic, hence $p_B=\varphi(p_A)$ is monic as well, so $\deg(p_B)=n=\deg(p_A)$. Since applying $\varphi$ cannot increase degrees, and since $S$ has no zero divisors so that $\deg(\varphi(f))+\deg(\varphi(g))=\deg(p_B)=n=\deg(f)+\deg(g)$, we must have $\deg(\varphi(f))=\deg(f)$ and $\deg(\varphi(g))=\deg(g)$, so we obtain a nontrivial decomposition of $p_B$. Thus, if $p_B$ is irreducible, then so is $p_A$.    $\square$

Corollary: The matrix $X$ has an irreducible characteristic polynomial.

Proof: We know that $X$ specializes to any $n\times n$ matrix, so by the previous lemma we only need to find one such matrix with an irreducible characteristic polynomial. There are plenty of those: for example, the companion matrix of any irreducible monic polynomial over $\mathbb{Q}$ has that polynomial as its characteristic polynomial.   $\square$
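As one concrete such specialization, here is a sympy sketch checking a $2\times 2$ companion matrix (the polynomial $t^2-t-1$ is an arbitrary irreducible choice of mine; its roots are the irrational golden-ratio conjugates):

```python
import sympy as sp

t = sp.Symbol("t")
# Companion matrix of t^2 - t - 1: its characteristic polynomial is
# t^2 - t - 1, which is irreducible over Q (its roots are irrational).
C = sp.Matrix([[0, 1], [1, 1]])
p_C = (t * sp.eye(2) - C).det()

print(sp.expand(p_C))          # t**2 - t - 1
print(sp.factor_list(p_C, t))  # a single degree-2 factor: irreducible over Q
```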

Recall that if $f$ is an irreducible polynomial of degree $n$ over a field of characteristic zero, then all the roots of $f$ are distinct. It is always a great joy for a matrix to have such a characteristic polynomial, since it then has $n$ distinct eigenvalues, and is therefore diagonalizable (over the algebraic closure)! Any diagonal matrix clearly satisfies the Cayley–Hamilton theorem, and with some further thought, we see that this is also true for diagonalizable matrices (this is because if $A=PBP^{-1}$ for some invertible matrix $P$, then $p_A(t)=p_B(t)$ and $p_A(A)=P\,p_B(B)P^{-1}$). Thus we have the following:

Corollary: The generic matrix $X$ satisfies $p_X(X)=0$.

Proof: By the previous corollary, the polynomial $p_X(t)$ is irreducible over $\mathbb{Z}[x_{i,j}]$. Since $p_X$ is monic, Gauss's lemma shows that it remains irreducible over the fraction field $\mathbb{Q}(x_{i,j})$, which is of characteristic zero. Viewing $X$ as a matrix over this field, it is diagonalizable over the algebraic closure, and hence satisfies the Cayley–Hamilton theorem.   $\square$
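For the skeptical reader, the identity $p_X(X)=0$ can be verified symbolically. A sympy sketch for $n=2$ (the Horner-style loop is just one convenient way of plugging a matrix into a polynomial, with the constant term $c$ read as $cI$):

```python
import sympy as sp

n = 2
t = sp.Symbol("t")
# The generic 2x2 matrix with independent symbolic entries.
X = sp.Matrix(n, n, lambda i, j: sp.Symbol(f"x{i}{j}"))

# Coefficients of p_X(t) = det(tI - X), from highest degree down.
coeffs = sp.Poly((t * sp.eye(n) - X).det(), t).all_coeffs()

# Evaluate p_X at the matrix X itself via Horner's scheme,
# reading the constant term c as c*I.
result = sp.zeros(n)
for c in coeffs:
    result = result * X + c * sp.eye(n)

print(result.expand())  # the 2x2 zero matrix
```

The cancellation happens identically in the indeterminates $x_{i,j}$, which is exactly the point: the generic matrix satisfies its characteristic polynomial.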

And now we can finally prove the Cayley-Hamilton Theorem:

Proof (Cayley–Hamilton): Let $A$ be any $n\times n$ matrix over a field $F$, and let $\varphi:\mathbb{Z}[x_{i,j}]\to F$ be the specialization map sending $x_{i,j}$ to $a_{i,j}$. Again, using the fact that $\varphi$ is a homomorphism (together with Lemma 1), we get that:

$0=\varphi(0)=\varphi(p_X(X))=p_A(\varphi(X))=p_A(A)$

and we are done. $\square$
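The whole argument can also be replayed numerically: specialize the coefficients of $p_X$ and watch $p_A(A)$ vanish. A sympy sketch for $n=2$ (the sample matrix `A` is an arbitrary choice of mine):

```python
import sympy as sp

n = 2
t = sp.Symbol("t")
X = sp.Matrix(n, n, lambda i, j: sp.Symbol(f"x{i}{j}"))
A = sp.Matrix([[2, 7], [1, -3]])

# The specialization phi_A: substitute a_{i,j} for x_{i,j}.
subs = {sp.Symbol(f"x{i}{j}"): A[i, j] for i in range(n) for j in range(n)}

# Coefficients of p_X(t), which phi_A sends to the coefficients of p_A(t)...
coeffs = sp.Poly((t * sp.eye(n) - X).det(), t).all_coeffs()

# ...evaluated at A via Horner's scheme: this computes phi_A(p_X(X)) = p_A(A).
result = sp.zeros(n)
for c in coeffs:
    result = result * A + c.subs(subs) * sp.eye(n)

print(result)  # the zero matrix, as Cayley-Hamilton predicts
```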

The idea of generic objects is of course much more common than just the generic matrix. Indeed, this notion is closely related to free objects and other universal properties. The next step from here is to look not at a single generic matrix, but at the algebra generated by several generic matrices (in distinct indeterminates). This algebra is generic in the sense that it specializes to any matrix algebra (and not just a single matrix), but this is a whole subject on its own and will be left for another time.
