The generic matrix and the Cayley–Hamilton theorem

Sometime during the first course in linear algebra we all learn the famous Cayley-Hamilton theorem which states the following:

Theorem: Let A \in M_n (F) be an n \times n matrix over a field F, and denote by p_A(t)=\det(tI-A) its characteristic polynomial. Then p(A)=0.

The “easy” proof for this theorem is just noting that p_A(A)=\det(AI-A)=det(0)=0. Of course, this is not a real proof since in the definition as a determinant of p_A(x), the symbol x is a place holder for a scalar and not a matrix. Nevertheless, the theorem is still true and has several proofs. For example, if you believe in the Jordan presentations or that each matrix can be approximated by a diagonalizable matrix, then this theorem is not that far fetched, since it is true for matrices in the Jordan form and for diagonlizable matrices.

Our goal here is to show that the most “generic” matrix satisfies this theorem, so all the other matrices have no real choice, and must follow the footsteps of their generic master ruler and do the same.

We start, as always, with a definition.

Definition (The generic matrix): Let \{ x_{i,j}; \; 1\leq i,j \leq n \}  be algebraically independent indeterminates over \mathbb{Z}. The generic matrix is defined to be

X=\left(\begin{array}{ccc} x_{1,1} & \cdots & x_{1,n}\\ \vdots & & \vdots\\ x_{n,1} & \cdots & x_{n,n} \end{array}\right)\in M_{n}(\mathbb{Z}\left[x_{i,j}\right]).

What makes this matrix into a “generic” matrix is the following argument. Suppose that F is any field, and A=(a_{i,j})_{i,j=1}^n is any n \times n matrix over F. Any polynomial f(x_{1,1},x_{1,2},...,x_{n,n})\in \mathbb{Z}\left[x_{i,j}\right] defines an element in F, simply by assigning x_{i,j} to be a_{i,j}. Let us denote this map by \varphi_A :\mathbb{Z}\left[x_{i,j}\right] \to F. Clearly, this is a homomorphism of rings, which we can extend to a homomorphism of the matrix rings \varphi_A :M_n(\mathbb{Z}\left[x_{i,j}\right]) \to M_n(F). We now have the almost too trivial observation that \varphi_A(X) = A – or in other words, any matrix A is an image of the matrix X (under some carefully chosen homomorphism). We note that the same process will work if we take any commutative ring R instead of a field F.

The previous argument might seem silly at first, since everything was defined so that A will be an image of X so we should not be surprised when this actually happens. True as it is, let us show that this will tell us a lot about the matrix X. But first, to shorten the notation, lets give a name to the phenomenon above.

Definition (specialization): Let R,S be two unital integral domains (namely commutative without zero divisors), and A,B be two n \times n matrices over R and S respectively. We say that A specializes to B if there is some homomorphism \varphi:R \to S such that its extension to M_n(R) (which we also denote by \varphi) satisfies \varphi(A)=B.

Under this notations, we see that any n\times n matrix over any field (or unital commutative rings) is a specialization of X. The main idea now is to show that if A specializes to B, then for certain properties if B satisfy them then so is A and for certain properties the other direction is true.

For a matrix A, denote by p_A(t) its characteristic polynomial. Assume that A specializes to B through the homomorphism \varphi:R\to S. We also denote by \varphi its natural extensions \varphi:M_n(R) \to M_n(S) and \varphi:R[t]\to S[t].

Lemma 1: We have \varphi(p_A(t))=p_B(t).

Proof: Since \varphi is a homomorphism of rings, we get that for any matrix C over R we have that \varphi(\det(C))=\det(\varphi(C)). We thus have that

\varphi(p_A(t))=\varphi(\det(tI-A))=\det(tI-\varphi(A))=\det(tI-B)=p_B(t).       \square

Lemma 2: If p_B(t) is irreducible, then p_A(t) is irreducible.

Proof: Suppose that p_A=f\cdot g is a nontrivial decomposition (namely \deg(f),\deg(g) \geq 1, which implies that p_B=\varphi(f)\cdot \varphi(g). We claim that this is a nontrivial decomposition as well. This follows from the fact that p_A is monic which implies that p_B is monic, and hence \deg(p_B)=n. It then follows that \deg(\varphi(f))=\deg(f) and \deg(\varphi(g))=\deg(g) so we obtain a nontrivial decomposition for p_B. Thus, if p_B is irreducible, then so is p_A.    \square

Corollary: The matrix X has an irreducible characteristic polynomial.

Proof: We know that X specializes to any n\times n matrix, so by the previous lemma we only need to find one such matrix with an irreducible characteristic polynomial, and of course there are plenty of those.   \square

Recall, that if f is a polynomial of degree n over a characteristic zero field, then the irreducibility of f implies that all the roots of f are distinct. It is always a great joy for a matrix to have such a characteristic polynomial, since then it must have n distinct eigenvalues, and therefore diagonalizable (over the algebraic closure)! Any diagonal matrix clearly satisfies the Cayley-Hamilton theorem, and with some further thought, we see that this is also true for diagonalizable matrices (This is because if A=PBP^{-1} for some invertible matrix P, then p_A(t)=p_B(t)). Thus we have the following:

Corollary: The generic matrix X satisfies p_X(X)=0.

Proof: By the previous corollary the polynomial p_X(t) is irreducible, and it is defined over \mathbb{Z}[x_{i,j}] which is of characteristic zero. Thus X is diagonalizable, and hence satisfy the Cayley-Hamilton theorem.   \square

And now we can finally prove the Cayley-Hamilton Theorem:

Proof (Cayley-Hamilton): Let A be any n\times n matrix over a field F and let \varphi:\mathbb{Z}[x_{i,j}]\to F be the specialization map. Again, using the fact that \varphi is a homomorphism we get that:


and we are done. \square

The idea of generic objects is of course much more common than only the generic matrix. Indeed, this notion is closely related to free objects and other universal properties. The next step from here is not to look on a single generic matrix, but on the algebra generated by several generic matrices (on distinct indeterminates). This algebra will be generic in the sense that it specializes to any matrix algebra (and not just a single matrix), but this is a whole new subject on its own and will be left for another time.


This entry was posted in Generic Objects and tagged , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s