Scaling and Dimensionality Reduction - Understanding Non-diagonalizable Matrices

Geometrically, a matrix represents a linear mapping. This article aims to explore the geometric meaning of a general non-diagonalizable matrix, providing an intuitive explanation from the perspective of linear transformations.

Here’s the situation: I encountered the following proposition:

For a matrix $A \in \mathbb{R}^{m \times m}$, if all its eigenvalues have absolute values less than 1, then for any $x \in \mathbb{R}^{m}$, we have

\[\lim_{k \to \infty} \lVert A^{k}x \rVert = 0.\]

This proposition is very intuitive. The geometric meaning of the eigenvalues is that the matrix (or linear operator) scales in the directions of its eigenvectors; if the scaling factor in every direction is less than 1, then repeatedly applying the matrix will eventually compress any vector to 0.

If the matrix is diagonalizable, there is a very straightforward geometric proof:

Since the eigenvectors $v_{1}, \dots, v_{m}$ of a diagonalizable matrix $A$, with $Av_{i} = \lambda_{i}v_{i}$, form a basis of $\mathbb{R}^{m}$, any $x \in \mathbb{R}^{m}$ can be expressed as a linear combination of these eigenvectors:

\[x = a_{1}v_{1} + \dots + a_{m}v_{m}, \quad \text{for some } a_{i} \in \mathbb{R}.\]

Then $A^{k}x$ can be written as

\[A^{k}x = a_{1}A^{k}v_{1} + \dots + a_{m}A^{k}v_{m} = a_{1}\lambda_{1}^{k}v_{1} + \dots + a_{m}\lambda_{m}^{k}v_{m}.\]

Since $\lambda_{1}, \dots, \lambda_{m}$ all have absolute value less than 1, each coefficient $a_{i}\lambda_{i}^{k}$ tends to 0, and therefore

\[\lim_{k \to \infty} A^{k}x = 0.\]
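
To see this decay concretely, here is a minimal NumPy sketch (not part of the proof) using a made-up diagonalizable $2 \times 2$ matrix with eigenvalues $0.9$ and $0.5$ and an arbitrary starting vector:

```python
import numpy as np

# A made-up diagonalizable matrix whose eigenvalues (0.9 and 0.5) lie inside the unit circle.
A = np.array([[0.7, 0.2],
              [0.2, 0.7]])
x = np.array([3.0, -1.0])

eigvals, eigvecs = np.linalg.eig(A)   # columns of eigvecs are the eigenvectors v_i
a = np.linalg.solve(eigvecs, x)       # coordinates of x in the eigenbasis: x = a_1 v_1 + a_2 v_2

for k in [1, 10, 50, 200]:
    direct = np.linalg.matrix_power(A, k) @ x   # A^k x computed directly
    via_eigen = eigvecs @ (a * eigvals**k)      # a_1 * lambda_1^k * v_1 + a_2 * lambda_2^k * v_2
    print(k, np.linalg.norm(direct), np.allclose(direct, via_eigen))
```

The norm shrinks toward 0, and the two computations agree, exactly as the eigen-expansion predicts.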

We can also give a more direct argument by decomposing the matrix $A$ itself.

Since $A$ is diagonalizable, it can be written as $A = P^{-1}\Lambda P$, where $P$ is invertible and $\Lambda$ is a diagonal matrix whose diagonal entries are the eigenvalues of $A$. Because all the diagonal entries of $\Lambda$ have absolute value less than 1, we have

\[\lim_{k \to \infty} \Lambda^{k} = 0.\]

Since $A^{k} = P^{-1}\Lambda^{k}P$, it follows that

\[\lim_{k \to \infty} A^{k} = 0.\]
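
The same toy matrix as above can be used to check the identity $A^{k} = P^{-1}\Lambda^{k}P$ numerically; note that `np.linalg.eig` returns a matrix $V$ whose columns are eigenvectors, with $A = V\Lambda V^{-1}$, so the $P$ above corresponds to $V^{-1}$:

```python
import numpy as np

A = np.array([[0.7, 0.2],
              [0.2, 0.7]])

lam, V = np.linalg.eig(A)    # A = V @ diag(lam) @ inv(V)
P = np.linalg.inv(V)         # in the notation of the text, A = P^{-1} Lambda P
Lam = np.diag(lam)

k = 100
Ak_direct = np.linalg.matrix_power(A, k)
Ak_via_diag = np.linalg.inv(P) @ np.linalg.matrix_power(Lam, k) @ P

print(np.allclose(Ak_direct, Ak_via_diag))   # True: A^k = P^{-1} Lambda^k P
print(np.linalg.norm(Ak_direct))             # tiny, since |lambda_i| < 1
```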

Following this idea, the proof for non-diagonalizable matrices is also quite direct. Although such matrices cannot be diagonalized, we can similarly utilize their Jordan form: $A = P^{-1}JP$, where $P$ is invertible and $J$ is a block-diagonal matrix:

\[J = \begin{pmatrix} J_1 & 0 & \cdots & 0 \\ 0 & J_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_l \\ \end{pmatrix}, \quad \text{where} \quad J_{i} = \begin{pmatrix} \lambda_{i} & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda_{i} \end{pmatrix}.\]

Here, $\lambda_{1}, \dots, \lambda_{l}$ are eigenvalues of $A$ (the same eigenvalue may appear in more than one block).
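
For readers who want to experiment, SymPy can compute Jordan forms symbolically. Below is a sketch with a made-up non-diagonalizable $2 \times 2$ matrix; note that SymPy's `jordan_form` uses the convention $A = PJP^{-1}$, so its $P$ plays the role of $P^{-1}$ in the formula above:

```python
import sympy as sp

# A made-up non-diagonalizable matrix: its only eigenvalue is 1/2 (multiplicity 2),
# but it has just one independent eigenvector.
A = sp.Matrix([[sp.Rational(5, 2), -2],
               [2, sp.Rational(-3, 2)]])

P, J = A.jordan_form()                    # SymPy's convention: A = P * J * P**(-1)
print(J)                                  # Matrix([[1/2, 1], [0, 1/2]]): one 2x2 Jordan block
print(sp.simplify(P * J * P.inv() - A))   # zero matrix, confirming the decomposition
```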

We can then express $J$ as the sum of a diagonal matrix and another special matrix, namely, $J = D + N$:

\[D = \begin{pmatrix} \lambda_{1} & 0 & & 0 \\ & \lambda_{1} & & \\ & & \ddots & 0 \\ 0 & & & \lambda_{l} \end{pmatrix} \quad N = \begin{pmatrix} 0 & 1 & & 0 \\ & 0 & \ddots & \\ & & \ddots & 1 \\ 0 & & & 0 \end{pmatrix}.\]

In $N$, the only entries that can be nonzero lie on the superdiagonal (immediately above the main diagonal): they are 1 inside each Jordan block and 0 at the boundaries between blocks, and all other entries are 0.
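
As a quick check of these claims, here is a sketch with a small made-up Jordan matrix (a $2 \times 2$ block with eigenvalue $0.5$ and a $1 \times 1$ block with eigenvalue $0.8$): splitting it into $D$ and $N$ shows that $N$ is nilpotent and, importantly for the next step, that $D$ and $N$ commute.

```python
import numpy as np

# A made-up Jordan matrix: one 2x2 block (eigenvalue 0.5) and one 1x1 block (eigenvalue 0.8).
J = np.array([[0.5, 1.0, 0.0],
              [0.0, 0.5, 0.0],
              [0.0, 0.0, 0.8]])

D = np.diag(np.diag(J))   # the diagonal (scaling) part
N = J - D                 # the nilpotent part: 1's on the superdiagonal inside each block

m = J.shape[0]
print(np.allclose(np.linalg.matrix_power(N, m), 0))   # True: N^m = 0
print(np.allclose(D @ N, N @ D))                      # True: D and N commute
```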

Notice that $N^{m} = 0$ (in fact, $N$ is a nilpotent operator). Also note that $D$ and $N$ commute, since within each Jordan block $D$ is just $\lambda_{i}$ times the identity. Therefore we can expand $J^{k} = (D + N)^{k}$ with the binomial theorem, and every term containing $N^{m}$ or a higher power of $N$ vanishes. For sufficiently large $k$, we obtain:

\[J^{k} = D^{k} + \binom{k}{1} D^{k-1}N + \dots + \binom{k}{m-1} D^{k-m+1}N^{m-1}.\]

Since $m$ is fixed, each binomial coefficient grows only polynomially in $k$, while the entries of $D^{k-j}$ decay geometrically because the diagonal entries of $D$ all have absolute value less than 1. Every term therefore tends to 0, and we have

\[\lim_{k \to \infty} J^{k} = 0.\]

Q.E.D.
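
As a sanity check, the truncated binomial expansion can be verified numerically with the same made-up $J$, $D$, $N$ as in the earlier sketch:

```python
import numpy as np
from math import comb

J = np.array([[0.5, 1.0, 0.0],
              [0.0, 0.5, 0.0],
              [0.0, 0.0, 0.8]])
D = np.diag(np.diag(J))
N = J - D
m = J.shape[0]

k = 60
# Binomial expansion of (D + N)^k, truncated at N^{m-1} because N^m = 0.
Jk_expansion = sum(comb(k, j) * np.linalg.matrix_power(D, k - j) @ np.linalg.matrix_power(N, j)
                   for j in range(m))

print(np.allclose(np.linalg.matrix_power(J, k), Jk_expansion))   # True
print(np.linalg.norm(np.linalg.matrix_power(J, k)))              # small, and -> 0 as k grows
```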

But what does this proof actually imply? How can we understand this proposition geometrically, and how can we interpret a non-diagonalizable matrix?

First, please keep in mind: A matrix represents a linear transformation. Let us return to the Jordan decomposition of the matrix $A$: $A = P^{-1}JP$. Note that $P$ is an invertible matrix, which in this formula serves the role of choosing a new basis for the linear space $\mathbb{R}^{m}$. This expression implies that when we take the column vectors of $P^{-1}$ as a basis for $\mathbb{R}^{m}$ (in fact, this basis consists of the generalized eigenvectors of $A$), the linear transformation corresponding to $A$ is represented by a block-diagonal matrix (the Jordan form). In other words, $A$ and $J$ are similar matrices: they describe the same linear transformation in different bases, so to understand the structure of $A$ we only need to understand the structure of $J$. Thus, we have:

\[A \cong J = D + N.\]

Here, $D$ is a diagonal matrix, which means that along the directions of $A$’s generalized eigenvectors, the transformation scales; $N$ is a nilpotent matrix, which means that it maps $\mathbb{R}^{m}$ into a proper linear subspace, so each application of $N$ reduces the dimension of the space the vectors live in. (The reader can verify this. Note that $N$ is not a projection matrix: applying $N$ again to $Nv$ pushes it into an even smaller subspace, and after at most $m$ applications every vector is sent to 0.)
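
A tiny sketch with a single $3 \times 3$ nilpotent shift block makes this concrete: each application of $N$ lands in a strictly smaller subspace, and the rank drops until it reaches 0.

```python
import numpy as np

# A single nilpotent shift block: it sends e1 -> 0, e2 -> e1, e3 -> e2.
N = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])

for j in range(1, 4):
    Nj = np.linalg.matrix_power(N, j)
    print(j, np.linalg.matrix_rank(Nj))   # ranks 2, 1, 0: the image keeps shrinking
```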

At this point, we have a geometric characterization of a non-diagonalizable matrix: scaling combined with dimensionality reduction. It is then intuitively clear that if all the eigenvalues of a matrix have absolute value less than 1, applying the matrix repeatedly will eventually send every vector to 0.

PS: This article is intended for readers who have only studied elementary linear algebra, so I have avoided using the full language of linear operators. With more advanced terminology, one can have a more natural understanding of this issue.







