Today I Learned

May 8th

Today I learned the power iteration method to compute the largest eigenvalue of a matrix, from (say) the Wikipedia page. The algorithm is as follows.

Proposition. Fix a diagonalizable matrix $M\in\CC^{n\times n}$ with a unique eigenvalue of largest absolute value, and pick a unit vector $v_0$ with a nonzero component in the direction of that eigenvalue's eigenvector. Define the sequence of unit vectors recursively by \[v_{k+1}=\frac{Mv_k}{\lVert Mv_k\rVert}\] for $k\ge0.$ Then $v_k$ will approach the span of the desired eigenvector.
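For concreteness, here is a minimal sketch of the iteration in Python with NumPy. The function name, the example matrix, and the fixed step count are my own choices for illustration (a real implementation would add a convergence check), not anything prescribed by the source.

```python
import numpy as np

def power_iteration(M, v0, num_steps=1000):
    """Run the power iteration v_{k+1} = M v_k / ||M v_k||.

    Returns the final unit vector, which (under the hypotheses of the
    proposition) lies close to the span of the dominant eigenvector.
    """
    v = v0 / np.linalg.norm(v0)  # make sure v_0 is a unit vector
    for _ in range(num_steps):
        w = M @ v
        v = w / np.linalg.norm(w)
    return v

# Example: dominant eigenvalue 3 with eigenvector along <1, 1>.
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v = power_iteration(M, np.array([1.0, 0.0]))
print(v)  # approximately <1, 1> / sqrt(2)
```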

We make a few comments before jumping into the proof. For example, look at\[M=\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.\]If we look at $v_0=\frac1{\sqrt2}\langle1,-1\rangle,$ an eigenvector, then we note that\[v_k=\frac1{\sqrt2}\begin{bmatrix}(-1)^k \\ (-1)^{k+1} \end{bmatrix}\]for $k\ge0$ by an inductive argument (say). In particular, $v_k$ doesn't actually approach a single eigenvector because it flips between two, but it does remain on the span of a single eigenvector, which is enough. This is a general problem for negative eigenvalues because they will flip the vector.

However, $M$ doesn't even satisfy the hypotheses of the proposition because its two eigenvalues $1$ and $-1$ have the same magnitude, so the process need not find an eigenvector at all. Because $M^2=I,$ the $\{v_k\}$ will always alternate between two vectors, with no need to converge to a particular line. For example, $v_0=\langle1,0\rangle$ will alternate between $\langle1,0\rangle$ and $\langle0,1\rangle.$
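As a quick numerical check of these failure modes (again with NumPy; the starting vectors are just the ones from the discussion above):

```python
import numpy as np

# The matrix above: eigenvalues 1 and -1 have the same magnitude.
M = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# Starting from the eigenvector <1, -1>/sqrt(2), the iterates flip sign each step.
v = np.array([1.0, -1.0]) / np.sqrt(2)
for k in range(4):
    print(k, v)
    v = M @ v / np.linalg.norm(M @ v)

# Starting from <1, 0>, the iterates alternate between <1, 0> and <0, 1>,
# so they never settle onto a single line.
v = np.array([1.0, 0.0])
for k in range(4):
    print(k, v)
    v = M @ v / np.linalg.norm(M @ v)
```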

We also remark that the hypothesis that $v_0$ has a nonzero component in the direction of the largest eigenvalue's eigenvector is necessary. For example, if $v_0\in\ker M,$ then the process will immediately break down.

Anyways, we now prove this. The main idea is to "amplify'' the component of $v_0$ pointing in the direction of the desired eigenvector by repeated application of $M.$ Because $M$ is diagonalizable, we have an eigenbasis $\{e_\ell\}_{\ell=1}^n$ with eigenvalues $\{\lambda_\ell\}_{\ell=1}^n.$ Rearranging as necessary, we order the eigenvalues by $|\lambda_1| \gt |\lambda_2|\ge|\lambda_3|\ge\cdots\ge|\lambda_n|.$ Note the first inequality is strict.

Now, picking any unit vector $v_0$ with a nonzero component in the direction of $e_1,$ we can write it in the eigenbasis as\[v_0=\sum_{\ell=1}^na_\ell e_\ell.\]Then the main idea is manifested by writing\[M^kv_0=\sum_{\ell=1}^n\lambda_\ell^ka_\ell e_\ell,\]which means\[\frac{M^kv_0}{\lambda_1^k}=\sum_{\ell=1}^n\left(\frac{\lambda_\ell}{\lambda_1}\right)^ka_\ell e_\ell.\]In particular, as $k\to\infty,$ we have that $M^kv_0/\lambda_1^k$ approaches $a_1e_1$ because each of the ratios $(\lambda_\ell/\lambda_1)^k\to0$ for $\ell\ge2$ (as $|\lambda_\ell|\lt|\lambda_1|$ there), leaving only the $\ell=1$ term.
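As a quick numerical illustration of this amplification, here is a sketch using the same example matrix as before (my choice, with $\lambda_1=3$ and $e_1=\frac1{\sqrt2}\langle1,1\rangle,$ so $a_1e_1=\langle1/2,1/2\rangle$ for $v_0=\langle1,0\rangle$):

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # eigenvalues 3 and 1
v0 = np.array([1.0, 0.0])
lam1 = 3.0

# M^k v0 / lam1^k should approach a_1 e_1 = <0.5, 0.5>.
w = v0.copy()
for k in range(1, 31):
    w = M @ w
    if k % 10 == 0:
        print(k, w / lam1**k)
```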

To finish, we can show, by an inductive argument for example, that $v_k=M^kv_0/\lVert M^kv_0\rVert$ (the constants we divide by at each step can be transferred in and out of the matrix multiplication, so all that matters is ensuring we have a unit vector). This means that\[v_k=\frac{M^kv_0}{\lVert M^kv_0\rVert}=\frac{\lambda_1^k}{\lVert M^kv_0\rVert}\cdot\frac{M^kv_0}{\lambda_1^k}.\]Sending $k\to\infty$ makes the second factor approach $a_1e_1,$ and the first factor is $(\lambda_1/|\lambda_1|)^k$ times $|\lambda_1|^k/\lVert M^kv_0\rVert=1/\lVert M^kv_0/\lambda_1^k\rVert\to1/\lVert a_1e_1\rVert,$ so what we know is that\[\left\lVert v_k-\left(\frac{\lambda_1}{|\lambda_1|}\right)^k\frac{a_1e_1}{\lVert a_1e_1\rVert}\right\rVert\to0.\]In particular, the vector that $v_k$ is approaching is a unit vector on the span of $e_1,$ up to the sign (or phase) factor $(\lambda_1/|\lambda_1|)^k.$ Note here we have finally used the condition $a_1\ne0$ so that we may divide by $\lVert a_1e_1\rVert.$

This is what we meant by "$v_k$ approaches the span of the desired eigenvector,'' and indeed $k\to\infty$ makes $v_k$ arbitrarily close to the span of $e_1.$ So we are done. $\blacksquare$

We remark that we can also detect the eigenvalue by computing $Mv_k\cdot v_k$ (the Rayleigh quotient, where the dot product is the Hermitian inner product on $\CC^n$) because, as $k\to\infty,$\[Mv_k\cdot v_k\to\left(\frac{\lambda_1}{|\lambda_1|}\right)^k\frac{\lambda_1 a_1e_1}{\lVert a_1e_1\rVert}\cdot\left(\frac{\lambda_1}{|\lambda_1|}\right)^k\frac{a_1e_1}{\lVert a_1e_1\rVert}=\lambda_1\]by (say) continuity of all the functions involved; note the scalars $(\lambda_1/|\lambda_1|)^k$ have absolute value $1,$ so they cancel in the inner product.
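As a sketch of this eigenvalue extraction, reusing the power_iteration function and example matrix from above (so the expected answer is $\lambda_1=3$):

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v = power_iteration(M, np.array([1.0, 0.0]))

# The Rayleigh quotient approximates the dominant eigenvalue; np.vdot
# conjugates its first argument, so this is the Hermitian inner product.
eigenvalue = np.vdot(v, M @ v)
print(eigenvalue)  # approximately 3
```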

We also note that we can use the power iteration algorithm on $M^{-1}$ (assuming that the matrix is invertible) to get out the eigenvalue of smallest absolute value. In fact, once the largest eigenvalue's eigenvector is computed, we can remove its span from the space and repeat this process to (gradually) get out all the eigenvectors. Of course, this is sketchy because convergence of these eigenvectors is potentially slow.
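Here is a rough sketch of both variants, under the same assumptions as the earlier code. Inverse iteration solves a linear system instead of forming $M^{-1}$ explicitly, and the deflation step shown is the simple rank-one update, which is only safe as written for symmetric/Hermitian matrices; neither detail is spelled out in the source.

```python
import numpy as np

def inverse_iteration(M, v0, num_steps=1000):
    """Power iteration on M^{-1}: heads toward the eigenvector whose
    eigenvalue has the smallest absolute value (M assumed invertible)."""
    v = v0 / np.linalg.norm(v0)
    for _ in range(num_steps):
        w = np.linalg.solve(M, v)  # w = M^{-1} v without forming the inverse
        v = w / np.linalg.norm(w)
    return v

def deflate(M, eigval, eigvec):
    """Remove a found eigenpair via a rank-one update so that power iteration
    can find the next eigenvector. Assumes eigvec is a unit vector and that
    M is symmetric/Hermitian; general matrices need more care."""
    return M - eigval * np.outer(eigvec, np.conj(eigvec))

# Usage sketch on the symmetric example from before.
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v1 = power_iteration(M, np.array([1.0, 0.0]))  # dominant eigenvector, ~<1,1>/sqrt(2)
lam1 = np.vdot(v1, M @ v1)                     # ~3
v2 = power_iteration(deflate(M, lam1, v1), np.array([1.0, 0.0]))  # ~<1,-1>/sqrt(2)
print(v1, lam1, v2)
```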

I am under the impression that we can throw out the diagonalizable condition on $M$ by using the Jordan normal form instead of diagonalization, but this seemed unnecessarily painful. The kinds of matrices we are doing this with, in $\CC^{n\times n},$ are almost always diagonalizable. A similar intuition makes me comfortable with the condition that $v_0$ have a nonzero component in the direction of $e_1$: random vectors will be fine.