
Chapter 14: Eigenvectors and eigenvalues

"Last time, I asked: 'What does mathematics mean to you?', and some people answered: 'The manipulation of numbers, the manipulation of structures.' And if I had asked what music means to you, would you have answered: 'The manipulation of notes?'"

- Serge Lang

Eigenvectors and eigenvalues are some of those topics that a lot of students find particularly unintuitive. Questions like “why are we doing this?” and “what does this actually mean?” are too often left floating away unanswered in a sea of computations. And as I've put out chapters in this series, many of you have commented about looking forward to visualizing this topic in particular.

I suspect that the reason for this is not so much that eigen-things are particularly complicated or poorly explained. In fact, the topic is comparatively straightforward, and I think most books do a fine job explaining it. The issue is that it only really makes sense if you have a solid visual understanding of many of the topics that precede it. Most important here is that you know how to think about matrices as linear transformations. But you also need to be comfortable with determinants, linear systems of equations, and change of basis. Confusion about eigen-stuffs usually has more to do with a shaky foundation in one of these topics than it does with the eigenvectors and eigenvalues themselves.

An example

To start, consider some linear transformation in two dimensions, like the one shown here.

It moves the basis vector $\hat{\imath}$ to the coordinates $\left[\begin{array}{c}3 \\ 0\end{array}\right]$, and $\hat{\jmath}$ to $\left[\begin{array}{c}1 \\ 2\end{array}\right]$, so it's represented with the matrix $\left[\begin{array}{cc}3 & 1 \\ 0 & 2\end{array}\right]$. Now, focus on what it does to any individual vector, and think about the span of that vector, which is the line passing through the origin and its tip.

Most vectors will get knocked off that line during the transformation. It would seem pretty coincidental if the place where a vector lands happens to be somewhere on that line. But some special vectors do remain on their own span, meaning the effect the matrix has on such a vector is to just stretch or squish it, like a scalar.

For this specific example, the basis vector $\hat{\imath}$ is one such vector with this special property. The span of $\hat{\imath}$ is the $x$-axis, and from the first column of the matrix we can see that $\hat{\imath}$ moves over to 3 times itself, still on that $x$-axis. What's more, because of the way linear transformations work, any other vector on the $x$-axis is also just stretched by a factor of 3, and hence remains on its own span.

A slightly sneakier vector that remains on its own span during this transformation is $\left[\begin{array}{c}1 \\ -1\end{array}\right]$. It ends up getting stretched out by a factor of 2. And again, linearity will imply that any other vector on the diagonal line spanned by this one will just get stretched out by a factor of 2.
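You can verify this directly with the matrix-vector product:

\left[\begin{array}{cc}3 & 1 \\ 0 & 2\end{array}\right]\left[\begin{array}{c}1 \\ -1\end{array}\right]=\left[\begin{array}{c}3 \cdot 1+1 \cdot(-1) \\ 0 \cdot 1+2 \cdot(-1)\end{array}\right]=\left[\begin{array}{c}2 \\ -2\end{array}\right]=2\left[\begin{array}{c}1 \\ -1\end{array}\right]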

For this transformation, those are all the vectors with this special property of staying on their span: Those on the $x$-axis that get stretched by a factor of 3, and those on this diagonal line that are stretched by a factor of 2. Any other vector will get rotated somewhat during the transformation, knocked off of the line it spans.

These special vectors are called the "eigenvectors" of the transformation, and each eigenvector has associated with it what's called an "eigenvalue", which is the factor by which it's stretched or squished during the transformation. In this example, vectors on the $x$-axis are eigenvectors with eigenvalue 3, while those on the diagonal line are eigenvectors with eigenvalue 2.

Of course, there's nothing special about stretching vs. squishing, or the fact that these eigenvalues happen to be positive. In another example you could have an eigenvector with eigenvalue $-\frac{1}{2}$, meaning that vector gets flipped and squished by a factor of $\frac{1}{2}$, but the important part is that it stays on the line that it spans out, without being rotated off of it.

Eigenvector with eigenvalue $-\frac{1}{2}$.

3d rotational axis

For a glimpse of why this might be a useful thing to think about, consider some three-dimensional rotation. If you can find an eigenvector for this rotation, a vector which remains on its own span, what you have found is the axis of rotation.

It's much easier to think about a 3d rotation in terms of an axis of rotation, and an angle by which it is rotating, rather than thinking about the full 3x3 matrix associated with such a transformation.

In this case, the corresponding eigenvalue would have to be 1, since 3d rotations don't stretch or squish anything.
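If you want to see that numerically, here's a minimal sketch using NumPy (my own illustration, not something from the lesson): it recovers the axis of a 3d rotation by picking out the eigenvector whose eigenvalue is 1.

```python
import numpy as np

# A 90-degree rotation about the z-axis, chosen as an arbitrary example.
R = np.array([
    [0.0, -1.0, 0.0],
    [1.0,  0.0, 0.0],
    [0.0,  0.0, 1.0],
])

# np.linalg.eig returns complex eigenvalues and eigenvectors in general.
# For a (non-identity) 3d rotation, exactly one eigenvalue equals 1, and
# its eigenvector points along the axis of rotation.
eigenvalues, eigenvectors = np.linalg.eig(R)
axis_index = np.argmin(np.abs(eigenvalues - 1))
axis = np.real(eigenvectors[:, axis_index])

print(axis)  # [0. 0. 1.] (up to sign) -- the z-axis, as expected
```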

This is a pattern that shows up a lot in linear algebra: With any linear transformation described by a matrix, you could understand what it's doing by reading off the columns of this matrix as the landing spots for the basis vectors. But often a better way to get at the heart of what that transformation actually does, less dependent on your particular coordinate system, is to find the eigenvectors and eigenvalues.

Notes on computation

We won't cover the full details on methods for computing eigenvectors and eigenvalues here, but we'll try to give an overview of the computational ideas that are most important for conceptual understanding.

Symbolically, here's what the idea of eigenvectors and eigenvalues looks like: $A\vec{\mathbf{v}} = \lambda\vec{\mathbf{v}}$, where $A$ is the matrix representing some transformation, $\vec{\mathbf{v}}$ is an eigenvector, and $\lambda$ (lambda) is a number, namely the corresponding eigenvalue.

What this expression is saying is that the matrix-vector product $A$ times $\vec{\mathbf{v}}$ gives the same result as scaling the eigenvector $\vec{\mathbf{v}}$ by the eigenvalue lambda.

Finding the eigenvectors and eigenvalues of a matrix $A$ comes down to finding values of $\vec{\mathbf{v}}$ and $\lambda$ that make this expression true. It's a little awkward to work with at first, because the left-hand side represents matrix-vector multiplication, while the right-hand side is scalar-vector multiplication. Let's start by rewriting that right-hand side as some kind of matrix-vector multiplication, using a matrix which has the effect of scaling vectors by a factor of lambda.

The columns of such a matrix will represent what happens to each basis vector, and each basis vector is simply multiplied by lambda, so this matrix will have the number lambda down the diagonal with zeros everywhere else. The common way to write this matrix is as lambda times $I$, where $I$ is the identity matrix with 1s down the diagonal.
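In two dimensions, for example, that scaling matrix looks like this:

\lambda I=\left[\begin{array}{cc}\lambda & 0 \\ 0 & \lambda\end{array}\right]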

With both sides looking like matrix-vector multiplication, we can subtract off the right-hand side and factor out the $\vec{\mathbf{v}}$. So we have a new matrix, $(A - \lambda I)$, and we're looking for a vector $\vec{\mathbf{v}}$ such that this new matrix times $\vec{\mathbf{v}}$ gives the zero vector.

Now, this will always be true when $\vec{\mathbf{v}}$ is the zero vector, but that's boring; what we want is a non-zero eigenvector.

Zero Determinant

As viewers of chapter 5 and chapter 6 will know, the only way it's possible for the product of a matrix with a nonzero vector to become zero is if the transformation associated with that matrix squishes space into a lower dimension. And that squishification corresponds to a zero determinant of the matrix.

To be concrete, let's say your matrix is $\left[\begin{array}{cc} 2 & 2 \\ 1 & 3 \end{array}\right]$, and imagine subtracting off the variable amount lambda from each diagonal entry.

Imagine tweaking lambda, turning a knob to change its value. As that value of lambda changes, the matrix changes, and so the determinant of that matrix changes.

The goal is to find a value of lambda that will make the determinant zero, meaning the tweaked transformation squishes space. In this case, the sweet spot comes when $\lambda = 1$.
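Writing that determinant out as a polynomial in lambda shows where the sweet spot comes from:

\operatorname{det}\left(\left[\begin{array}{cc}2-\lambda & 2 \\ 1 & 3-\lambda\end{array}\right]\right)=(2-\lambda)(3-\lambda)-2 \cdot 1=\lambda^{2}-5 \lambda+4=(\lambda-1)(\lambda-4)

This hits zero at $\lambda = 1$, the value followed here, and also at $\lambda = 4$, the other eigenvalue of this matrix.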

Remember that the lambda is tied to the matrix, so if we use another matrix, the lambda might not be 1. To unravel what that means: when $\lambda=1$, the matrix $A - \lambda I$ squishes space onto a line. That means there's a nonzero vector $\vec{\mathbf{v}}$ such that $(A-\lambda I)\vec{\mathbf{v}}=0$.

The reason we care about that is because it means $A\vec{\mathbf{v}} = \lambda \vec{\mathbf{v}}$, which you can read as saying the vector $\vec{\mathbf{v}}$ is an eigenvector of $A$, staying on its own span during the transformation $A$. In this example, the corresponding eigenvalue is 1, so it actually just stays fixed in place.
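Concretely, with $\lambda = 1$ the matrix $A - I$ becomes $\left[\begin{array}{cc}1 & 2 \\ 1 & 2\end{array}\right]$, which squishes the plane onto a line, and $\left[\begin{array}{c}-2 \\ 1\end{array}\right]$ is one of the vectors it sends to zero. Sure enough, the original matrix leaves that vector fixed in place:

\left[\begin{array}{cc}2 & 2 \\ 1 & 3\end{array}\right]\left[\begin{array}{c}-2 \\ 1\end{array}\right]=\left[\begin{array}{c}-2 \\ 1\end{array}\right]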

\begin{aligned} A \vec{\mathbf{v}} & =\lambda \vec{\mathbf{v}} \\ \rule{0pt}{2em} A \vec{\mathbf{v}}-\lambda I \vec{\mathbf{v}} & =0 \\ \rule{0pt}{2em} (A-\lambda I) \vec{\mathbf{v}} & =0 \\ \rule{0pt}{2em} \operatorname{det}(A-\lambda I) & =0 \end{aligned}

Pause and ponder to make sure that line of reasoning feels good.

Characteristic polynomial

This expression, where we take a given matrix minus lambda times the identity and set its determinant equal to zero, thinking of lambda as a variable, is fundamental to eigenvalues. This is the kind of thing we mentioned in the introduction. If you didn't have a solid grasp of the determinant, and why it relates to linear systems having non-zero solutions, an expression like this would feel completely out of the blue.

Let's revisit the example from the start, with the matrix $\left[\begin{array}{cc} 3 & 1\\ 0 & 2 \end{array}\right]$.

To find if a value $\lambda$ is an eigenvalue, subtract it from the diagonals of this matrix, and compute the determinant.

\operatorname{det}\left(\left[\begin{array}{cc}3-\lambda & 1 \\ 0 & 2-\lambda\end{array}\right]\right)

Doing this, we get a certain quadratic polynomial in terms of lambda.

\operatorname{det}\left(\left[\begin{array}{cc}3-\lambda & 1 \\ 0 & 2-\lambda\end{array}\right]\right) = (3 - \lambda)(2- \lambda)

Since lambda can only be an eigenvalue if this determinant is zero, you can conclude that the only eigenvalues are $\lambda = 2$ and $\lambda = 3$. To see which eigenvectors have one of these eigenvalues, say $\lambda=2$, plug in that value of lambda to the matrix, and solve for which vectors this diagonally-altered matrix sends to zero.

\left[\begin{array}{cc} 3-2 & 1 \\ 0 & 2-2 \end{array}\right]\left[\begin{array}{l} x \\ y \end{array}\right]=\left[\begin{array}{l} 0 \\ 0 \end{array}\right]

If you computed this, you'd see that the solutions are all of the vectors on the diagonal line spanned by $\left[\begin{array}{c} -1 \\ 1 \end{array}\right]$. This corresponds to the fact that the unaltered matrix $\left[\begin{array}{cc} 3 & 1 \\ 0 & 2 \end{array}\right]$ stretches all those vectors by a factor of 2.
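If you'd like to double-check these numbers with a computer, here's a minimal NumPy sketch (my own addition, not part of the lesson) that recovers both eigenvalues and the directions of the corresponding eigenvectors:

```python
import numpy as np

# The example matrix from this chapter.
A = np.array([
    [3.0, 1.0],
    [0.0, 2.0],
])

# np.linalg.eig returns (eigenvalues, eigenvectors); eigenvector i is the
# i-th *column* of the second array, scaled to unit length. The ordering
# of the eigenvalues is not guaranteed, so print each pair side by side.
eigenvalues, eigenvectors = np.linalg.eig(A)
for value, vector in zip(eigenvalues, eigenvectors.T):
    print(value, vector)

# Expected output (up to ordering and sign):
#   3.0  [1. 0.]                -- the x-axis
#   2.0  [-0.707...  0.707...]  -- the diagonal line spanned by [-1, 1]
```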

Rotation

Now, a 2d transformation doesn't have to have eigenvectors. For example, consider a rotation by 90 degrees. This doesn't have any eigenvectors, since each vector is rotated off of its own span.

If you actually try computing the eigenvalues for a 90 degree rotation, notice what happens. Its matrix is $\left[\begin{array}{cc} 0 & -1 \\ 1 & 0 \end{array}\right]$. Subtract $\lambda$ from the diagonal elements, and look for when the determinant is 0.

\begin{aligned} \operatorname{det}\left(\left[\begin{array}{cc}-\lambda & -1 \\ 1 & -\lambda\end{array}\right]\right) & =(-\lambda)(-\lambda)-(-1)(1) \\ \rule{0pt}{2em} & =\lambda^2+1=0 \end{aligned}

In this case, you get the polynomial $\lambda^2 + 1$. The only roots of this polynomial are the imaginary numbers $i$ and $-i$. The fact that there are no real number solutions indicates that there are no eigenvectors.

Shear

Another interesting example is a shear. This fixes $\hat{\imath}$ in place, and moves $\hat{\jmath}$ one over, so its matrix is $\left[\begin{array}{cc} 1 & 1 \\ 0 & 1\end{array}\right]$.

All the vectors on the $x$-axis are eigenvectors with eigenvalue 1, since they remain fixed in place. In fact these are the only eigenvectors. When you subtract $\lambda$ from the diagonals and compute the determinant, you get $(1-\lambda)^2$, and the only root of that expression is $\lambda = 1$.

\operatorname{det}\left(\left[\begin{array}{cc} 1-\lambda & 1 \\ 0 & 1-\lambda \end{array}\right]\right)=\underbrace{(1-\lambda)(1-\lambda)=0}_{\lambda=1}

This lines up with what we see geometrically: all the eigenvectors have eigenvalue 1.

Scaling

It's also possible to have just one eigenvalue with more than just a line full of eigenvectors. A simple example is the matrix which scales everything by 2. The only eigenvalue is 2, but every vector in the plane gets to be an eigenvector with that eigenvalue.

Scale everything by 2.
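In matrix form, this transformation is just 2 times the identity, and the same determinant computation confirms there's only one eigenvalue:

\operatorname{det}\left(\left[\begin{array}{cc}2-\lambda & 0 \\ 0 & 2-\lambda\end{array}\right]\right)=(2-\lambda)^2

whose only root is $\lambda = 2$.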

Now is another good time to pause and ponder some of this.

Questions

What are the eigenvalue(s) of the matrix $\left[\begin{array}{cc}3 & 2 \\ 2 & 3\end{array}\right]$?

What are the eigenvalue(s) of the matrix $\left[\begin{array}{cc}2 & 1 \\ -1 & 2\end{array}\right]$?

What are the eigenvalue(s) of the matrix $\left[\begin{array}{cc}-2 & 1 \\ 0 & -2\end{array}\right]$?

Working in an eigenbasis

Let's finish off here with the idea of an eigenbasis, which relies heavily on ideas from the last chapter.

Take a look at what happens if our basis vectors happen to be eigenvectors. For example, maybe $\hat{\imath}$ is scaled by $-1$, and $\hat{\jmath}$ is scaled by 2. Writing their new coordinates as columns of a matrix, notice that those scalar multiples $-1$ and 2, which are the eigenvalues of $\hat{\imath}$ and $\hat{\jmath}$, are on the diagonal, and all other entries are 0.
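Written out, that matrix is:

\left[\begin{array}{cc}-1 & 0 \\ 0 & 2\end{array}\right]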

Any time a matrix has zeros everywhere other than the diagonal, it's called a "diagonal matrix", and the way to interpret it is that all the basis vectors are eigenvectors, and the diagonal entries of the matrix give you their corresponding eigenvalues.

\left[\begin{array}{cccc} -5 & 0 & 0 & 0 \\ 0 & -2 & 0 & 0 \\ 0 & 0 & -4 & 0 \\ 0 & 0 & 0 & 4 \end{array}\right]

There are a number of things that make diagonal matrices nicer to work with. One big one is that it's easier to compute what will happen if you multiply this matrix by itself a whole bunch of times.

\begin{aligned} \left[\begin{array}{ll} 3 & 0 \\ 0 & 2 \end{array}\right] \left[\begin{array}{l} x \\ y \end{array}\right] & = \left[\begin{array}{c} 3 x \\ 2 y \end{array}\right] \\ \rule{0pt}{2.5em} \left[\begin{array}{ll} 3 & 0 \\ 0 & 2 \end{array}\right] \left[\begin{array}{c} 3 x \\ 2 y \end{array}\right] & = \left[\begin{array}{c} 3^2 x \\ 2^2 y \end{array}\right] \\ \rule{0pt}{2.5em} \left[\begin{array}{ll} 3 & 0 \\ 0 & 2 \end{array}\right] \left[\begin{array}{c} 3^2 x \\ 2^2 y \end{array}\right] & = \left[\begin{array}{c} 3^3 x \\ 2^3 y \end{array}\right] \\ \rule{0pt}{2.5em} & \ldots \end{aligned}

Since all that one of these matrices does is scale each basis vector by some eigenvalue, applying that matrix, say, 100 times will just correspond to scaling each basis vector by the 100th power of the corresponding eigenvalue.

\underbrace{\left[\begin{array}{ll} 3 & 0 \\ 0 & 2 \end{array}\right] \cdots \left[\begin{array}{ll} 3 & 0 \\ 0 & 2 \end{array}\right]\left[\begin{array}{ll} 3 & 0 \\ 0 & 2 \end{array}\right]\left[\begin{array}{ll} 3 & 0 \\ 0 & 2 \end{array}\right]}_{100 \text { times }}\left[\begin{array}{l} x \\ y \end{array}\right] = \left[\begin{array}{cc} 3^{100} & 0 \\ 0 & 2^{100} \end{array}\right]\left[\begin{array}{l} x \\ y \end{array}\right]

By contrast, consider how hard computing the 100th power of a non-diagonal matrix would be. Really, think about it for a moment; compared to simply exponentiating the entries, it's a nightmare!

Change of basis

Of course, you will rarely be so lucky as to have your basis vectors also be eigenvectors. But if your transformation has lots of eigenvectors, enough that you can choose a set that spans the full space, then you can change your coordinate system so that these eigenvectors are your basis vectors.

We talked all about change of basis in the last chapter, but let's go through a super quick reminder here of how to express a transformation in a different coordinate system.

Take the coordinates of the vectors you want to use as a basis, which in our case means two eigenvectors, then make those coordinates the columns of a matrix, known as the change of basis matrix. When you sandwich the original transformation, putting the change of basis matrix on its right and the inverse change of basis matrix on its left, the result will be a matrix representing this same transformation, but from the perspective of the new basis vectors' coordinate system.
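In symbols, if $A$ is the original matrix and $P$ is the change of basis matrix whose columns are the eigenvectors (calling it $P$ is just a label chosen here, not notation from the chapter), the same transformation expressed in the new coordinate system is

P^{-1} A P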

The whole point of doing this with eigenvectors is that this new matrix is guaranteed to be diagonal, with the corresponding eigenvalues down the diagonal, since this represents working in a coordinate system where basis vectors just get scaled during the transformation.

A set of basis vectors which are all also eigenvectors is called an “eigenbasis”. For example, if you needed to compute the 100th power of this matrix, it would be much easier to change to an eigenbasis, compute the 100th power in that system, then convert back to the standard system. You can't do this with all transformations. A shear, for example, doesn't have enough eigenvectors to span all of space. But if you can find an eigenbasis, it can make matrix operations really lovely.
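Here's what that workflow looks like numerically for this chapter's example matrix, as a minimal NumPy sketch (the variable names and library calls are my own, not from the lesson):

```python
import numpy as np

# The running example, with eigenvalues 3 and 2.
A = np.array([
    [3.0, 1.0],
    [0.0, 2.0],
])

# Change of basis matrix: its columns are the eigenvectors [1, 0] and [-1, 1].
P = np.array([
    [1.0, -1.0],
    [0.0,  1.0],
])
P_inv = np.linalg.inv(P)

# In the eigenbasis, the transformation is diagonal, with the eigenvalues
# down the diagonal.
D = P_inv @ A @ P
print(D)  # [[3. 0.]
          #  [0. 2.]]

# 100th power: exponentiate the diagonal entries, then convert back.
D_100 = np.diag(np.diag(D) ** 100)
A_100 = P @ D_100 @ P_inv

# Same result as multiplying A by itself 100 times directly.
print(np.allclose(A_100, np.linalg.matrix_power(A, 100)))  # True
```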

For anyone willing to work through a neat puzzle to see what this looks like in action, and how it can be used to produce surprising results, we'll put the puzzle prompt below. It's a bit of work, but we think you'll enjoy it.

The next chapter is about a quick trick for computing eigenvalues before the final chapter of the series on abstract vector spaces.
