Chapter 13: Change of basis

"Mathematics is the art of giving the same name to different things."

- Henri Poincaré.

If we have a vector sitting here in 2d space, we have a standard way to describe it with coordinates. In this case, the vector has coordinates $\begin{bmatrix} 3 \\ 2 \end{bmatrix}$, which means going from its tail to its tip involves moving $3$ units to the right and $2$ units up.

Now, the more linear-algebra oriented way to describe coordinates is to think of each of those numbers as a scalar, a thing that stretches or squishes vectors. You think of the first coordinate as scaling $\hat{\imath}$, the vector with length one pointing to the right, while the second coordinate scales $\hat{\jmath}$, the vector with length one pointing straight up. The tip-to-tail sum of these two scaled vectors is what the coordinates are meant to describe.
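Written out for the vector above, this reading of the coordinates looks something like the following (a small worked equation, using the usual coordinates for $\hat{\imath}$ and $\hat{\jmath}$):

```latex
\begin{bmatrix} 3 \\ 2 \end{bmatrix}
= 3\hat{\imath} + 2\hat{\jmath}
= 3 \begin{bmatrix} 1 \\ 0 \end{bmatrix}
+ 2 \begin{bmatrix} 0 \\ 1 \end{bmatrix}
```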

You can think of these two special vectors as encapsulating all the implicit assumptions of our coordinate system: the fact that the first number indicates rightward motion, that the second indicates upward motion, exactly how far a unit of distance is; all of that is tied up in the choice of $\hat{\imath}$ and $\hat{\jmath}$ as the vectors which our scalar coordinates are meant to scale. The specific way that we translate a vector into a pair of numbers is called a “coordinate system”, and these two special vectors $\hat{\imath}$ and $\hat{\jmath}$ are called the “basis” vectors of our standard coordinate system.

Alternate system

What we'd like to talk about here is the idea of using a different set of basis vectors. For example, let's say you have a friend, Jennifer, who uses a different set of basis vectors, which I'll call $\vec{\mathbf{b}}_1$ and $\vec{\mathbf{b}}_2$. Her first basis vector, $\vec{\mathbf{b}}_1$, points to the right and up a bit, and her second basis vector, $\vec{\mathbf{b}}_2$, points left and up.

Now take another look at the vector we showed earlier, the one we would describe with coordinates $\begin{bmatrix} 3 \\ 2 \end{bmatrix}$ using our basis vectors $\hat{\imath}$ and $\hat{\jmath}$.

Jennifer would actually describe this vector with coordinates $\begin{bmatrix} 5/3 \\ 1/3 \end{bmatrix}$. What this means is that the particular way to get that vector using her two basis vectors is to scale $\vec{\mathbf{b}}_1$ by $5/3$, scale $\vec{\mathbf{b}}_2$ by $1/3$, then add them together.

In a little bit we'll show you how you can figure out those two numbers. In general, whenever Jennifer uses coordinates to describe a vector, she thinks of the first coordinate as scaling $\vec{\mathbf{b}}_1$, and the second coordinate as scaling $\vec{\mathbf{b}}_2$, and she adds the results.

What she gets will typically be completely different from the vector we think of as having those coordinates.

To be a little more precise, her first basis vector, $\vec{\mathbf{b}}_1$, is something we would describe with coordinates $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$, and her second basis vector, $\vec{\mathbf{b}}_2$, is something we would describe as $\begin{bmatrix} -1 \\ 1 \end{bmatrix}$.

It's important to realize that from her perspective, in her system, those vectors have coordinates $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$. They are what define the meaning of the coordinates $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ in her world.

So in effect, we are speaking different languages. We're all looking at the same vectors in space, but Jennifer uses different words and numbers to describe them.
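As a quick sanity check of the numbers so far, here is a minimal Python sketch that scales Jennifer's basis vectors (written in our coordinates) by her coordinates $5/3$ and $1/3$ and adds the results. The helper name `linear_combination` is just an illustrative choice, and `fractions.Fraction` is used so the thirds stay exact.

```python
from fractions import Fraction

# Jennifer's basis vectors, written in our (standard) coordinates.
b1 = (Fraction(2), Fraction(1))
b2 = (Fraction(-1), Fraction(1))


def linear_combination(c1, c2, v1, v2):
    """Return c1*v1 + c2*v2 for 2d vectors given as tuples."""
    return (c1 * v1[0] + c2 * v2[0], c1 * v1[1] + c2 * v2[1])


# Scale b1 by 5/3, scale b2 by 1/3, and add tip to tail.
result = linear_combination(Fraction(5, 3), Fraction(1, 3), b1, b2)
print(result)  # equal to (3, 2), the coordinates we use for this vector
```

The same vector in space comes out either way; only the pair of numbers used to name it changes.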

The grid

One quick word on how we're representing things here: When illustrating 2d space, we typically use a square grid. But this grid is just a construct, a way to visualize our coordinate system, so it depends on our choice of basis vectors. Space itself has no intrinsic grid.

Jennifer might draw her own grid, which would be an equally made up construct, meant as nothing more than a visual tool to help follow what her coordinates mean.

Her origin would line up with ours, since everyone agrees what the coordinates $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$ should mean: It's the thing you get when you scale any vector by $0$.

The direction of her axes and the spacing of her grid lines will be different, depending on her choice of basis vectors.

Change of basis matrix

A fairly natural question to ask at this point is how we translate between coordinate systems. For example, if Jennifer describes a vector as $\begin{bmatrix} -1 \\ 2 \end{bmatrix}$, what would its coordinates be in our system? How do you translate from her language to ours?

Well, what her coordinates are saying is that this vector is $-1 \vec{\mathbf{b}}_1 + 2 \vec{\mathbf{b}}_2$. From our perspective, $\vec{\mathbf{b}}_1$ has coordinates $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$, and $\vec{\mathbf{b}}_2$ has coordinates $\begin{bmatrix} -1 \\ 1 \end{bmatrix}$. When we scale and add these representations of her basis vectors in our coordinates, we get $\begin{bmatrix} -4 \\ 1 \end{bmatrix}$. So that's how we would describe the vector she thinks of as $\begin{bmatrix} -1 \\ 2 \end{bmatrix}$.

This process, of scaling each of her basis vectors by the corresponding coordinates of $\begin{bmatrix} -1 \\ 2 \end{bmatrix}$ and adding them together, might feel familiar: It's matrix-vector multiplication, with a matrix whose columns represent Jennifer's basis vectors.
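To make that concrete, here is a minimal Python sketch of the translation from her language to ours. The function name `mat_vec` and the variable names are illustrative choices, and matrices are stored as plain lists of rows.

```python
def mat_vec(matrix, vector):
    """Multiply a 2x2 matrix (list of rows) by a 2d vector (list)."""
    return [
        matrix[0][0] * vector[0] + matrix[0][1] * vector[1],
        matrix[1][0] * vector[0] + matrix[1][1] * vector[1],
    ]


# Columns are Jennifer's basis vectors in our coordinates: [2, 1] and [-1, 1].
change_of_basis = [
    [2, -1],
    [1, 1],
]

in_her_language = [-1, 2]
in_our_language = mat_vec(change_of_basis, in_her_language)
print(in_our_language)  # [-4, 1]
```

Because the columns of the matrix are her basis vectors, the product is exactly the scale-and-add computation described above.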

How this is a transformation

In fact, once you understand matrix-vector multiplication as applying a certain linear transformation, say by watching what might be the most important video in this series, chapter 3, there's a pretty intuitive way to think about what's going on here.

A matrix whose columns represent Jennifer's basis vectors can be thought of as a transformation that moves our basis vectors, $\hat{\imath}$ and $\hat{\jmath}$, the things we think of when we say $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, to Jennifer's basis vectors, the things she thinks of when she says $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$.

For example, let's walk through what it means to take the vector that we think of as having coordinates $\begin{bmatrix} -1 \\ 2 \end{bmatrix}$ and apply this transformation. Before the linear transformation, we're thinking of this vector as a certain linear combination of our basis vectors, $-1\hat{\imath} + 2\hat{\jmath}$.

How we think of $\begin{bmatrix} -1 \\ 2 \end{bmatrix}$

The key feature of linear transformations is that the resulting vector will be that same linear combination of the new basis vectors, $-1 \text{ (the place where } \hat{\imath} \text{ lands)} + 2 \text{ (the place where } \hat{\jmath} \text{ lands)}$.

How Jennifer thinks of $\begin{bmatrix} -1 \\ 2 \end{bmatrix}$

What this matrix does is transform our misconception of what Jennifer means into the actual vector that she's referring to.

This can feel kind of backwards. Geometrically this matrix transforms our grid into Jennifer's grid, yet numerically it's translating a vector described in her language to our language.

One way to keep it straight is to think about how it takes our misconception of what Jennifer means, the vector we get using the same coordinates in our own system, then transforms it into the vector that she really meant, but now expressed using our coordinate system.

When Jennifer thinks of the vector $\begin{bmatrix} 2 \\ -3 \end{bmatrix}$ in her coordinate system, defined by the basis $\vec{\mathbf{b}}_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ and $\vec{\mathbf{b}}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$, how would we describe the same vector using the standard basis?

Say there is another person named Olive who uses a basis different from both Jennifer's and ours. They use $\vec{\mathbf{b}}_1 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$ and $\vec{\mathbf{b}}_2 = \begin{bmatrix} -3 \\ 2 \end{bmatrix}$ as their basis vectors. How would we translate the vector that they would describe with the coordinates $\begin{bmatrix} -1 \\ 2 \end{bmatrix}$ to our coordinate system?

Going from ours to hers

What about if you want to go the other way around? In the example we used earlier in this chapter, we had a vector with coordinates $\begin{bmatrix} 3 \\ 2 \end{bmatrix}$ in our system.

How do we know that it would have coordinates $\begin{bmatrix} 5/3 \\ 1/3 \end{bmatrix}$ in Jennifer's system?

Well, you start with that change of basis matrix that translates Jennifer's language into ours, then you take its inverse. Remember, the inverse of a transformation is a new transformation that corresponds to playing that first one backwards.

In this case, the inverse of the change of basis matrix that has Jennifer's basis as its columns ends up working out to have columns $\begin{bmatrix} 1/3 \\ -1/3 \end{bmatrix}$ and $\begin{bmatrix} 1/3 \\ 2/3 \end{bmatrix}$.

$$\begin{bmatrix} 2 & -1 \\ 1 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1/3 & 1/3 \\ -1/3 & 2/3 \end{bmatrix}$$

So for example, to see what the vector $\begin{bmatrix} 3 \\ 2 \end{bmatrix}$ looks like in Jennifer's system, we multiply this inverse change of basis matrix by $\begin{bmatrix} 3 \\ 2 \end{bmatrix}$, which works out to be $\begin{bmatrix} 5/3 \\ 1/3 \end{bmatrix}$.
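If you'd like to see the arithmetic behind that product, one way to write it out is as a scaled sum of the inverse matrix's columns:

```latex
\begin{bmatrix} 1/3 & 1/3 \\ -1/3 & 2/3 \end{bmatrix}
\begin{bmatrix} 3 \\ 2 \end{bmatrix}
= 3 \begin{bmatrix} 1/3 \\ -1/3 \end{bmatrix}
+ 2 \begin{bmatrix} 1/3 \\ 2/3 \end{bmatrix}
= \begin{bmatrix} 1 + 2/3 \\ -1 + 4/3 \end{bmatrix}
= \begin{bmatrix} 5/3 \\ 1/3 \end{bmatrix}
```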

In practice, especially when you're working in more than two dimensions, you'd use a computer to compute the matrix which represents this inverse, but for a small matrix, like $M = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, the inverse can be computed by swapping the entries along one diagonal, negating the entries along the other, and dividing by the determinant, $\det(M) = ad - bc$.

$$M^{-1} = \frac{1}{\det(M)} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$
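As a rough sketch of how that formula might look in code, here is a small Python function applied to Jennifer's change of basis matrix. The name `inverse_2x2` is hypothetical, matrices are stored as lists of rows, and `fractions.Fraction` keeps the thirds exact rather than rounding them.

```python
from fractions import Fraction


def inverse_2x2(m):
    """Invert a 2x2 matrix given as [[a, b], [c, d]] via the formula above."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible (determinant is zero)")
    det = Fraction(det)
    # Swap a and d, negate b and c, divide everything by the determinant.
    return [[d / det, -b / det], [-c / det, a / det]]


# Columns are Jennifer's basis vectors in our coordinates.
jennifer = [[2, -1], [1, 1]]
inverse = inverse_2x2(jennifer)
print(inverse)
```

For this matrix the determinant is $2 \cdot 1 - (-1) \cdot 1 = 3$, which is where the thirds in the inverse come from.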

In Olive's language, defined by the basis $\vec{\mathbf{b}}_1 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$ and $\vec{\mathbf{b}}_2 = \begin{bmatrix} -3 \\ 2 \end{bmatrix}$, how would they describe the vector that we would describe with the coordinates $\begin{bmatrix} -5 \\ -1 \end{bmatrix}$?

Translating transformations

So that, in a nutshell, is how to translate the description of individual vectors back and forth between coordinate systems: the matrix whose columns represent Jennifer's basis vectors written in our coordinate system translates vectors written in her language to vectors written in our language.

The inverse of this matrix does the opposite, translating vectors written in our language to vectors written in her language.

However, vectors aren't the only thing we describe using coordinates. For this next part, it's important that you're comfortable representing transformations with matrices, and that you know how matrix multiplication corresponds to composing successive transformations. Definitely pause and take a look at chapter 3 and chapter 4 if any of this feels uneasy.

Follow Rotation

Consider some linear transformation, like a $90$ degree counterclockwise rotation. In our coordinate system, we represent this with the matrix $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$. The first column tells us that $\hat{\imath}$ goes to $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, and the second column tells us that $\hat{\jmath}$ goes to $\begin{bmatrix} -1 \\ 0 \end{bmatrix}$.

This representation is heavily tied up in our choice of basis vectors, from the fact that we're following $\hat{\imath}$ and $\hat{\jmath}$ in the first place, to the fact that we record their landing spots in our own coordinate system. How would Jennifer describe a $90$ degree rotation of space?

You might be tempted to just translate the columns of our rotation matrix into Jennifer's language, but this is not quite right. Those columns represent where our basis vectors $\hat{\imath}$ and $\hat{\jmath}$ go, but the matrix Jennifer wants should represent where her basis vectors land, and it needs to describe those landing spots in her language.

The process

Here's a common way to think about how this is done: Start with any vector, written in Jennifer's language. Rather than trying to follow what happens to it in terms of her language, first translate it into our language using the change of basis matrix, the one whose columns represent her basis vectors in our language. This gives the same vector written in our language.

Then apply the transformation matrix to it by multiplying on the left. This tells us where that vector lands, but in our language.

As a last step, apply the inverse change of basis matrix, multiplied on the left as usual, to get this transformed vector in Jennifer's language.

Since we could do this to any vector written in her language, first applying the change of basis matrix, then the transformation, then the inverse change of basis, that composition of three matrices gives us the transformation matrix in Jennifer's language. It takes in a vector in her language, and spits out the transformed version of that vector in her language.

For this example, where Jennifer's basis vectors look like $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ 1 \end{bmatrix}$ to us, and we're translating a $90$ degree rotation, the product of these three matrices, if you work through it, has columns $\begin{bmatrix} 1/3 \\ 5/3 \end{bmatrix}$ and $\begin{bmatrix} -2/3 \\ -1/3 \end{bmatrix}$. So if Jennifer multiplies that matrix by the coordinates of a vector in her system, it will return the $90$ degree rotated version of her vector, expressed in her coordinate system.
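If you'd rather not work through the product by hand, here is a minimal Python sketch of the three-step composition. The names `mat_mul`, `A`, `A_inv`, and `M` are illustrative, matrices are lists of rows, and the inverse is entered directly from the values computed earlier in this chapter.

```python
from fractions import Fraction


def mat_mul(x, y):
    """Multiply two 2x2 matrices given as lists of rows."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


third = Fraction(1, 3)
A = [[2, -1], [1, 1]]                          # her language -> our language
A_inv = [[third, third], [-third, 2 * third]]  # our language -> her language
M = [[0, -1], [1, 0]]                          # 90 degree rotation, our language

# Read right to left: translate to our language, rotate, translate back.
rotation_in_her_language = mat_mul(A_inv, mat_mul(M, A))
print(rotation_in_her_language)
```

One check worth doing on the result: applying it twice should give a $180$ degree rotation, which negates every vector in any coordinate system, so the square of this matrix should be the negative of the identity.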

In general, whenever you see an expression like $A^{-1}MA$, it suggests a mathematical sort of empathy. The middle matrix represents a transformation as you see it, the outer two matrices represent the empathy, this shift in perspective, and the full matrix product represents that same transformation as someone else sees it.

For those of you wondering why we care about using alternate coordinate systems, the next chapter on eigenvectors and eigenvalues will give a really important example of this. See you then!
