Chapter 4Matrix multiplication as composition

"It is my experience that proofs involving matrices can be shortened by 50% if one throws the matrices out."

\qquad — Emil Artin


In the previous lesson we showed how linear transformations are just functions with vectors as inputs and vectors as outputs. However it is often more convenient to think of linear transformations as moving space in a way that keeps gridlines parallel and evenly spaced.

We can determine the entirety of the transformation by where the basis vectors land after the transformation, which form the columns of a matrix. To find where any arbitrary vector lands after the transformation, there is an operation called matrix-vector multiplication. Here is the computation for a 2D matrix and vector:

[abcd][xy]=x[ac]+y[bd]=[ax+bycx+dy]\begin{bmatrix} \color{green}a & \color{red}b \\ \color{green}c & \color{red}d \end{bmatrix} \begin{bmatrix}x\\y\end{bmatrix} = x\begin{bmatrix}\color{green}a\\ \color{green}c\end{bmatrix} +y\begin{bmatrix}\color{red}b\\ \color{red}d\end{bmatrix} = \begin{bmatrix} \color{green}a\color{black}x+\color{red}b\color{black}y \\ \color{green}c\color{black}x+\color{red}d\color{black}y \end{bmatrix}

Composition of Transformations

Oftentimes, you find yourself wanting to describe the effects of applying one linear transformation, then applying another. For example, maybe you want to describe what happens when you rotate the plane 9090^\circ counterclockwise, then to apply a shear. The overall effect here, from start to finish, is another linear transformation, distinct from the rotation and the shear. This new linear transformation is commonly called the “composition” of the two separate transformations we applied.

Like any linear transformation, it can be described with a matrix all of its own by following i^\hat i and j^\hat j.

In this example, the ultimate landing spot for i^\hat i after both transformations is [11]\begin{bmatrix}1\\1\end{bmatrix}, so make this the first column of a matrix. Likewise, j^\hat j ultimately ends up at [10]\begin{bmatrix}-1\\0\end{bmatrix}, so make this the second column of a matrix. This new matrix captures the overall effect of applying a rotation then a shear, but as one single action rather than two successive ones.

Composition is Multiplication

Here’s one way to think about that new matrix. If you were to take some vector and pump it through the rotation then the shear, the long way to compute where it lands by first multiplying on the left by the rotation matrix, then multiplying the result on the left by the shear matrix. This is, numerically speaking, what it means to apply a rotation then a shear to a given vector. But whatever you get should be the same as just multiplying this new composition matrix that we found by the vector, no matter what vector you chose, since this new matrix is supposed to capture the same overall effect as the rotation-then-shear action.

Based on how things are written down here, I think it’s reasonable to call that new matrix the product of the two original matrices, don’t you?

[1101]Shear [0110]Rotation=[1110]Composition\color{purple} \underbrace{ \begin{bmatrix}1&1\\0&1\end{bmatrix} }_{\large \text{Shear}} \ \color{orange} \underbrace{ \begin{bmatrix}0&-1\\1&0\end{bmatrix} }_{\large \text{Rotation}} \color{black} = \color{red} \underbrace{ \begin{bmatrix}1&-1\\1&0\end{bmatrix} }_{\large \text{Composition}}

We can think about how to compute that product more generally in just a moment, but it’s easy to get lost in the forest of numbers. Always remember that multiplying two matrices like this has the geometric meaning of applying one transformation after another.

One thing that’s kind of weird is that this has us reading right to left. You first apply the transformation represented by the matrix on the right, then you apply the transformation represented by the matrix on the left. This stems from function notation, since we write functions on the left of the variable, so composition always has to be read right-to-left. Good news for Hebrew readers, bad news for the rest of us.

f(g(x))Read right to left[1101]Shear [0110]Rotation=[1110]Composition\begin{align*} &\qquad f(g(x)) \\ &\underleftarrow{\text{Read right to left}} \\ &\color{purple} \underbrace{ \begin{bmatrix}1&1\\0&1\end{bmatrix} }_{\large\text{Shear}} \ \color{orange} \underbrace{ \begin{bmatrix}0&-1\\1&0\end{bmatrix} }_{\large\text{Rotation}} \color{black} = \color{red} \underbrace{ \begin{bmatrix}1&-1\\1&0\end{bmatrix} }_{\large\text{Composition}} \end{align*}

Computing the New Matrix

Let’s look at another example. There are two matrices M1M_1 and M2M_2 which perform different transformations:

The total effect of applying M1M1, then M2M2 gives a new transformation, so let’s find its matrix.

But let’s see if we can do it without any images, and instead just using the numerical entries in each matrix.

First we need to figure out where i^\hat i goes. After applying M1M_1, the new coordinates of i^\hat i are given be the first column of M1M_1. To see what happens after applying M2M_2, multiply the matrix for M2M_2 by the first column of M1M_1. Working it out the way I described in the last chapter, you’ll get the vector [21]\begin{bmatrix}2\\1\end{bmatrix}.

M2(M1(i^))=[0210][1210][10]=1[01]+1[20]=[21]\begin{align*} M_2(M_1(\hat i))&= \color{purple} \color{purple} \begin{bmatrix}0&2\\1&0\end{bmatrix} \color{orange} \begin{bmatrix}1&-2\\1&0\end{bmatrix} \color{black} \begin{bmatrix}1\\0\end{bmatrix} \\ &= \color{orange} 1 \color{purple} \begin{bmatrix}0\\1\end{bmatrix} \color{black} + \color{orange} 1 \color{purple} \begin{bmatrix}2\\0\end{bmatrix} \\ &= \color{black} \begin{bmatrix}2\\1\end{bmatrix} \end{align*}

Likewise, we can apply the same operation to find where j^\hat j goes after both transformations.

M2(M1(j^))=[0210][1210][01]=2[01]+0[20]=[02]\begin{align*} M_2(M_1(\hat j))&= \color{purple} \begin{bmatrix}0&2\\1&0\end{bmatrix} \color{orange} \begin{bmatrix}1&-2\\1&0\end{bmatrix} \color{black} \begin{bmatrix}0\\1\end{bmatrix} \\ &= \color{orange} -2 \color{purple} \begin{bmatrix}0\\1\end{bmatrix} \color{black} + \color{orange} 0 \color{purple} \begin{bmatrix}2\\0\end{bmatrix} \\ &= \color{black} \begin{bmatrix}0\\-2\end{bmatrix} \end{align*}

Multiplying these two matrices yields [0210][1210]=?\begin{bmatrix}0&2\\1&0\end{bmatrix}\begin{bmatrix}1&-2\\1&0\end{bmatrix}=?

General Form

Let's go through that same process again, but this time the entries will be variables in each matrix, just to show that the same line of reasoning works for any matrices. This is more symbol heavy, and will require some more room, but it should be satisfying for anyone who has previously been taught matrix multiplication the traditional way.

[abcd]M2 [efgh]M1=[????]M2M1\color{purple} \underbrace{ \begin{bmatrix}a&b\\c&d\end{bmatrix} }_{\large M_2} \ \color{orange} \underbrace{ \begin{bmatrix}e&f\\g&h\end{bmatrix} }_{\large M_1} \color{black} = \color{red} \underbrace{ \begin{bmatrix}?&?\\?&?\end{bmatrix} }_{\large M_2M_1}

To follow where i^\hat i goes, start by looking at the first column of the matrix on the right, since this is where i^\hat i initially lands. Multiplying that column by the matrix on the left is how you can tell where that intermediate version of i^\hat i ends up after applying the second transformation. So the first column of the product matrix will always equal the left matrix times the first column of the right matrix.

[abcd][eg]=e[ac]+g[bd]=[ae+bgce+dg]\color{purple} \begin{bmatrix}a&b\\c&d\end{bmatrix} \color{orange} \begin{bmatrix}e\\g\end{bmatrix} \color{black} = \color{orange} e \color{purple} \begin{bmatrix}a\\c\end{bmatrix} \color{black} + \color{orange} g \color{purple} \begin{bmatrix}b\\d\end{bmatrix} \color{black} = \begin{bmatrix} \color{purple} a \color{orange} e \color{black} + \color{purple} b \color{orange} g \\ \color{purple} c \color{orange} e \color{black} + \color{purple} d \color{orange} g \end{bmatrix}

Similarly, j^\hat j will always initially land on the second column of the right matrix, so multiplying the left matrix by this second column will give the final location of j^\hat j, and hence the second column of the product matrix.

[abcd][fh]=f[ac]+h[bd]=[af+bhcf+dh]\color{purple} \begin{bmatrix}a&b\\c&d\end{bmatrix} \color{orange} \begin{bmatrix}f\\h\end{bmatrix} \color{black} = \color{orange} f \color{purple} \begin{bmatrix}a\\c\end{bmatrix} \color{black} + \color{orange} h \color{purple} \begin{bmatrix}b\\d\end{bmatrix} \color{black} = \begin{bmatrix} \color{purple} a \color{orange} f \color{black} + \color{purple} b \color{orange} h \\ \color{purple} c \color{orange} f \color{black} + \color{purple} d \color{orange} h \end{bmatrix}

It’s common to be taught this formula as something to memorize, along with a certain algorithmic process to remember it.

[abcd]M2 [efgh]M1=[ae+bgaf+bhce+dgcf+dh]M2M1\color{purple} \underbrace{ \begin{bmatrix}a&b\\c&d\end{bmatrix} }_{\large M_2} \ \color{orange} \underbrace{ \begin{bmatrix}e&f\\g&h\end{bmatrix} }_{\large M_1} \color{black} = \color{red} \underbrace{ \begin{bmatrix} \color{purple} a \color{orange} e \color{black} + \color{purple} b \color{orange} g & \color{purple} a \color{orange} f \color{black} + \color{purple} b \color{orange} h \\ \color{purple} c \color{orange} e \color{black} + \color{purple} d \color{orange} g & \color{purple} c \color{orange} f \color{black} + \color{purple} d \color{orange} h \end{bmatrix} }_{\large M_2M_1}

Multiplying these two matrices yields [3125][5373]=?\begin{bmatrix}-3&1\\2&5\end{bmatrix}\begin{bmatrix}5&3\\7&-3\end{bmatrix}=?

I really do believe that before memorizing that process, you should get in the habit of thinking about what matrix multiplication really represents: applying one transformation after another. Trust me, this will give you the conceptual framework that makes the properties of matrix multiplication easier to understand.


Here is an important question: Does it matter what order we put the two matrices in?

M1M2=???M2M1\color{orange}M_1 \color{purple}M_2 \color{black}\stackrel{???}{=} \color{purple}M_2 \color{orange}M_1

Well, let’s think of a simple example like the one from earlier. Take a shear, which fixes i^\hat i and smooshes j^\hat j over to the right, and a 9090^\circ rotation.

The overall effect is very different, so it would appear that order totally does matter! By thinking in terms of transformations, that’s the kind of thing you can do in your head by visualizing, no matrix multiplication necessary!

When multiplying these two matrices [0110][2002]=?[2002][0110]\begin{bmatrix}0&-1\\1&0\end{bmatrix}\begin{bmatrix}-2&0\\0&-2\end{bmatrix}\stackrel{?}{=}\begin{bmatrix}-2&0\\0&-2\end{bmatrix}\begin{bmatrix}0&-1\\1&0\end{bmatrix}


I remember when I first took linear algebra, there was a homework problem that asked to prove that matrix multiplication is associative. That means if you have three matrices, A, B and C, and multiply them all, it shouldn’t matter if you first compute A times B, then multiply the result by C, or if you first multiply B times C, then multiply the result by the matrix A on the left. In other words, does it matter where you put the parentheses.


Now, if you try to work through this numerically, like I did back then, it’s horrible! And unenlightening for that matter. But when you think of matrix multiplication as applying one transformation after another, this property is trivial, can you see why?

What it’s saying is that if you first apply (C then B), then A, it’s the same as applying C, (then B then A). I mean, there’s nothing to prove! You’re just applying three things one after the other, all in the same order!

This might feel like cheating, but it’s not. It’s an honest-to-goodness proof that matrix multiplication is associative. And even better than that, it’s a good explanation for why that property should be true.

Notice a mistake? Submit a correction on GitHub
Table of Contents