3Blue1Brown

Chapter 3: Power Rule through geometry

“You know, for a mathematician, he did not have enough imagination. But he has become a poet and now he is fine.” David Hilbert

After introducing the derivative and its relation to rates of change in the last lesson, the next step is to learn how to compute the derivatives of functions that are explicitly given with some formula. A typical calculus student spends quite a bit of time drilling on how to compute derivatives, often without the context of a concrete rate of change problem. For example, a worksheet may ask you to compute the derivative of this function:

$$f(x) = \frac{\sin(x)}{x^2}$$

But there may be no indication of what physical process this function describes, or what significance its rate of change has. That's not necessarily a bad thing; it's analogous to how we often learn the multiplication table by drilling on many facts like $7 \times 6 = 42$ without perpetually having to put each such equation in context.

Still, before diving in, it may be worth emphasizing why all these formulas are worth learning in the first place, and why exercises asking students to drill on them are worth the effort. We model many real-world phenomena, especially those in physics, with polynomials, trigonometric functions, exponentials, and the like, so building up a fluency with actually computing derivatives of these functions gives you a language to readily understand the rates of change for these phenomena.

Think of how knowing the multiplication table by heart frees up a student to think about more complicated ideas in arithmetic and algebra.

So in this chapter and the next, the aim is to show how you can think about a few of these rules intuitively and geometrically; and I really want to encourage you to never forget about the tiny nudges at the heart of derivatives.

All that said, even if there's value to drilling on these formulas and memorizing them, the goal of this series is for you to feel like these are facts you could have discovered yourself. So for the next few chapters, let's put ourselves in a mindset of patience and discovery. Discovering these formulas can be a beautiful exercise in creativity, requiring you to sniff out how tiny changes to one quantity influence tiny changes to another.

Even if you don't need to think through these derivations every time you compute a derivative, going through them can reinforce the core idea of what derivatives are all about.

Monomial Terms

Derivative of $f(x) = x^2$

Let’s start with a function like $f(x) = x^2$. What is its derivative? That is to say, if you look at some value of $x$, like $x = 2$, and compare it to a value slightly bigger, just $dx$ bigger, what’s the corresponding change in the value of the function, $df$?

In particular, what is $df$ divided by $dx$, the rate at which this function changes per unit change in $x$? As a first step for intuition, we know you can think of the ratio $\frac{df}{dx}$ as the slope of a tangent line to the graph of $x^2$. From that, we can see that the slope generally increases as $x$ increases. At $0$, the tangent line is flat, so the slope is $0$. At $x = 1$, it’s something steeper; at $x = 2$, it’s steeper still.

But looking at graphs isn’t generally the best way to understand the precise formula for a derivative. For that, it’s best to take a more literal look at what $x^2$ actually means. In this case, let’s picture a square whose side length is $x$.

If you increase $x$ by a tiny nudge $dx$, what is the resulting change to the area of the square? That slight change in area represents $df$: the tiny increase in the value of $f(x) = x^2$ caused by increasing $x$ by a tiny nudge $dx$.

There are three new bits of area in this diagram: two thin rectangles and a minuscule square. The two thin rectangles have side lengths $x$ and $dx$, so together they account for $2 \cdot x \cdot dx$ units of new area. For example, if $x$ was $3$ and $dx$ was $0.01$, the new area from these thin rectangles would be $2 \cdot 3 \cdot 0.01$, which is $0.06$, about $6$ times the size of $dx$.

If $x = 2$ and $dx$ was $0.01$, approximately how much bigger would the new area be relative to the size of $dx$?

That little square has area $dx^2$, but you should think of this as being really tiny; negligibly tiny. For example, if $dx$ was $0.01$, it would be $0.0001$. I’m drawing $dx$ with a fair bit of width here, so we can see it, but always remember in principle $dx$ should be thought of as a truly tiny amount.
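To make the size comparison concrete, here is a minimal Python sketch that splits the new area into the two pieces described above. The helper name `new_area_pieces` is hypothetical, chosen only for this illustration, and the value $x = 3$ is taken from the worked example in the text.

```python
# Hypothetical helper: split the new area of a square into its pieces.
def new_area_pieces(x, dx):
    rectangles = 2 * x * dx   # two thin rectangles, each x by dx
    corner = dx * dx          # the tiny dx-by-dx square
    return rectangles, corner

x = 3.0
for dx in (0.1, 0.01, 0.001):
    rectangles, corner = new_area_pieces(x, dx)
    total = (x + dx) ** 2 - x ** 2   # exact change in area
    print(f"dx={dx}: rectangles={rectangles:.6f}, corner={corner:.6f}, "
          f"corner share={corner / total:.4%}")
```

As $dx$ shrinks, the share of the new area contributed by the corner shrinks along with it, which is the sense in which the corner is negligible.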

Phrased more precisely, our final consideration will always be what happens as the size of this $dx$ approaches $0$, and as that happens the proportion of this yellow area $df$ that is accounted for by the tiny $dx^2$ corner will go to $0$. The full unapproximated change $df$ represented by all the yellow area above looks like $df = 2x \cdot dx + (dx)^2$. So you might begin thinking of the expression for the derivative like this:

$$\frac{df}{dx} = \frac{2x \cdot dx + (dx)^2}{dx} = 2x + dx$$

Remember, if we're using this "$d$" notation, the implicit meaning is that we consider what happens as $dx \to 0$. So in this case, our final expression would look like this:

$$\frac{df}{dx} = 2x$$

Notice how we could have simply ignored the $(dx)^2$ term, since it doesn't get fully canceled out when dividing by $dx$: a factor of $dx$ remains, and it vanishes as $dx \to 0$. A good rule of thumb is that you can ignore anything that includes a $dx$ raised to a power greater than one; that is, a tiny change squared is a negligible change.
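One way to watch the leftover $dx$ fade away is to compute the ratio for smaller and smaller nudges. This is only a numerical sketch, not a proof; the function name `nudge_ratio` and the sample point $x = 5$ are assumptions made for the illustration.

```python
def f(x):
    return x ** 2

# Hypothetical helper: change in the output divided by the nudge to the input.
def nudge_ratio(func, x, dx):
    return (func(x + dx) - func(x)) / dx

x = 5.0
for dx in (0.1, 0.01, 0.001, 0.0001):
    # Algebraically this ratio is 2x + dx, so it should settle toward 2x.
    print(f"dx={dx}: ratio={nudge_ratio(f, x, dx):.6f}, 2x={2 * x}")
```

Up to floating-point rounding, each printed ratio differs from $2x$ by the size of the nudge itself.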

For example, if you were starting at $x = 3$, as you slightly increase $x$, what is the rate of change of the area per unit of length added?

Derivative of $f(x) = x^3$

Let’s try a different simple function, $f(x) = x^3$. This will be the geometric view of what you and I went through algebraically in the last chapter for the function $x^3$. We can think of $x^3$ geometrically as the volume of a cube with side length $x$.

When you increase $x$ by a tiny nudge, $dx$, the volume increases as shown in the figure below. That represents all the volume in a cube with side length $x + dx$ that’s not already in the original cube with side length $x$.

Remember that we are interested in what happens as $dx$ approaches $0$. In the figure, $dx$ is drawn much larger than it should be so that the change in volume it introduces is visible.

This figure shows the increase in volume of a cube when its side length $x$ is increased by a small nudge $dx$. It’s nice to think of this new volume broken up into multiple components, but almost all of it comes from the three square faces; or, said a little more precisely, as $dx$ approaches $0$, those three squares make up a portion closer and closer to 100% of the new volume.

Each of those thin squares has a volume of $x^2 \cdot dx$, the area of the face times the thickness $dx$, so in total this gives us $3x^2\,dx$ of volume change. There are some other slivers of volume along the edges, and in the corner, but their volume will be proportional to $dx^2$ or $dx^3$, so they can be ignored. Again, this is because ultimately they will be divided by $dx$, and if there’s still any $dx$ remaining, these terms won’t survive the process of letting $dx$ approach $0$.

$$(x + dx)^3 = x^3 + \color{#fc6255} 3x^2 \color{black} dx \color{#AAAAAA} + 3xdx^2 + dx^3$$
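The same bookkeeping can be checked numerically. The sketch below assumes the new volume is split into three face slabs, three edge slivers, and one corner cube, matching the three non-$x^3$ terms of the expansion; `new_volume_pieces` is a hypothetical name and $x = 2$ is an arbitrary sample value.

```python
# Hypothetical helper: split the new volume of a cube into its pieces.
def new_volume_pieces(x, dx):
    faces = 3 * x ** 2 * dx    # three thin slabs, each x by x by dx
    edges = 3 * x * dx ** 2    # three slivers, each x by dx by dx
    corner = dx ** 3           # one tiny dx by dx by dx cube
    return faces, edges, corner

x = 2.0
for dx in (0.1, 0.01, 0.001):
    faces, edges, corner = new_volume_pieces(x, dx)
    total = (x + dx) ** 3 - x ** 3   # exact change in volume
    print(f"dx={dx}: faces account for {faces / total:.4%} of the new volume")
```

The printed share climbs toward 100% as $dx$ shrinks, which is the claim made about the three square faces.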

So the derivative of $x^3$, the rate at which $x^3$ changes per unit of change in $x$, is $3x^2$. Looking at the graph, this means the slope of the graph of $x^3$ at each point $x$ is exactly $3x^2$.

Graphical intuition with slope can tell us why this derivative is high on the left, $0$ at the origin, and high on the right, but just thinking in terms of graphs would not land us on the precise quantity $3x^2$. For that, we had to take a much more direct look at the actual meaning of the function.

Derivative of $f(x) = x^n$

In practice, you wouldn’t necessarily think of the square every time you’re taking a derivative of $x^2$, nor would you necessarily think of the cube when taking a derivative of $x^3$. Instead, thinking like a mathematician, can you generalize this approach to see if a pattern emerges? Can you invent a tool to find the derivative of any polynomial?

Let's look at the pattern of the first three monomial functions, where I've included the monomial of degree one for completeness. So far, each of these monomial functions has had a nice geometric meaning. Nudging the input $x$ by a small amount $dx$ has allowed us to see how the geometry changes and find the derivative.

However, from a geometric perspective, we have hit a roadblock. How do we visualize four dimensions? One path forward is to continue with algebra, using our geometric intuition to inform the base cases. For example, we can expand higher degree monomial expressions, where the input $x$ has been nudged by a small amount $dx$. This gives us the following series of expressions.

| Polynomial | Expansion |
| --- | --- |
| $f(x) = x^1$ | $(x + dx)^1 = x + \color{#fc6255}{1} \color{black} dx$ |
| $f(x) = x^2$ | $(x + dx)^2 = x^2 + \color{#fc6255} 2x \color{black} dx \color{#AAAAAA} + dx^2$ |
| $f(x) = x^3$ | $(x + dx)^3 = x^3 + \color{#fc6255} 3x^2 \color{black} dx \color{#AAAAAA} + 3xdx^2 + dx^3$ |
| $f(x) = x^4$ | $(x + dx)^4 = x^4 + \color{#fc6255} 4x^3 \color{black} dx \color{#AAAAAA} + 6x^2dx^2 + 4xdx^3 + dx^4$ |
| $f(x) = x^5$ | $(x + dx)^5 = x^5 + \color{#fc6255} 5x^4 \color{black} dx \color{#AAAAAA} + 10x^3dx^2 + 10x^2dx^3 + 5xdx^4 + dx^5$ |

Focusing on the rate of change introduced by the very small nudge, highlighted in red, and ignoring expressions containing $dx^2$ or any higher power of $dx$, the pattern that emerges is what's known in the business as the "power rule". Given a monomial function raised to some power, $n$, applying the rule gives us the derivative of the function.
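The coefficients in these expansions are binomial coefficients, so the pattern can be checked mechanically. The following is a small sketch using Python's standard `math.comb`; the helper `expansion_terms` is a hypothetical name introduced only for this illustration.

```python
from math import comb

# Hypothetical helper: list the terms of (x + dx)^n as
# (coefficient, power of x, power of dx) triples.
def expansion_terms(n):
    return [(comb(n, k), n - k, k) for k in range(n + 1)]

for n in range(1, 6):
    # Index 1 is the term containing exactly one factor of dx.
    coeff, x_power, dx_power = expansion_terms(n)[1]
    print(f"n={n}: the dx term is {coeff} * x^{x_power} * dx")
```

For every $n$ tried, the coefficient on the single-$dx$ term is $n$ and the remaining power of $x$ is $n - 1$, which is the pattern highlighted in red.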

Power Rule Definition
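Stated symbolically, the pattern in the expansions reads as follows. This is a sketch for whole-number exponents $n$, which is the case the expansions cover; the binomial coefficient notation $\binom{n}{2}$ is an assumption brought in for this illustration rather than something introduced earlier in the lesson.

```latex
\begin{align*}
(x + dx)^n &= x^n + n x^{n-1}\,dx + \binom{n}{2} x^{n-2}\,dx^2 + \cdots + dx^n \\
\frac{d(x^n)}{dx} &= \frac{(x + dx)^n - x^n}{dx} = n x^{n-1} + (\text{terms that still contain } dx) \\
&\to n x^{n-1} \quad \text{as } dx \to 0
\end{align*}
```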

Even though in practice you will find yourself performing this derivative quickly and symbolically, imagining that exponent hopping down to the front, every now and then it’s nice to step back and remember why this rule works. Not just because it’s pretty, and not just because it helps to remind us that math actually makes sense and isn’t just a pile of formulas to memorize, but because it flexes that very important muscle of thinking about derivatives in terms of tiny nudges.

What is the derivative of the function $f(x) = 3x^2$?

Derivative of $f(x) = \frac{1}{x}$

As another example, think of the function $f(x) = \frac{1}{x}$. Now, on the one hand, you could blindly try applying the power rule, since $\frac{1}{x}$ is the same as writing $x^{-1}$. That would involve letting that $-1$ hop down to become a coefficient, leaving behind one less than itself in the exponent, $-2$. But let’s have some fun and see if we can reason this geometrically, rather than just plugging it through a formula.

The value $1/x$ is asking “what number multiplied by $x$ equals $1$”, so here’s how I’d visualize it: Imagine a little rectangular puddle of water in two dimensions with area $1$. Let’s say that its width is $x$, which means its height must be $1/x$, since the total area is $1$.

For example, if you increase $x$ to $3$, the other side must be squished down to $\frac{1}{3}$. And if $x = 2$, the other side is forced to be $\frac{1}{2}$.

This is a nice way to think about the graph of $1/x$, by the way. If you think of the width $x$ of this puddle in the $xy$-plane, the corresponding output $1/x$, the height of the graph above that point, is whatever height the puddle must have to maintain an area of $1$.

For the derivative, imagine nudging the input $x$ up by a value $dx$. How must the height of this rectangle change so that the area remains unchanged at $1$? That is, increasing the width by $dx$ adds some new area to the right here, so the puddle must decrease in height by some $d(1/x)$ so that the area lost off the top here cancels that out.

You should think of that $d(1/x)$ as being some tiny negative value, since it’s decreasing the height of this rectangle. And once you work out $d(1/x)/dx$, compare it to what happens if you apply the power rule purely symbolically to $x^{-1}$.

$$\frac{d(1/x)}{dx} = \; ???$$

What is the derivative of the function $f(x) = \frac{1}{x}$?
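If you want to check your reasoning numerically, the puddle picture translates into a few lines of Python. This is a rough sketch under the assumptions stated in the comments; the function names `height`, `area_gained`, and `area_lost` are hypothetical, and the comparison value is the purely symbolic power rule result described earlier, $-1 \cdot x^{-2}$.

```python
def height(x):
    # Height of a unit-area puddle whose width is x.
    return 1 / x

def area_gained(x, dx):
    # New strip on the right: width dx, at the new (lower) height.
    return dx * height(x + dx)

def area_lost(x, dx):
    # Strip removed from the top: width x, as thick as the drop in height.
    return x * (height(x) - height(x + dx))

x = 2.0
for dx in (0.1, 0.01, 0.001):
    d_height = height(x + dx) - height(x)   # a tiny negative number
    print(f"dx={dx}: gained={area_gained(x, dx):.6f}, lost={area_lost(x, dx):.6f}, "
          f"d(1/x)/dx={d_height / dx:.6f}, power rule={-x ** -2:.6f}")
```

The gained and lost areas match for every nudge, which is the constraint that pins down $d(1/x)$, and the ratio drifts toward the symbolic power rule value as $dx$ shrinks.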

Exercises

Here are a couple of questions to test your knowledge of derivatives and the past chapters.

Derivative of $f(x) = \sqrt{x}$

See if you can reason your way through the derivative of $f(x) = \sqrt{x}$, which can also be written as $x^{\frac{1}{2}}$. By far the easiest way to compute this is to apply the power rule: the exponent hops down as a coefficient, leaving behind $\frac{1}{2} - 1 = -\frac{1}{2}$ in the exponent.

$$\frac{df}{dx} = \frac{1}{2}x^{-1/2}$$

But is this valid? And following our current playful and geometric spirit, is there a way to read what this really means?

Approaching this question with geometry is, by far, overkill. Frankly, it's a bit of a mind warp. But it does offer a satisfying explanation of an otherwise mostly symbolic fact, and more than that it's one more opportunity to flex our muscles in reasoning about how small nudges to one value can affect another.

Unlike in previous problems, $dx$ does not represent geometric length, but instead represents area.

Using the diagram above, what is the derivative of the function $f(x) = \sqrt{x}$?
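As a sanity check on the power rule result quoted above, here is a short numerical sketch. It assumes the picture is a square of area $x$, so that its side length is $\sqrt{x}$ and a nudge $dx$ adds area rather than length; the name `side_change` and the sample value $x = 4$ are hypothetical choices for this illustration.

```python
from math import sqrt

# Hypothetical helper: change in side length when a square's area
# grows from x to x + dx.
def side_change(x, dx):
    return sqrt(x + dx) - sqrt(x)

x = 4.0
side = sqrt(x)
for dx in (0.1, 0.01, 0.001):
    d_side = side_change(x, dx)
    # The added area is two thin rectangles plus a tiny corner square.
    added_area = 2 * side * d_side + d_side ** 2
    print(f"dx={dx}: added area={added_area:.6f}, "
          f"d(sqrt x)/dx={d_side / dx:.6f}, power rule={0.5 * x ** -0.5:.6f}")
```

The added area reproduces $dx$, and the ratio of the side-length nudge to $dx$ approaches the value given by the power rule.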

Thanks

Special thanks to those below for supporting the original video behind this post, and to current patrons for funding ongoing projects. If you find these lessons valuable, consider joining.

Meshal Alshammari, Ali Yahya, CrypticSwarm, Yu Jun, Shelby Doolittle, Dave Nicponski, Damion Kistler, Juan Benet, Othman Alikhan, Markus Persson, Dan Buchoff, Derek Dai, Joseph John Cox, Luc Ritchie, Guido Gambardella, Jerry Ling, Mark Govea, Vecht, Jonathan Eppele, Shimin Kuang, Rish Kundalia, Achille Brighton, Kirk Werklund, Ripta Pasay, Felipe Diniz, Soufiane Khiat, dim85, Chris, David Wyrick, Rahul Suresh, Lee Burnette, John C. Vesey, Patrik Agné, Alvin Khaled, ScienceVR, Chris Willis, Michael Rabadi, Alexander Juda, Mads Elvheim, Joseph Cutler, Curtis Mitchell, Bright, Myles Buckley, Andy Petsch, Otavio Good, Karthik T, Steve Muench, Viesulas Sliupas, Steffen Persch, Brendan Shah, Andrew Mcnab, Matt Parlmer, Dan Davison, Jose Oscar Mur-Miranda, Aidan Boneham, Henry Reich, Sean Bibby, Paul Constantine, Justin Clark, Mohannad Elhamod, Ben Granger, Jeffrey Herman, Jacob Young
