Chapter 2: The paradox of the derivative

"So far as the theories of mathematics are about reality, they are not certain; so far as they are certain, they are not about reality." - Albert Einstein

The goal here is simple: explain what a derivative is. The thing is, though, there’s some subtlety to this topic, and some potential for paradoxes if you’re not careful, so a secondary goal is for you to appreciate what those paradoxes are and how to avoid them.

It’s common for people to say that the derivative measures “instantaneous rate of change”, but if you think about it, that phrase is actually an oxymoron. Change is something that happens between separate points in time, and when you blind yourself to all but a single instant, there is no more room for change.

Even if the phrase “instantaneous rate of change” doesn’t, strictly speaking, make sense, there is a very real concept which this phrase is meant to invoke. It took quite a bit of cleverness from the inventors of calculus to nail down this idea, which we now call the derivative.

(A few) Inventors of Calculus

Car Moving

As our central example, imagine a car that starts at some point A, speeds up, then slows to a stop at some point B, 100 meters away, all over the course of 10 seconds.

Position of car between two points at different points in time.

We could graph this motion, letting a vertical axis represent the distance traveled, and a horizontal axis represent time.

Graph of distance traveled (meters) vs. time.

At each time t, represented with a point on the horizontal axis, the height of the graph tells us how far the car has traveled after that amount of time. It’s common to name a distance function like this s(t). We’d use the letter d for distance, except that it already has another full-time job in calculus.

For example, the height of the graph tells us that after 6 seconds the car has traveled a little more than 70 meters.

Distance (meters) traveled at t=6.

Initially this curve is quite shallow, since the car is slow at the start. During the first second, the distance traveled by the car hardly changes at all. For the next few seconds, as the car speeds up, the distance traveled in a given second gets larger, corresponding to a steeper slope in the graph. And as it slows towards the end, the curve shallows out again.

This figure highlights the slope of the graph for different intervals of time, where slope is calculated as the rise over run for each interval.

If we were to plot the car’s velocity in meters per second as a function of time, it might look like this green bump shown in the graph below.

Distance and Velocity Graphs

Shortly after time t=0, the velocity is very small. Up to the middle of the journey, the car builds up to some maximum velocity, covering a relatively large distance in each second. Then it slows back down to a speed of 0 meters per second.

Position and Velocity

These two curves are closely related to each other; if you change the specific distance vs. time function, you’ll get a different velocity vs. time function. We want to understand the specifics of this relationship. Exactly how does velocity depend on this distance vs. time function? For example, let’s look at a different distance vs. time function.

In this graph, the car starts from a stopped position, speeds up, and slows back down to a stop at the 5-second mark. Then the car speeds up and slows down again, coming to a stop at the 10-second mark.

It’s worth taking a moment to think critically about what velocity actually means here. Intuitively, we all know what velocity at a given moment means: it’s whatever the car’s speedometer shows at that moment. And intuitively, it might make sense that velocity should be higher at times when the distance function is steeper, that is, when the car traverses more distance per unit time.


For example, given the graph of the car’s position shown below, at what time is the car going fastest? The car is going fastest at t=3.

But the funny thing is, velocity at a single moment makes no sense. Or at least if we think of velocity as a change in distance divided by a change in time, then isolating our view to a single moment leaves no room for both these changes. If I show you a picture of a car, a snapshot in an instant, and ask you how fast it’s going, you’d have no way of telling me.

What you need are two points in time to compare, perhaps the distance traveled after 4 seconds and the distance traveled after 5 seconds. That way, you can take the change in distance over the change in time. For example, the two snapshots in time below show the position of the car.

Using this information, we can calculate the velocity of the car for this time interval.

\text{Velocity} = \frac{\text{Change in distance}}{\text{Change in time}} = \frac{(50-30)\text{ meters}}{(5-4)\text{ seconds}} = 20\ \frac{\text{meters}}{\text{second}}

That’s what velocity is: the distance traveled over a given amount of time.
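As a minimal Python sketch of this definition (the function name `average_velocity` is hypothetical), note that it needs two snapshots, each a time and a distance; a single instant leaves no change in time to divide by.

```python
def average_velocity(t1: float, d1: float, t2: float, d2: float) -> float:
    """Change in distance (meters) over change in time (seconds) between two snapshots."""
    if t2 == t1:
        # One snapshot alone leaves no change in time to divide by.
        raise ValueError("velocity needs two different points in time")
    return (d2 - d1) / (t2 - t1)


# Snapshots from the equation above: 30 m after 4 s, 50 m after 5 s.
print(average_velocity(4, 30, 5, 50))  # 20.0
```

The explicit error for equal times is a design choice that mirrors the point of this section: velocity, computed this way, is undefined for a single moment.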

So how is it that we’re looking at a function for velocity that takes in only a single value for t, a single snapshot in time? It’s weird, isn’t it? We want to associate each individual point in time with a velocity, but computing velocity requires comparing two points in time.

If that feels strange and paradoxical, good! You’re grappling with the same conflict that the inventors of calculus did, and if you want a deep understanding of rates of change, not just for a moving car, but for all sorts of scenarios in science, you’ll need a resolution to this apparent paradox.

Rates of Change

First let’s talk about the real world, then we’ll go into a purely mathematical one. Think about how you might build an actual speedometer for a car.

At some point, say 3 seconds into the journey, the speedometer might measure how far the car goes in a very small amount of time, perhaps the distance traveled between 3 seconds and 3.01 seconds. Then it would compute the speed in meters per second as that tiny distance, in meters, divided by that tiny time, 0.01 seconds.

That is, a physical car can sidestep the paradox by not actually computing speed at a single point in time, and instead computing speed during very small amounts of time.

\text{Velocity} = \frac{\text{Change in distance}}{\text{Change in time}} = \frac{(20.21-20)\text{ meters}}{(3.01-3)\text{ seconds}} = 21\ \frac{\text{meters}}{\text{second}}

Let’s call that difference in time “dt”, which you might think of as 0.01 seconds, and call the resulting difference in distance traveled “ds”. So the velocity at that point in time is ds over dt, the tiny change in distance over the tiny change in time.

Graphically, imagine zooming in on the point of the distance vs. time graph above t=3. That dt is a small step to the right, since time is on the horizontal axis, and that ds is the resulting change in the height of the graph, since the vertical axis represents distance traveled.

So ds/dt is the rise-over-run slope between two very close points on the graph.

\frac{ds}{dt} = \frac{\text{rise}}{\text{run}}

Of course, there’s nothing special about the value t=3; we could apply this to any other point in time, so we consider this expression ds/dt to be a function of t, something where I can give you some time t, and you can give back to me the value of this ratio at that time: the velocity as a function of time.

Zoomed-in image of rise over run at t=6.

So, for example, when I had the computer draw this bump curve representing the velocity function, which you can think of as the slope of this distance vs. time function at each point, here’s what I had the computer do:

First, I chose some small value for dt, like 0.01. Then, I had the computer look at many times t between 0 and 10, and compute the distance function s at (t + dt), minus the value of this function at t. That is, the difference in the distance traveled between the given time t and the time 0.01 seconds after that. Then divide that difference by the change in time dt, and this gives the velocity in meters per second around each point in time.

\frac{ds}{dt}(t) = \frac{s(t+dt)-s(t)}{dt}

With this formula, you can give the computer any curve representing the distance function s(t), and it can figure out the curve representing the velocity v(t).
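The procedure just described fits in a few lines of Python. This is a minimal sketch, not the program actually used for the figures: the article never states the car’s distance function, so `distance` below is a hypothetical stand-in (a smooth S-shaped curve from 0 m at t=0 to 100 m at t=10, at rest at both ends), and `velocity_curve` and its parameters are illustrative names.

```python
def distance(t: float) -> float:
    """Hypothetical s(t) in meters: 0 m at t=0, 100 m at t=10 s, at rest at both ends."""
    u = t / 10
    return 100 * (3 * u**2 - 2 * u**3)


def velocity_curve(s, dt: float = 0.01, t_min: float = 0.0, t_max: float = 10.0, steps: int = 1000):
    """Sample (t, (s(t + dt) - s(t)) / dt) at evenly spaced times from t_min to t_max.

    The last sample evaluates s slightly past t_max (at t_max + dt).
    """
    samples = []
    for i in range(steps + 1):
        t = t_min + (t_max - t_min) * i / steps
        # Change in distance over a small, but nonzero, change in time.
        samples.append((t, (s(t + dt) - s(t)) / dt))
    return samples


curve = velocity_curve(distance)
t_mid, v_mid = curve[500]
print(t_mid, round(v_mid, 3))  # 5.0 15.0
```

Because this stand-in curve is symmetric, its fastest moment is at t=5, where the sampled velocity is about 15 m/s. Passing a different function as `s` produces a different velocity curve.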

The Paradox

So now would be a good time to pause, reflect, and make sure this idea of relating distance to velocity by looking at tiny changes in time dt makes sense, because now we’re going to tackle the paradox of the derivative head-on. This idea of ds/dt, a tiny change in the value of the function s divided by a tiny change in the input t, is almost what the derivative is.

Even though our car’s speedometer will look at an actual change in time like 0.01 seconds to compute speed, and even though my program here for finding a velocity function given a position function also uses a concrete value of dt, in pure math, the derivative is not this ratio ds/dt for any specific choice of dt. It is whatever value that ratio approaches as the choice for dt approaches 0.
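Textbooks usually write this definition with an explicit limit. In the version below, the symbol \Delta t (introduced only for this formula) stands for a nudge in time of actual, nonzero size:

\frac{ds}{dt}(t) = \lim_{\Delta t \to 0} \frac{s(t+\Delta t) - s(t)}{\Delta t}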

Tangent Lines

Luckily, there is a really nice visual understanding of what it means to ask what this ratio approaches: for any specific choice of dt, this ratio ds/dt is the slope of a line passing through two points on the graph, known as a secant line.

Well, as dt approaches 0, and those two points approach each other, the slope of that secant line approaches the slope of a line tangent to the graph at whatever point t we’re looking at. So the true, honest-to-goodness derivative is not the rise-over-run slope between two nearby points on the graph; it’s equal to the slope of a line tangent to the graph at a single point.

What does it mean if the slope of the tangent line is negative?

Notice what I’m not saying: I’m not saying that the derivative is whatever happens when dt is infinitely small, whatever that may mean, nor am I saying that you plug in 0 for dt. Following a general theme in calculus, the idea is a two-step process: you first consider a finitely small change, some actual number like 0.0001, and then you ask what your answer approaches as this number approaches 0.
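Here is a small Python sketch of that two-step process, using secant slopes. It assumes a hypothetical function s(t) = t^2 and the point t=3, chosen only because the algebra gives a secant slope of exactly 6 + dt there, which makes the approach to the tangent slope 6 easy to see; `secant_slope` is an illustrative name.

```python
def secant_slope(s, t: float, dt: float) -> float:
    """Rise over run between the graph points at t and t + dt (dt must be nonzero)."""
    return (s(t + dt) - s(t)) / dt


def s(t: float) -> float:
    """Hypothetical function used only for illustration."""
    return t**2


# Step 1: use actual, finite nudges. Step 2: watch where the answers head as dt shrinks.
for dt in [0.1, 0.01, 0.001, 0.0001]:
    print(dt, round(secant_slope(s, 3, dt), 6))
# The printed slopes are 6.1, 6.01, 6.001, 6.0001: they approach 6,
# the slope of the tangent line at t=3, without ever plugging in dt = 0.
```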

Even though change in an instant makes no sense, it does make sense to ask what the rate of change across smaller and smaller amounts of time approaches. It’s a sneaky backdoor way to talk reasonably about the rate of change at a single point in time. Isn’t that neat? It’s flirting with the paradox of change in an instant without ever needing to touch it.

What’s even neater is how this potentially abstract notion of what many different rates of change approach ends up having such a clean and simple geometric meaning: the slope of the secant line between two nearby points approaches the slope of the tangent line at one of them as these points get closer together.

Since change in an instant still makes no sense, rather than interpreting the slope of this tangent line as an “instantaneous rate of change”, an alternate notion is to think of it as the best constant approximation for the rate of change around a point.

Words on Notation

Throughout this article I’ve been using “dt” to refer to a tiny change in t with some actual size, and “ds” to refer to the resulting tiny change in s, which again has an actual size.

But the convention in calculus is that whenever you’re using the letter “d” like this, you’re announcing that the intention is to eventually see what happens as dt approaches 0. For example, the honest-to-goodness derivative of the function s(t) is written as ds/dt, even though the derivative is not a fraction, per se; it’s whatever that fraction approaches for smaller and smaller nudges in t.


A specific example should help here. You might think that asking what this ratio approaches for smaller and smaller values of dt would make it much more difficult to compute, but counterintuitively it can make things easier. Let’s say a given distance vs. time function was exactly t^3. So after 1 second, the car has traveled 1^3 = 1 meter; after 2 seconds, it’s traveled 2^3 = 8 meters; and so on. This function is shown below.

This figure graphs the position function s(t) = t^3 and also highlights the point at t=2.

What I’m about to do might seem somewhat complicated, but once the dust settles it really is simpler, and it’s the kind of thing you only ever have to do once in calculus. Let’s say you want the velocity, ds/dt, at a specific time, like t=2. And for now, think of dt as having an actual size; we’ll let it go to 0 in just a bit.

This figure illustrates the approximate velocity at t=2.

The tiny change in distance between 2 seconds and 2 + dt seconds is s(2 + dt) - s(2), and we divide by dt.

\frac{ds}{dt}(2) = \frac{s(2+dt)-s(2)}{dt}

Since s(t) = t^3, we can apply the definition of the function, so the numerator becomes (2 + dt)^3 - 2^3.

\frac{ds}{dt}(2) = \frac{(2+dt)^{3} - (2)^{3}}{dt}

Now this, we can work out algebraically. And again bear with me, there’s a reason I’m showing you the details. Expanding the top gives the expression:

\frac{ds}{dt}(2) = \frac{2^3 + 3 \cdot 2^2 \, dt + 3 \cdot 2 \cdot (dt)^2 + (dt)^3 - 2^3}{dt}

Now there are a lot of terms, but it simplifies. Those 2^3 terms cancel out. Everything remaining has a dt in it, so we can divide that out. So the ratio ds/dt has boiled down to 3 · 2^2 plus two different terms that each have a dt in them.

\frac{ds}{dt}(2) = 3(2)^{2} + 3(2)(dt) + (dt)^{2}

So if we ask what happens as dt approaches 0, representing the idea of looking at smaller and smaller changes in time, we can ignore those! By eliminating the need to think of a specific dt, we’ve eliminated much of the complication in this expression! So what we’re left with is a nice clean 3 · 2^2. This means the slope of the line tangent to the graph of t^3 at the point t=2 is exactly 3 · 2^2, or 12.
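One way to double-check this algebra without floating-point rounding is Python’s standard `fractions.Fraction` type, which does exact rational arithmetic. In this sketch (the helper name `quotient_at_2` is hypothetical), every nonzero dt tried gives a ratio exactly equal to 3 · 2^2 + 3 · 2 · dt + dt^2, and the extra terms fade as dt shrinks, leaving 12.

```python
from fractions import Fraction


def quotient_at_2(dt: Fraction) -> Fraction:
    """(s(2 + dt) - s(2)) / dt for s(t) = t**3, computed exactly with rational numbers."""
    return ((2 + dt) ** 3 - 2**3) / dt


for k in range(1, 6):
    dt = Fraction(1, 10**k)  # dt = 0.1, 0.01, ..., 0.00001 as exact fractions
    q = quotient_at_2(dt)
    # The expanded form from the algebra: 3*2^2 + 3*2*dt + dt^2.
    assert q == 3 * 2**2 + 3 * 2 * dt + dt**2
    print(f"dt = {float(dt):g}: ratio = {float(q)}")
```

For dt = 0.01 the exact ratio is 12.0601: the specific-dt answer carries leftover terms, while the value it approaches is simply 12.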

Of course, there was nothing special about choosing t=2; more generally we’d say the derivative of t^3, as a function of t, is 3t^2.

\frac{ds}{dt}(t) = 3t^{2}

That’s beautiful. This derivative is quite a complicated idea: we’ve got tiny changes in distance over tiny changes in time, but instead of looking at any specific tiny change in time, we start talking about what this thing approaches. Yet we’ve ended up with such a simple expression: 3t^2.

What is the velocity of the car at time t=1?

In practice, you would not go through all that algebra each time. Knowing that the derivative of t^3 is 3t^2 is one of those facts all calculus students learn to recall immediately, without rederiving it each time, as quickly and automatically as a simple algebra rule like x(y + z) = xy + xz. In the next chapter, we’ll see methods for thinking about this and many other derivative formulas in nice geometric ways.

The point I want to make by showing you all of the algebraic guts here is that when you consider the change in distance over a change in time for any specific value of dt, something like dt = 0.01, you’d have kind of a mess.

But considering what this ratio approaches as dt approaches 0 lets you ignore much of that mess, and actually simplifies the problem.

That right there is the heart of why calculus becomes useful.

The Paradox at Time Zero

This example also sets the stage to think more concretely about why the notion of an instantaneous rate of change is paradoxical. Think about this car traveling according to this t^3 distance function, and consider its motion at the moment t=0. Now ask yourself: is the car moving at that time?

On the one hand, we can compute its speed at that point using the derivative of this function, 3t^2, which is 0 at time t=0.

\frac{ds}{dt}(t) = 3t^{2} \quad \Rightarrow \quad \frac{ds}{dt}(0) = 3(0)^{2} = 0\ \frac{\mathrm{m}}{\mathrm{s}}

Visually, this means the tangent line to the graph at that point is perfectly flat, so the car’s “instantaneous velocity” is 0, which suggests it’s not moving. But on the other hand, if it doesn’t start moving at time 0, when does it start moving? Really, pause and ponder this for a moment: is that car moving at t=0?

Take a moment to unpack what it actually means for the derivative of the distance function to be 0 at this point. It means the best constant approximation for the car’s velocity around that point is 0 meters per second. For example, between t=0 and t=0.1 seconds, the car does move: it moves 0.001 meters. That’s very small, and importantly it’s very small compared to the change in time, an average speed of only 0.01 meters per second.

For smaller and smaller nudges in time, this ratio of the change in distance over the change in time approaches 0, though in this case it never actually hits it. So would you say this qualifies as moving at the time t=0? I would argue the question makes no sense: movement is something which happens between two points in time, so it has no meaning in a given instant.
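The numbers in the last two paragraphs can be checked with Python’s exact `fractions.Fraction` type. For s(t) = t^3, the average speed over the first dt seconds is dt^3 / dt = dt^2: positive for every dt > 0, yet heading to 0. The helper names `s` and `average_speed` are illustrative.

```python
from fractions import Fraction


def s(t: Fraction) -> Fraction:
    """Distance traveled (meters) after t seconds, for the example s(t) = t**3."""
    return t**3


def average_speed(t0: Fraction, dt: Fraction) -> Fraction:
    """Average speed (m/s) over the interval from t0 to t0 + dt, computed exactly."""
    return (s(t0 + dt) - s(t0)) / dt


start = Fraction(0)
for k in range(1, 5):
    dt = Fraction(1, 10**k)  # nudges of 0.1, 0.01, 0.001, 0.0001 seconds
    speed = average_speed(start, dt)
    # The car does move during [0, dt], and its average speed is exactly dt**2:
    # always positive, but shrinking toward 0 as dt shrinks.
    assert s(dt) > 0 and speed == dt**2
    print(f"[0, {float(dt):g}] s: moved {float(s(dt)):g} m, average speed {float(speed):g} m/s")
```

The first line printed matches the example above: over the first 0.1 seconds the car moves 0.001 meters, an average speed of 0.01 meters per second.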

It’s tempting to say the derivative gives this notion meaning, and many people would happily say the car is not moving at t=0, but it is moving for all times t > 0. For my part, I’d recommend not taking the phrase “instantaneous rate of change” too seriously, and instead thinking of it as a conceptual shorthand for “the best constant approximation for the rate of change”.

Next Chapter

In the following chapters we’ll dig into how to compute the derivative of various functions. Along the way we’ll see how this fundamental idea of looking at tiny nudges to the output of a function caused by tiny nudges to its input lends itself to many beautiful geometric intuitions.



Special thanks to those below for supporting the original video behind this post, and to current patrons for funding ongoing projects. If you find these lessons valuable, consider joining.

Meshal Alshammari, Ali Yahya, CrypticSwarm, Yu Jun, Shelby Doolittle, Dave Nicponski, Damion Kistler, Juan Benet, Othman Alikhan, Markus Persson, Dan Buchoff, Derek Dai, Joseph John Cox, Luc Ritchie, Mark Govea, Guido Gambardella, Vecht, Jonathan Eppele, Shimin Kuang, Rish Kundalia, Achille Brighton, Kirk Werklund, Ripta Pasay, Felipe Diniz, dim85, Chris, John C. Vesey, Patrik, Alvin Khaled, ScienceVR, Chris Willis, Michael Rabadi, Alexander Juda, Mads Elvheim, Joseph Cutler, Curtis Mitchell, Ari Royce, Bright, Myles Buckley, Andy Petsch, Otavio Good, Karthik T, Steve Muench, Viesulas Sliupas, Steffen Persch, Brendan Shah, Andrew Mcnab, Matt Parlmer, Naoki Orai, Dan Davison, Jose Oscar Mur-Miranda, Aidan Boneham, Henry Reich, Sean Bibby, Paul Constantine, Justin Clark, Mohannad Elhamod, Ben Granger, Jeffrey Herman, Jacob Young