The derivative of a function expresses how the function's output value changes as its input value changes. For example, if we have a function:

    f(x) = x^2
we can ask: given the output value f(x) for a particular input value x, what function tells us how much to add to that output to get the approximate output after a small change in x? In other words, what function gives us the best linear approximation of the original function around a particular value of x?
It turns out that for the function f(x) = x^2, the derivative (the function giving the best linear approximation) is 2x. We can write:

    f'(x) = 2x
In other words, for small values of d:

    f(x + d) ≈ f(x) + d · f'(x) = x^2 + 2xd
As an example, let's consider the output value of f(x) when x is 1.5 and d is 0.001. At this point, f(x) = 1.5^2 = 2.25, and the "true" value of f(x + d) is:

    f(1.501) = 1.501^2 = 2.253001
The approximate value of f(x + d) calculated from the derivative is:

    f(x) + d · f'(x) = 2.25 + 0.001 × (2 × 1.5) = 2.25 + 0.003 = 2.253
As you can see, the approximate value of 2.253 calculated from the derivative is very close to the actual value of 2.253001.
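The arithmetic in this example can be checked with a short Python sketch. The function names `f`, `f_prime` and `linear_approx` are illustrative choices, not part of any library; the values of x and d are the ones used in the text.

```python
def f(x):
    """The example function f(x) = x^2."""
    return x ** 2


def f_prime(x):
    """Its derivative, f'(x) = 2x."""
    return 2 * x


def linear_approx(x, d):
    """Approximate f(x + d) using the value and the derivative at x."""
    return f(x) + d * f_prime(x)


x, d = 1.5, 0.001
exact = f(x + d)               # the "true" value of f(x + d)
approx = linear_approx(x, d)   # the value predicted by the derivative

print(f"exact  f(x + d) = {exact:.6f}")
print(f"approx f(x + d) = {approx:.6f}")
print(f"error           = {exact - approx:.6f}")
```

The error is d^2 (here about 0.000001), which is why the approximation improves so quickly as d gets smaller.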
To help visualise how we can determine the derivative of a function, we can imagine that function plotted on a graph, where the y coordinate is the output of the function for a given x. In our case, we have the graph:

    y = x^2
Then, at a particular point (x, y), the derivative is effectively the ratio of the change in the y coordinate to the change in the x coordinate when we change x by a small amount. We'll actually generalise the equation above slightly, and consider any graph of the form:

    y = ax^2
Now, as we add a tiny amount, d, to the value of x, the change in the value of y is the difference between the y values at x and at x + d. At point x, the y value is ax^2, and at point x + d, the y value is a(x + d)^2, giving a difference of:

    a(x + d)^2 - ax^2
So at a given point x, the rate of change in y (in effect, a linear approximation of the function at that point) is this expression divided by d: the amount by which y changes divided by the amount by which we're changing x. Recall that the derivative is the best linear approximation at that point, or in other words the outcome of this division as d approaches zero. But we can't actually divide by zero, so we can't simply set d to zero.
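What we can do is simplify this expression so that d no longer appears in the denominator:

    (a(x + d)^2 - ax^2) / d = (ax^2 + 2axd + ad^2 - ax^2) / d
                            = (2axd + ad^2) / d
                            = 2ax + ad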
Now, as d (and hence ad) approaches zero, the rate of change in ax^2 approaches 2ax, and we say that the derivative of the function ax^2 is 2ax. In our initial example of f(x) = x^2 (where effectively a = 1), we now see where the derivative f'(x) = 2x actually comes from.
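To see this limit numerically, here is a minimal Python sketch that evaluates the change in y divided by d for smaller and smaller values of d. The constant a = 3 and the point x = 1.5 are assumptions chosen only for illustration, and `difference_quotient` and `g` are hypothetical helper names.

```python
def difference_quotient(func, x, d):
    """Change in func's output divided by the change d in its input."""
    return (func(x + d) - func(x)) / d


a = 3.0  # hypothetical constant, chosen only for illustration
x = 1.5  # point at which we examine the rate of change


def g(t):
    """The generalised function y = a * t^2."""
    return a * t ** 2


# As d shrinks, the quotient should track 2ax + ad and approach 2ax.
for d in (0.1, 0.01, 0.001, 0.0001):
    quotient = difference_quotient(g, x, d)
    print(f"d = {d:<7} quotient = {quotient:.6f}   2ax + ad = {2 * a * x + a * d:.6f}")

print(f"limit 2ax = {2 * a * x}")
```

Each quotient agrees with 2ax + ad up to floating-point rounding, and the leftover ad term shrinks towards zero along with d.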
But it turns out that we don't usually need to work out the derivative of a function "from first principles" in this way. A number of common expressions have known derivatives. For example, for x raised to a power b (where b is a real number) and multiplied by a constant a:

    f(x) = ax^b
it turns out that the derivative is:

    f'(x) = abx^(b - 1)
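As a rough sanity check of this rule, the following Python sketch compares the value it gives with a numerical estimate from a small step d. The (a, b) pairs, the point x = 1.5 and the helper names `power_rule` and `numerical_derivative` are all assumptions made for illustration; x is kept positive so that fractional and negative powers are well defined.

```python
def power_rule(a, b, x):
    """Derivative of a * x^b according to the rule: a * b * x^(b - 1)."""
    return a * b * x ** (b - 1)


def numerical_derivative(func, x, d=1e-6):
    """Estimate the derivative as (change in output) / (small change in input)."""
    return (func(x + d) - func(x)) / d


x = 1.5  # illustrative positive input value
for a, b in [(1.0, 2.0), (2.0, 3.0), (4.0, 0.5), (-3.0, -1.0)]:
    # Bind a and b as defaults so each lambda keeps its own values.
    func = lambda t, a=a, b=b: a * t ** b
    rule = power_rule(a, b, x)
    estimate = numerical_derivative(func, x)
    print(f"a = {a}, b = {b}: rule = {rule:.5f}, estimate = {estimate:.5f}")
```

The two columns should agree to several decimal places; the small remaining gap comes from using a finite step d rather than taking the limit.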
And if we have a function that consists of various expressions of this form added together, then we can apply the same rule to each term and add them together. So if the function is:
the derivative becomes:
Notice that adding or subtracting a constant, such as the -7 here, has no effect on the derivative: if you always subtract 7 whatever the input value, that doesn't affect how the output value changes as the input value changes.
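The term-by-term rule can be sketched in Python by representing a function as a list of (a, b) pairs, one per term ax^b. The polynomial used here, 3x^2 + 5x - 7, is a hypothetical one chosen purely for illustration, and `differentiate` and `evaluate` are assumed helper names rather than library functions.

```python
def differentiate(terms):
    """Apply the power rule to each (a, b) term of a sum of a * x^b terms.

    Constant terms (b == 0) contribute nothing to the derivative.
    """
    return [(a * b, b - 1) for (a, b) in terms if b != 0]


def evaluate(terms, x):
    """Evaluate a sum of a * x^b terms at x."""
    return sum(a * x ** b for (a, b) in terms)


# Hypothetical example: 3x^2 + 5x - 7
terms = [(3, 2), (5, 1), (-7, 0)]
derivative = differentiate(terms)           # terms of 6x + 5
without_constant = differentiate(terms[:2])  # same function without the -7

print("derivative terms:", derivative)
print("same without the constant:", derivative == without_constant)
print("derivative at x = 2:", evaluate(derivative, 2))
```

Dropping the constant term leaves the derivative unchanged, which is the point made above about the -7.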