# But what is a Fourier series? From heat flow to circle drawings | DE4

Here, we look at the math behind an animation
like this, what’s known as a “complex Fourier series”. Each little vector is rotating
at some constant integer frequency, and when you add them all together, tip to tail, they
draw out some shape over time. By tweaking the initial size and angle of each vector,
we can make it draw anything we want, and here you’ll see how. Before diving in, take a moment to linger
on just how striking this is. This particular animation has 300 rotating arrows in total.
Go full screen for this if you can; the intricacy is worth it. Think about this: the action
of each individual arrow is perhaps the simplest thing you could imagine: Rotation at a steady
rate. Yet the collection of all added together is anything but simple. The mind-boggling
complexity is put into even sharper focus the farther we zoom in, revealing the contributions
of the littlest, quickest arrows. Considering the chaotic frenzy you’re looking
at, and the clockwork rigidity of the underlying motions, it’s bizarre how the swarm acts
with a kind of coordination to trace out some very specific shape. Unlike much of the emergent
complexity you find elsewhere in nature, though, this is something we have the math to describe
and to control completely. Just by tuning the starting conditions, nothing more, you
can make this swarm conspire in all the right ways to draw anything you want, provided you
have enough little arrows. What’s even crazier, as you’ll see, is the ultimate formula for
all this is incredibly short. Often, Fourier series are described in terms
of functions of real numbers being broken down as a sum of sine waves. That turns out
to be a special case of this more general rotating vector phenomenon that we’ll build
up to, but it’s where Fourier himself started, and there’s good reason for us to start
the story there as well. Technically, this is the third video in a
sequence about the heat equation, what Fourier was working on when he developed his big idea.
I’d like to teach you about Fourier series in a way that doesn’t depend on you coming
from those chapters, but if you have at least a high-level idea of the problem from physics
which originally motivated this piece of math, it gives some indication for how unexpectedly
far-reaching Fourier series are. All you need to know is that we had this equation,
describing how the temperature on a rod will evolve over time (which incidentally also
describes many other phenomena unrelated to heat), and while it’s hard to directly use
it to figure out what will happen to an arbitrary heat distribution, there’s a simple solution
if that initial function looks like a cosine wave with a frequency tuned to make it flat
at each endpoint. Specifically, as you graph what happens over time, these waves simply
get scaled down exponentially, with higher frequency waves decaying faster. The heat equation happens to be what’s known
in the business as a “linear” equation, meaning if you know two solutions and you
add them up, that sum is also a new solution. You can even scale them each by some constant,
which gives you some dials to turn to construct a custom function solving the equation. This is a fairly straightforward property
that you can verify for yourself, but it’s incredibly important. It means we can take
our infinite family of solutions, these exponentially decaying cosine waves, scale a few of them
by some custom constants of our choosing, and combine them to get a solution for a new
tailor-made initial condition which is some combination of cosine waves. Something important I want you to notice about
combining the waves like this is that because higher frequency ones decay faster, this sum
which you construct will smooth out over time as the high-frequency terms quickly go to
zero, leaving only the low-frequency terms dominating. So in some sense, all the complexity
in the evolution that the heat equation implies is captured by this difference in decay rates
for the different frequency components. It’s at this point that Fourier gains immortality.
I think most normal people at this stage would say “well, I can solve the heat equation
when the initial temperature distribution happens to look like a wave, or a sum of waves,
but what a shame that most real-world distributions don’t at all look like this!” For example, let’s say you brought together
two rods, each at some uniform temperature, and you wanted to know what happens immediately
after they come into contact. To make the numbers simple, let’s say the temperature
of the left rod is 1 degree, and the right rod is -1 degree, and that the total length
L of the combined rod is 1. Our initial temperature distribution is a step function, which is
so obviously different from sine waves and sums of sine waves, don’t you think? I mean,
it’s almost entirely flat, not wavy, and for god’s sake, it’s even discontinuous! And yet, Fourier thought to ask a question
which seems absurd: How do you express this as a sum of sine waves? Even more boldly,
how do you express any initial temperature distribution as a sum of sine waves? And it’s more constrained than just that!
You have to restrict yourself to adding waves which satisfy a certain boundary condition,
which as we saw last video means working only with these cosine functions whose frequencies
are all some whole number multiple of a given base frequency. (And by the way, if you were working with
a different boundary condition, say that the endpoints must stay fixed, you’d have a
different set of waves at your disposal to piece together, in this case simply replacing
the cosine functions with sines) It’s strange how often progress in math
looks like asking a new question, rather than simply answering an old one. Fourier really does have a kind of immortality,
with his name essentially synonymous with the idea of breaking down functions and patterns
as combinations of simple oscillations. It’s really hard to overstate just how important
and far-reaching that idea turned out to be, well beyond anything Fourier could have imagined.
And yet, the origin of all this is in a piece of physics which upon first glance has nothing
to do with frequencies and oscillations. If nothing else, this should give a hint at how
generally applicable Fourier series are. “Now hang on,” I hear some of you saying,
“none of these sums of sine waves being shown are actually the step function.” It’s
true, any finite sum of sine waves will never be perfectly flat (except for a constant function),
nor discontinuous. But Fourier thought more broadly, considering infinite sums. In the
case of our step function, it turns out to be equal to this infinite sum, where the coefficients
are 1, -⅓, +⅕, -1/7 and so on for all the odd frequencies, all rescaled by 4/pi.
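In symbols, and assuming x measures position along the rod from 0 to 1 (so the cosine with frequency n is cos(n * pi * x), flat at both endpoints), that infinite sum can be written as follows. The name f(x) for the step function is notation introduced here, not taken from the screen.

```latex
f(x) = \frac{4}{\pi}\left(\cos(\pi x) - \frac{1}{3}\cos(3\pi x) + \frac{1}{5}\cos(5\pi x) - \frac{1}{7}\cos(7\pi x) + \cdots\right)
     = \frac{4}{\pi}\sum_{k=0}^{\infty} \frac{(-1)^k}{2k+1}\cos\big((2k+1)\pi x\big)
```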
I’ll explain where these numbers come from in a moment. Before that, I want to be clear about what
we mean with a phrase like “infinite sum”, which runs the risk of being a little vague.
Consider the simpler context of numbers, where you could say, for example, this infinite
sum of fractions equals pi / 4. As you keep adding terms one-by-one, at all times what
you have is rational; it never actually equals the irrational pi / 4. But this sequence of
partial sums approaches pi / 4. That is to say, the numbers you see, while never equal
to pi / 4, get arbitrarily close to that value, and stay arbitrarily close to that value.
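To make "approaches" concrete, here is a minimal sketch that prints a few partial sums. It assumes the sum on screen is the alternating series 1 - 1/3 + 1/5 - 1/7 + ..., and the function name `leibniz_partial_sum` is made up for this illustration.

```python
import math


def leibniz_partial_sum(num_terms: int) -> float:
    """Add up the first num_terms terms of 1 - 1/3 + 1/5 - 1/7 + ..."""
    return sum((-1) ** k / (2 * k + 1) for k in range(num_terms))


if __name__ == "__main__":
    target = math.pi / 4
    for num_terms in (1, 10, 100, 1000, 10000):
        s = leibniz_partial_sum(num_terms)
        # The gap shrinks as terms are added, but never reaches exactly 0.
        print(f"{num_terms:>6} terms: {s:.6f}   gap to pi/4: {abs(s - target):.2e}")
```

Every partial sum here is a ratio of whole numbers (up to floating-point rounding), yet the printed gap to pi / 4 keeps shrinking as more terms are included.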
That’s a mouthful, so instead we abbreviate and say the infinite sum “equals” pi / 4. With functions, you’re doing the same thing
but with many different values in parallel. Consider a specific input, and the value of
all these scaled cosine functions for that input. If that input is less than 0.5, as
you add more and more terms, the sum will approach 1. If that input is greater than
0.5, as you add more and more terms it would approach -1. At the input 0.5 itself, all
the cosines are 0, so the limit of the partial sums is 0. Somewhat awkwardly, then, for this
infinite sum to be strictly true, we do have to prescribe the value of the step function
at the point of discontinuity to be 0. Analogous to an infinite sum of rational numbers
being irrational, the infinite sum of wavy continuous functions can equal a discontinuous
flat function. Limits allow for qualitative changes which finite sums alone never could. There are multiple technical nuances I’m
sweeping under the rug here. Does the fact that we’re forced into a certain value for
the step function at its point of discontinuity make any difference for the heat flow problem?
For that matter what does it really mean to solve a PDE with a discontinuous initial condition?
Can we be sure the limit of solutions to the heat equation is also a solution? Do all functions
have a Fourier series like this? These are exactly the kinds of questions real analysis
is built to answer, but it falls a bit deeper in the weeds than I think we should go here,
so I’ll relegate that to links in the video’s description. The upshot is that when you take the heat
equation solutions associated with these cosine waves and add them all up, all infinitely
many of them, you do get an exact solution describing how the step function will evolve
over time. The key challenge, of course, is to find these
coefficients. So far, we’ve been thinking about functions with real number outputs,
but for the computations I’d like to show you something more general than what Fourier
originally did, applying to functions whose output can be any complex number, which is
where those rotating vectors from the opening come back into play. Why the added complexity? Aside from being
more general, in my view the computations become cleaner and it’s easier to see why
they work. More importantly, it sets a good foundation for ideas that will come up again
later in the series, like the Laplace transform and the importance of exponential functions.
The relation between cosine decomposition and rotating vector decomposition
We’ll still think of functions whose input is some real number on a finite interval,
say the one from 0 to 1 for simplicity. But whereas something like a temperature function
will have an output confined to the real number line, we’ll broaden our view to outputs
anywhere in the two-dimensional complex plane. You might think of such a function as a drawing,
with a pencil tip tracing along different points in the complex plane as the input ranges
from 0 to 1. Instead of sine waves being the fundamental building block, as you saw at
the start, we’ll focus on breaking these functions down as a sum of little vectors,
all rotating at some constant integer frequency. Functions with real number outputs are essentially
really boring drawings; a 1-dimensional pencil sketch. You might not be used to thinking
of them like this, since usually we visualize such a function with a graph, but right now
the path being drawn is only in the output space. When we do the decomposition into rotating
vectors for these boring 1d drawings, what will happen is that the two vectors with frequencies
1 and -1 will have the same length, and they’ll be horizontal reflections of each other. When
you just look at the sum of these two as they rotate, that sum stays fixed on the real number
line, and oscillates like a sine wave. This might be a weird way to think about a sine
wave, since we’re used to looking at its graph rather than the output alone wandering
on the real number line. But in the broader context of functions with complex number outputs,
this is what sine waves look like. Similarly, the pair of rotating vectors with frequencies
2 and -2 will add another sine wave component, and so on, with the sine waves we were looking
at earlier now corresponding to pairs of vectors rotating in opposite directions. So the context Fourier originally studied,
breaking down real-valued functions into sine wave components, is a special case of the
more general idea with 2d-drawings and rotating vectors. At this point, maybe you don’t trust me
that widening our view to complex functions makes things easier to understand, but bear
with me. It really is worth the added effort to see the fuller picture, and I think you’ll
be pleased by how clean the actual computation is in this broader context. You may also wonder why, if we’re going
to bump things up to two dimensions, we don’t just talk about 2d vectors. What’s the
square root of -1 got to do with anything? Well, the heart and soul of Fourier series
is the complex exponential, e^{i * t}. As the value of t ticks forward with time, this
value walks around the unit circle at a rate of 1 unit per second. In the next video, you’ll see a quick intuition
for why exponentiating imaginary numbers walks in circles like this from the perspective
of differential equations, and beyond that, as the series progresses I hope to give you
some sense for why complex exponentials are important. You see, in theory, you could describe all
of this Fourier series stuff purely in terms of vectors and never breathe a word of i.
The formulas would become more convoluted, but beyond that, leaving out the function
e^x would somehow no longer authentically reflect why this idea turns out to be so useful
for solving differential equations. For right now you can think of this e^{i t} as a notational
shorthand to describe a rotating vector, but just keep in the back of your mind that it’s
more significant than a mere shorthand. I’ll be loose with language and use the
words “vector” and “complex number” somewhat interchangeably, in large part because
thinking of complex numbers as little arrows makes the idea of adding many together clearer. Alright, armed with the function e^{i*t},
let’s write down a formula for each of these rotating vectors we’re working with. For
now, think of each of them as starting out pointing one unit to the right, at the number 1. The easiest vector to describe is the constant
one, which just stays at the number 1, never moving. Or, if you prefer, it’s “rotating”
at a frequency of 0. Then there will be a vector rotating 1 cycle every second which
we write as e^{2pi * i * t}. The 2pi is there because as t goes from 0 to 1, it needs to
cover a distance of 2pi along the circle. In what’s being shown, it’s actually 1
cycle every 10 seconds so that things aren’t too dizzying, but just think of it as slowed
down by a factor of 10. We also have a vector rotating at 1 cycle
per second in the other direction, e^{-2pi * i * t}. Similarly, the one going 2 rotations
per second is e^{2 * 2pi * i * t}, where that 2 * 2pi in the exponent describes how much
distance is covered in 1 second. And we go on like this over all integers, both positive
and negative, with a general formula of e^{n * 2pi * i * t} for each rotating vector. Notice, this makes it more consistent to write
the constant vector as e^{0 * 2pi * i * t}, which feels like an awfully complicated
way to write the number 1, but at least then it fits the pattern. The control we have, the set of knobs and
dials we get to turn, is the initial size and direction of each of these numbers. The
way we control that is by multiplying each one by some complex number, which I’ll call
c_n. For example, if we wanted that constant vector
not to be at the number 1, but to have a length of 0.5, we’d scale it by 0.5. If we wanted
the vector rotating at one cycle per second to start off at an angle of 45°, we’d multiply
it by a complex number which has the effect of rotating it by that much, which you might
write as e^{pi/4 * i}. If its initial length needed to be 0.3, the coefficient would be
0.3 times that amount. Likewise, everyone in our infinite family
of rotating vectors has some complex constant being multiplied into it which determines
its initial angle and magnitude. Our goal is to express any arbitrary function f(t),
say this one drawing an eighth note, as a sum of terms like this, so we need some way
to pick out these constants one-by-one given data of the function. The easiest one is the constant term. This
term represents a sort of center of mass for the full drawing; if you were to sample a
bunch of evenly spaced values for the input t as it ranges from 0 to 1, the average of
all the outputs of the function for those samples will be the constant term c_0. Or
more accurately, as you consider finer and finer samples, their average approaches c_0
in the limit. What I’m describing, finer and finer sums of f(t) for samples of t from
the input range, is an integral of f(t) from 0 to 1. Normally, since I’m framing this
in terms of averages, you’d divide this integral by the length of the interval. But
that length is 1, so it amounts to the same thing. There’s a very nice way to think about why
this integral would pull out c_0. Since we want to think of the function as a sum of
these rotating vectors, consider this integral (this continuous average) as being applied
to that sum. This average of a sum is the same as a sum over the averages of each part;
you can read this move as a subtle shift in perspective. Rather than looking at the sum
of all the vectors at each point in time, and taking the average value of the points
they trace out, look at the average value for each individual vector as t goes from
0 to 1, and add up all these averages. But each of these vectors makes a whole number
of rotations around 0, so its average value as t goes from 0 to 1 will be 0. The only
exception is that constant term; since it stays static and doesn’t rotate, its
average value is just whatever number it started on, which is c_0. So doing this average over
the whole function is sort of a way to kill all terms that aren’t c_0. But now let’s say you wanted to compute
a different term, like c_2 in front of the vector rotating 2 cycles per second. The trick
is to first multiply f(t) by something which makes that vector hold still (sort of the
mathematical equivalent of giving a smartphone to an overactive child). Specifically, if
you multiply the whole function by e^{-2 * 2pi * i * t}, think about what happens to
each term. Since multiplying exponentials results in adding what’s in the exponent,
the frequency term in each of the exponents gets shifted down by 2. So now, that c_{-1} vector spins around -3
times, with an average of 0. The c_0 vector, previously constant, now rotates twice as
t ranges from 0 to 1, so its average is 0. And likewise, all vectors other than the c_2
term make some whole number of rotations, meaning they average out to 0. So taking the
average of this modified function, all terms other than the second one get killed, and
we’re left with c_2. Of course, there’s nothing special about
2 here. If we replace it with any other n, you have a formula for any other term c_n.
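Written out in symbols, using the same f(t), c_n and e^{n * 2pi * i * t} as above, the decomposition and the formula for each coefficient look like this. The layout is reconstructed from the spoken description rather than copied from the screen, and the index m is introduced here only for the orthogonality step.

```latex
f(t) = \sum_{n=-\infty}^{\infty} c_n\, e^{n \cdot 2\pi i t},
\qquad
c_n = \int_0^1 f(t)\, e^{-n \cdot 2\pi i t}\, dt

% Why the integral isolates c_n: for integers m and n,
\int_0^1 e^{m \cdot 2\pi i t}\, e^{-n \cdot 2\pi i t}\, dt =
\begin{cases} 1 & \text{if } m = n \\ 0 & \text{if } m \neq n \end{cases}
```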
Again, you can read this expression as modifying our function, our 2d drawing, so as to make
the n-th little vector hold still, and then performing an average so that all other vectors
get canceled out. Isn’t that crazy? All the complexity of this decomposition as a
sum of many rotations is entirely captured in this expression. So when I’m rendering these animations,
that’s exactly what I’m having the computer do. It treats this path like a complex function,
and for a certain range of values for n, it computes this integral to find each coefficient
c_n. For those of you curious about where the data for the path itself comes from, I’m
going the easy route having the program read in an svg, which is a file format that defines
the image in terms of mathematical curves rather than with pixel values, so the mapping
f(t) from a time parameter to points in space basically comes predefined. In what’s shown right now, I’m using 101
rotating vectors, computing values of n from -50 up to 50. In practice, the integral is
computed numerically, basically meaning it chops up the unit interval into many small
pieces of size delta-t and adds up this value f(t)e^{-n * 2pi * i * t} * delta-t for each
one of them. There are fancier methods for more efficient numerical integration, but
that gives the basic idea. After computing these 101 values, each one
determines an initial position for the little vectors, and then you set them all rotating,
adding them all tip to tail, and the path drawn out by the final tip is some approximation
of the original path. As the number of vectors used approaches infinity, it gets more and
more accurate.
Relation to step function
To bring this all back down to earth, consider the example we were looking at earlier of
a step function, which was useful for modeling the heat dissipation between two rods of different
temperatures after coming into contact. Like any real-valued function, a step function
is like a boring drawing confined to one-dimension. But this one is an especially dull drawing,
since for inputs between 0 and 0.5, the output just stays static at the number 1, and then
it discontinuously jumps to -1 for inputs between 0.5 and 1. So in the Fourier series
approximation, the vector sum stays really close to 1 for the first half of the cycle,
then really quickly jumps to -1 for the second half. Remember, each pair of vectors rotating
in opposite directions corresponds to one of the cosine waves we were looking at earlier. To find the coefficients, you’d need to
compute this integral. For the ambitious viewers among you itching to work out some integrals
by hand, this is one where you can do the calculus to get an exact answer, rather than
just having a computer do it numerically for you. I’ll leave it as an exercise to work
this out, and to relate it back to the idea of cosine waves by pairing off the vectors
rotating in opposite directions. For the even more ambitious, I’ll also leave
another exercise up on screen on how to relate this more general computation with what you
might see in a textbook describing Fourier series only in terms of real-valued functions
with sines and cosines. By the way, if you’re looking for more Fourier
series content, I highly recommend the videos by Mathologer and The Coding Train on the
topic, and the blog post by Jezzamon. So on the one hand, this concludes our discussion
of the heat equation, which was a little window into the study of partial differential equations. But on the other hand, this foray into Fourier
series is a first glimpse at a deeper idea. Exponential functions, including their generalization
into complex numbers and even matrices, play a very important role for differential equations,
especially when it comes to linear equations. What you just saw, breaking down a function
as a combination of these exponentials, comes up again in different shapes and forms.
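For anyone who wants to try the numerical recipe described earlier (chop the unit interval into pieces of size delta-t and add up f(t)e^{-n * 2pi * i * t} * delta-t), here is a minimal sketch in plain Python. It is not the code behind the animations. The names `fourier_coefficient`, `partial_sum` and `example_path` are hypothetical, and the test drawing is just the constant vector of length 0.5 plus the length-0.3 vector starting at 45° from the earlier example.

```python
import cmath
import math


def fourier_coefficient(f, n, num_samples=1000):
    """Approximate c_n, the integral from 0 to 1 of f(t) * e^{-n * 2pi * i * t} dt,
    with a plain Riemann sum over evenly spaced samples."""
    dt = 1.0 / num_samples
    total = 0j
    for k in range(num_samples):
        t = k * dt
        total += f(t) * cmath.exp(-n * 2 * math.pi * 1j * t) * dt
    return total


def partial_sum(coefficients, t):
    """Add the rotating vectors c_n * e^{n * 2pi * i * t} tip to tail at time t."""
    return sum(c * cmath.exp(n * 2 * math.pi * 1j * t) for n, c in coefficients.items())


def example_path(t):
    # Hypothetical test drawing: a constant 0.5, plus a vector of length 0.3
    # that starts at 45 degrees and rotates once as t goes from 0 to 1.
    return 0.5 + 0.3 * cmath.exp(1j * math.pi / 4) * cmath.exp(2 * math.pi * 1j * t)


if __name__ == "__main__":
    coefficients = {n: fourier_coefficient(example_path, n) for n in range(-3, 4)}
    for n, c in coefficients.items():
        print(n, round(c.real, 4), round(c.imag, 4))
    # The tip of the summed vectors should land back on the original path.
    print(abs(partial_sum(coefficients, 0.37) - example_path(0.37)))
```

Because the test drawing is itself built from two rotating vectors, the recovered c_0 and c_1 should match 0.5 and 0.3 * e^{pi/4 * i}, with every other coefficient coming out essentially 0. For a path read from an svg, you would swap in your own f(t) and a wider range of n.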