Computer Music and the Fourier Transform

A huge array of digital signal processing (DSP) tools depend upon the ability to track the pitch, rhythm, and amplitude of a signal. Commonly used tools such as auto-tune, pitch-shifting, and many types of synthesis all fall into this category. The way that an audio signal is converted from the way that we perceive it (the time domain) into information we can use for analysis (the frequency domain) is actually based in some 19th century mathematics that are incredibly hard to understand for a humble computer musician such as myself. Nevertheless, I’m going to try to explain it.

The tool that makes frequency analysis possible- Fourier Analysis- was originally designed for analyzing heat transfer. In large terms, it takes a general function (think a time versus amplitude graph of a piano sample) and deconstructs it into component parts, which can later be reassembled into a function with a different domain (i.e., the same piano sample, but expressed as frequency versus amplitude graph).

The time-domain signal above, translated to the frequency domain using the Fourier Transform

The reason that acoustic instruments sound different from one another has to do with the physical properties of the instrument. Let’s take for example a clarinet versus a trombone. In both cases the performer breathes air into their instrument. But even the same note played on each sounds markedly different because of the specific ways the air pressure resonates with the material, shape and design of the instrument. Each horn produces different overtones, which are frequencies at multiples of the fundamental frequency. While overtones follow a specific order (this is called the harmonic series: one octave above the fundamental, then an octave and a fifth, etc…), it is the relative volume of each partial in the overtone series that creates the distinctive sound of each instrument.

Any recorded sound can be synthesized as a collection of sine waves (vibrations with no overtones at all). How ‘accurate’ this synthesis sounds to our ears usually depends on the data that we are using to resynthesize the sound. If the sound is a guitarist playing a single note, the low E string, how do we find the combination of overtones to recreate it? Musically this works by breaking down a sample of the guitar note into its constituent frequencies. When the Fourier transform was first introduced in the 19th century, one had to manually check if a specific frequency was present in a function. So if you wanted to know the relative amounts of all frequencies that humans can hear (roughly 20Hz — 20kHz), that would require doing calculations on every bit of data in your signal. Direct usage of this algorithm requires O(N²) operations. That’s a huge amount of computing time, especially for large data sets (as audio signals often are).

Luckily for us, mathematicians and computer engineers have made it possible to reduce the number of operations required for a Direct Fourier Transform down to O(N log N). This process is called the Fast Fourier Transform and it optimizes the DFT by eliminating operations that are superfluous, such as multiplications by 1. Using FFT on our guitar sample, we would expect to see the strongest frequency at 82.4Hz, which is the standard tuning for low E on a guitar. Other information in the graph will tell us the relative strengths of the guitar’s overtones, information we can use to approximate the sound of the guitar from sine tones. This whole process is called ‘analysis-resynthesis’ and it allows us to perform operations on our signal between the two stages. For example, if we wanted to take our low E sample and resynthesize it two octaves above the original, it would be fairly simple to multiply the results of our FFT before resynthesis.

Of course, most audio signals are far more complex than that of a single note on the guitar. Because of this, most computer programs that use the FFT will break the sample into extremely short snippets, sometimes overlapping, and perform the FFT on each of the snippets, adding them back together as a frequency domain signal. Most uses of FFT require conversion of some sort back into the time domain. The most basic example of this is a phase vocoder. If you want to slow down the speed at which the signal plays, you can analyze its frequency and use that map to resynthesize the sound at half speed using a only sine tones. Any time you’ve sped up or slowed down a youtube video, the audio is using some sort of analysis/resynthesis with FFT.

Frequency analysis is an incredibly powerful tool for computer musicians and its applications are too numbered to list here. It is also easily available to most people with a computer. The folks who invented the most widely used algorithm for FFT in the 1960s worked at different companies, so the algorithm became public domain. Ever since, FFT has been a major part of DSP for as long as computers have been in homes. And it has only become more efficient. These days, analysis and resynthesis can happen almost in real time (or O(N log N) of your smallest snippet). People use vocoders and autotune in live performances. It has come a long way from the original calculation-heavy process done by hand hundreds of years ago. What once took ages can now be done by a computer in an amount of time barely perceptible to humans.

Fourier analysis encompasses striking ideas from mathematics, physics, computers, and music. And while it’s a tool I’ve utilized for years in musical settings, I never really thought I would understand it. Since I started learning about coding, however, many concepts in computer music have crystalized for me. While I still do not have the mathematical wherewithal to understand Fourier transform algorithms quite yet, puzzle pieces I’ve never realized were there are appearing and fitting into the bigger picture. There’s a lot more that I need to learn, and I’m excited for it.

image from Wikimedia Commons