Turning Sound into Setting

By Piper Hill


About the Project 

This program is a project for Bob Bosch’s Math Art class at Oberlin College. Given any WAVE audio file, it writes a program that renders, in color, a small 3D model of a city. The city is meant to serve as a whimsical visual representation of the song. The goal of the project is to create models that "look like they sound" (i.e., that show a noticeably cohesive relationship between the audio and the visuals).

The Data Mapping Process


A traditional spectrogram uses the x-axis to denote time, the y-axis to denote frequency, and color or grayscale value to indicate the amplitude of each frequency at each time-step, usually to a degree of detail that renders an appearance of continuity. I wanted to make a whimsical version of this that accounts for more factors than just plain-old frequency and amplitude. I wrote code in MATLAB that takes in a sound file and writes an OpenSCAD program that generates these cities.


The southwest corner of each song-city acts as the origin. If you travel north, you move into higher frequency bands of the song. If you travel east, you move forward in time. There are seven blocks from south to north, one for each frequency band the program takes into account: Sub-bass (0-60 Hz), Bass (60-250 Hz), Low Mids (250-500 Hz), True Mids (500-2000 Hz), High Mids (2000-4000 Hz), Presence (4000-6000 Hz), and Brilliance (6000-20000 Hz).
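To make the south-to-north layout concrete, here is a minimal MATLAB sketch of a band lookup built from the seven ranges listed above. The helper name bandIndex and the convention that each band includes its lower edge are assumptions for illustration, not necessarily how the original program handles it.

```matlab
% Band edges in Hz, taken from the seven ranges listed above.
bandNames = {'Sub-bass', 'Bass', 'Low Mids', 'True Mids', ...
             'High Mids', 'Presence', 'Brilliance'};
bandEdges = [0 60 250 500 2000 4000 6000 20000];

% Hypothetical helper: row of the city (1 = southernmost) whose band contains f.
% Assumption: a band includes its lower edge and excludes its upper edge.
bandIndex = @(f) find(f >= bandEdges(1:end-1) & f < bandEdges(2:end), 1);

row = bandIndex(440);   % A above middle C
fprintf('%d Hz falls in row %d (%s)\n', 440, row, bandNames{row});
```

With these edges, 440 Hz lands in the third row from the south, the Low Mids.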


The number of blocks from west to east is the length of the piece of music in seconds divided by a variable called BlockSize, which I define at the top of the program. BlockSize is the length in seconds that each city block represents. I change it myself depending on how big I want the city to be, considering the length of the music.
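As a rough sketch of that division, the snippet below computes the west-to-east block count and the sample range each block covers. The 185-second length, the 44100 Hz sample rate, and the choice to drop a trailing partial block with floor are all assumptions; the original program may round differently.

```matlab
fs = 44100;               % sample rate in Hz (audioread reports this for a real file)
numSamples = 185 * fs;    % stand-in for a 185-second recording
BlockSize = 10;           % seconds per city block, set at the top of the program

durationSec = numSamples / fs;
% Assumption: a trailing partial block is dropped.
numBlocks = floor(durationSec / BlockSize);
samplesPerBlock = round(BlockSize * fs);

% First and last sample index of each block (one row per block, west to east).
k = (1:numBlocks)';
blockRanges = [(k - 1) * samplesPerBlock + 1, k * samplesPerBlock];
```

Here a 185-second piece with BlockSize = 10 gives 18 blocks. Raising BlockSize to 20 would halve the width of the city.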


Every block is either a building or a park. If it is a building, the average amplitude for that particular frequency-time block is what determines the height of the building, creating a "pixelated" spectrogram of sorts. Each building higher than a set threshold is populated with evenly spaced windows.
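Since the MATLAB code writes OpenSCAD, each building eventually becomes a line of OpenSCAD text. The sketch below shows one way a single plain building could be emitted. The variable names, the blockWidth value, and the exact layout are hypothetical; only translate() and cube() are standard OpenSCAD, and windows are left out.

```matlab
blockWidth = 10;   % hypothetical footprint of one city block, in model units
timeIdx = 3;       % third block from the west edge (time)
bandIdx = 2;       % second block from the south edge (Bass)
bh = 12.5;         % building height, already scaled from the average amplitude

% The southwest corner of the city is the origin: x runs east, y runs north.
x = (timeIdx - 1) * blockWidth;
y = (bandIdx - 1) * blockWidth;

% Build one line of OpenSCAD source for this building.
scadLine = sprintf('translate([%g, %g, 0]) cube([%g, %g, %g]);', ...
                   x, y, blockWidth, blockWidth, bh);
disp(scadLine);
```

Collecting one such line per block and writing them all to a .scad file would give the "pixelated" spectrogram layout described above.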


For each block there are currently only three main sonic factors being taken into account:

  • Average Amplitude

  • Most Prominent Note

  • Variation Coefficient

Average Amplitude


For each time block, I used an FFT process that results in a filtered audio signal, represented by a vector of numbers, for each frequency band. So each "block" has its own little clip of filtered audio. I then took the mean of the absolute value of all of the numbers in each vector to find the average amplitude for that block. The building height is directly proportional to this average.
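The exact FFT process isn't spelled out here, so the following is only a sketch of one common approach: zero every FFT bin outside the band, transform back, and average the absolute values. The brick-wall mask and the synthetic test signal are assumptions, not the original code.

```matlab
fs = 8000;                         % sample rate of the synthetic clip, in Hz
t = (0:fs-1) / fs;                 % one second of samples
x = sin(2*pi*100*t) + 0.5*sin(2*pi*1000*t);   % 100 Hz tone plus 1000 Hz tone

band = [60 250];                   % the Bass band, in Hz

N = numel(x);
X = fft(x);
f = (0:N-1) * fs / N;              % frequency of each FFT bin
fAbs = min(f, fs - f);             % fold the upper half onto its mirror frequency

% Keep only the bins inside the band (assumed brick-wall filter).
mask = (fAbs >= band(1)) & (fAbs < band(2));
xBand = real(ifft(X .* mask));     % filtered clip for this block and band

avgAmp = mean(abs(xBand));         % average amplitude, which drives building height
```

Only the 100 Hz tone falls inside the Bass band, so avgAmp comes out close to 2/pi, the mean absolute value of a unit sine wave.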

Most Prominent Note


First, I wanted to find the most prominent note for the whole piece, to determine a key of sorts. This doesn’t always end up being the key that the song is actually in, even if it does stay in one main key area.


To find the name of the most prominent note in the piece, I first found the fundamental frequency of the clip and then found the closest note to that frequency (based on A=440). To learn more about the math I used to do this, check out the paper attached farther up. At the beginning of the program, I created a 2 by 12 matrix with color values in it assigned in the order of the notes from C to B (somewhat arbitrarily, just in ways that felt right, and the colors are a bit different from what’s in the attached paper). The color assigned to the overall most prominent note becomes the main color of the city.
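For the "closest note" step, the standard equal-temperament relationship is that a frequency f sits 12*log2(f/440) semitones away from A4. The sketch below applies it, assuming the fundamental frequency has already been estimated. The helper names and the 196.5 Hz input are made up for illustration, and the attached paper remains the reference for the actual method.

```matlab
noteNames = {'C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'};

% Semitones away from A4 = 440 Hz, rounded to the nearest note.
semitonesFromA4 = @(f) round(12 * log2(f / 440));

% Index into noteNames (1 = C ... 12 = B). A4 is MIDI note 69, and MIDI
% note numbers that are multiples of 12 are Cs.
noteIndex = @(f) mod(69 + semitonesFromA4(f), 12) + 1;

f0 = 196.5;                           % hypothetical fundamental frequency in Hz
closest = noteNames{noteIndex(f0)};
fprintf('%.1f Hz is closest to %s\n', f0, closest);
```

Because the index runs from C to B, it can be used directly to pick a column from a color table ordered the same way.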


As the city builds itself, this process happens again for each block. Depending on the “scale degree” the most prominent note of each block represents in relation to the overall most prominent note, the main color is tinted or shaded. Where tonicRGB is the main color of the city and scaleDegreeFactors(closestNote) is the tinting or shading factor I assigned to each scale degree:


if scaleDegreeFactors(closestNote) <= 0
    blockRGB = tonicRGB + tonicRGB*scaleDegreeFactors(closestNote);  % shade toward black
else
    blockRGB = tonicRGB + ([255,255,255] - tonicRGB)*scaleDegreeFactors(closestNote);  % tint toward white
end
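Here is a small worked example of the tint-or-shade rule with made-up numbers. The color, the twelve factor values, and the way closestNote is derived from the two note indices are all assumptions for illustration; the real factors were chosen by hand.

```matlab
tonicRGB = [200 80 40];   % hypothetical main color of the city (0-255 RGB)
tonicNote = 1;            % overall most prominent note: C (1 = C ... 12 = B)
blockNote = 8;            % this block's most prominent note: G

% Assumption: closestNote indexes the interval above the tonic, 1 = same note.
closestNote = mod(blockNote - tonicNote, 12) + 1;

% Hypothetical factors, one per interval: negative shades, positive tints.
scaleDegreeFactors = [0 -0.6 0.2 -0.4 0.4 0.3 -0.7 0.5 -0.3 0.25 -0.2 0.1];

k = scaleDegreeFactors(closestNote);
if k <= 0
    blockRGB = tonicRGB + tonicRGB * k;                    % toward black
else
    blockRGB = tonicRGB + ([255 255 255] - tonicRGB) * k;  % toward white
end
disp(blockRGB);
```

With a factor of 0.5, each channel moves halfway from the main color to white, giving [227.5 167.5 147.5]. A factor of -0.5 would instead halve each channel. Factors between -1 and 1 keep the result inside the 0-255 range.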


Variation Coefficient


Here, I’m trying to get at a less straightforward quality of the sound: roughly, the "amount of audible variation" happening in any particular block of filtered audio. If you’re interested in how the variation coefficient is generated, you can check out the PDF I attached farther up.


The variation coefficient ends up being an integer, almost always between 0 and 10.

I use the variation coefficient to determine the height of the pyramid-shaped roof on each building. Where variCount is the variation coefficient, pointinessFactor is user-defined at the beginning of the program, and bh is the height of the building before a roof is added:


if variCount > 1
    roofHeight = pointinessFactor*bh*variCount;  % pointed roof, taller with more variation
else
    roofHeight = 0;  % flat roof
end
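To get a feel for the numbers, this sketch applies the same roof rule to a range of variation coefficients at once. The pointinessFactor of 0.05 and the building height of 20 are hypothetical values, not the settings used for any particular city.

```matlab
pointinessFactor = 0.05;   % hypothetical user setting
bh = 20;                   % building height before the roof is added
variCounts = 1:5;          % variation coefficients to compare

% Same rule as above, vectorized: no roof unless variCount exceeds 1.
roofHeights = (variCounts > 1) .* (pointinessFactor * bh * variCounts);
totalHeights = bh + roofHeights;

disp([variCounts; roofHeights; totalHeights]);
```

Because bh appears in the product, a taller building gets a proportionally taller roof for the same variation coefficient.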


If the variation coefficient is zero, or the value of bh is less than 2, there is no building and the block is made into a park. If bh is less than 2, there are no trees in the park and it’s just plain green space. Otherwise, if bh is less than 15, there is one tree. If bh is 15 or more, then there are two trees. The trees are just simple brown cylinders with a sphere on top for the leaves, the color of which is determined by whatever color the building would have been if there’d been one.
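The building-or-park decision can be summarized as a short sketch that follows the thresholds in the paragraph above. The test cases and the label strings are invented for illustration.

```matlab
% Each row is one hypothetical block: [variCount, bh].
cases = [0 20; 3 1; 0 10; 2 30; 1 5];
labels = cell(size(cases, 1), 1);

for i = 1:size(cases, 1)
    variCount = cases(i, 1);
    bh = cases(i, 2);
    if variCount == 0 || bh < 2
        % Park: the number of trees depends on how tall the building would have been.
        if bh < 2
            trees = 0;
        elseif bh < 15
            trees = 1;
        else
            trees = 2;
        end
        labels{i} = sprintf('park, %d tree(s)', trees);
    else
        labels{i} = 'building';
    end
    fprintf('variCount=%d, bh=%g -> %s\n', variCount, bh, labels{i});
end
```

Note that a block with a variation coefficient of 1 and enough height is still a building. It just gets a flat roof.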

Future Development


I plan to keep tinkering with this in the future, and I may add things that create little cars or people, or change the way the windows are put on the buildings.