
Dynamic, Sequential Soundtracks for Games


In this tutorial we will be taking a look at one technique for constructing and sequencing dynamic music for games. The construction and sequencing happens at runtime, allowing game developers to modify the structure of the music to reflect what is happening in the game world.

Before we jump into the technical details, you may want to take a look at a working demonstration of this technique in action. The music in the demonstration is constructed from a collection of individual blocks of audio which are sequenced and mixed together at runtime to form the full music track.

Click to view the demo.


This demonstration requires a web browser that supports the W3C Web Audio API and OGG audio. Google Chrome is the best browser for viewing it, but Firefox Aurora can also be used.

If you can't view the above demo in your browser, you can watch this YouTube video instead:



Overview

The way this technique works is fairly straightforward, but it has the potential to add some really nice dynamic music to games if it is used creatively. It also allows infinitely long music tracks to be created from a relatively small audio file.

The original music is essentially deconstructed into a collection of blocks, each of which is one bar in length, and those blocks are stored in a single audio file. The music sequencer loads the audio file and extracts the raw audio samples it needs to reconstruct the music. The structure of the music is dictated by a collection of mutable arrays that tell the sequencer when to play the blocks of music.

You can think of this technique as a simplified version of sequencing software such as Reason, FL Studio, or Dance EJay. You can also think of this technique as the musical equivalent of Lego bricks.


Audio File Structure

As mentioned previously, the music sequencer requires the original music to be deconstructed into a collection of blocks, and those blocks need to be stored in an audio file.

This image demonstrates how the blocks might be stored in an audio file.

In that image you can see there are five individual blocks stored in the audio file, and all of the blocks are of equal length. To keep things simple for this tutorial the blocks are all one bar long.

The order of the blocks in the audio file is important because it dictates which sequencer channels the blocks are assigned to. The first block (e.g. drums) will be assigned to the first sequencer channel, the second block (e.g. percussion) will be assigned to the second sequencer channel, and so on.


Sequencer Channels

A sequencer channel represents a row of blocks and contains flags (one for each bar of music) that indicate whether the block assigned to the channel should be played. Each flag is a numerical value and is either zero (do not play the block) or one (play the block).

This image demonstrates the relationship between the blocks and the sequencer channels.

The numbers aligned horizontally along the bottom of the above image represent bar numbers. As you can see, in the first bar of music (01) only the Guitar block will be played, but in the fifth bar (05) the Drums, Percussion, Bass and Guitar blocks will be played.


Programming

In this tutorial we will not step through the code of a full working music sequencer; instead, we will look at the core code required to get a simple music sequencer running. The code will be presented as pseudo-code to keep things as language-agnostic as possible.

Before we begin, bear in mind that whichever programming language you ultimately decide to use will need an API that allows you to manipulate audio at a low level. A good example of this is the Web Audio API available in JavaScript.

You can also download the source files attached to this tutorial to study a JavaScript implementation of a basic music sequencer that was created as a demonstration for this tutorial.

Quick Recap

We have a single audio file that contains blocks of music. Each block of music is one bar in length, and the order of the blocks in the audio file dictates the sequencer channel the blocks are assigned to.

Constants

There are two pieces of information that we will need before we can proceed. We need to know the tempo of the music, in beats per minute, and the number of beats in each bar. The latter can be thought of as the time signature of the music. This information should be stored as constant values because it does not change while the music sequencer is running.

TEMPO     = 100 // beats per minute
SIGNATURE = 4   // beats per bar

We also need to know the sample rate that the audio API is using. Typically this will be 44100 Hz, the standard CD-quality rate, but some people have their hardware configured to use a higher sample rate. The audio API you choose should provide this information, but for the purpose of this tutorial we will assume the sample rate is 44100 Hz.

SAMPLE_RATE = 44100 // Hertz

We can now calculate the sample length of one bar of music - that is, the number of audio samples in one block of music. This value is important because it allows the music sequencer to locate the individual blocks of music, and the audio samples within each block, in the audio file data.

BLOCK_SIZE = floor( SAMPLE_RATE * ( 60 / ( TEMPO / SIGNATURE ) ) )
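
As a quick sanity check, plugging the constants above into this formula gives a concrete block size. The snippet below is just the same calculation expressed in JavaScript (the language of the attached demo):

```javascript
const TEMPO = 100;         // beats per minute
const SIGNATURE = 4;       // beats per bar
const SAMPLE_RATE = 44100; // Hertz

// One bar lasts SIGNATURE beats, and each beat lasts 60 / TEMPO seconds,
// so a bar is 60 / (TEMPO / SIGNATURE) = 2.4 seconds at 100 BPM in 4/4.
const BLOCK_SIZE = Math.floor(SAMPLE_RATE * (60 / (TEMPO / SIGNATURE)));

console.log(BLOCK_SIZE); // 105840 samples per one-bar block
```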

Audio Streams

The audio API you choose to use will dictate how audio streams (arrays of audio samples) are represented in your code. For example, the Web Audio API uses AudioBuffer objects.

For this tutorial there will be two audio streams. The first audio stream will be read-only and will contain all of the audio samples loaded from the audio file containing the music blocks; this is the "input" audio stream.

The second audio stream will be write-only and will be used to push audio samples to the hardware; this is the "output" audio stream. Each of these streams will be represented as a one-dimensional array.

input  = [ ... ]
output = [ ... ]

The exact process required to load the audio file and extract the audio samples from the file will be dictated by the programming language that you use. With that in mind, we will assume the input audio stream array already contains the audio samples extracted from the audio file.
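
As a concrete illustration, here is one way the input stream might be loaded in a browser using the Web Audio API. This is a sketch only, not the attached source: the file name "music-blocks.ogg" is a placeholder, and loadInput is a hypothetical helper.

```javascript
// Sketch only: assumes a browser environment with fetch and the Web Audio
// API available, and a hypothetical "music-blocks.ogg" file of blocks.
function loadInput(audioContext, url, onLoaded) {
    fetch(url)
        .then((response) => response.arrayBuffer())
        .then((data) => audioContext.decodeAudioData(data))
        .then((audioBuffer) => {
            // getChannelData(0) returns a Float32Array of raw samples;
            // this becomes the read-only "input" stream
            onLoaded(audioBuffer.getChannelData(0));
        });
}
```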

The output audio stream will usually be a fixed length because most audio APIs will allow you to choose the frequency at which the audio samples need to be processed and sent to the hardware - that is, how often an update function is invoked. The frequency is normally tied directly to the latency of the audio: higher frequencies require more processor power but result in lower latencies, and vice versa.
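
In the Web Audio API, for example, that frequency is set by the buffer size passed to createScriptProcessor. The following is a sketch (createOutput is a hypothetical helper, and ScriptProcessorNode has since been deprecated in favour of AudioWorklet, but it matches the API available when this technique was demonstrated):

```javascript
// Sketch only: assumes a browser AudioContext. A buffer of 2048 samples at
// 44100 Hz means the update function runs roughly every 46 milliseconds;
// halving the buffer halves the latency but doubles the call frequency.
function createOutput(audioContext, update) {
    const node = audioContext.createScriptProcessor(2048, 1, 1);
    node.onaudioprocess = (event) => {
        // the write-only "output" stream for this update
        update(event.outputBuffer.getChannelData(0));
    };
    node.connect(audioContext.destination); // start pulling samples
    return node;
}
```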

Sequencer Data

The sequencer data is a multi-dimensional array; each sub-array represents a sequencer channel and contains flags (one for each bar of music) that indicate whether the music block assigned to the channel should be played or not. The length of the channel arrays also dictates the length of the music.

channels = [
    [ 0,0,0,0, 0,0,0,0, 1,1,1,1, 1,1,1,1 ], // drums
    [ 0,0,0,0, 1,1,1,1, 1,1,1,1, 1,1,1,1 ], // percussion
    [ 0,0,0,0, 0,0,0,0, 1,1,1,1, 1,1,1,1 ], // bass
    [ 1,1,1,1, 1,1,1,1, 1,1,1,1, 1,1,1,1 ], // guitar
    [ 0,0,0,0, 0,0,1,1, 0,0,0,0, 0,0,1,1 ]  // strings
]

The data you see there represents a music structure that is sixteen bars long. It contains five channels, one for each block of music in the audio file, and the channels are in the same order as the blocks of music in the audio file. The flags in the channel arrays let us know whether the block assigned to the channels should be played or not: the value 0 means a block will not be played; the value 1 means a block will be played.

This data structure is mutable: it can be changed at any time, even while the music sequencer is running, which allows you to modify the flags and structure of the music to reflect what is happening in a game.
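
For example, a game could bring the drums in when combat starts by flipping a channel's flags while the sequencer keeps running. A minimal sketch, using a shortened channels array and a hypothetical setChannel helper:

```javascript
// Shortened example data: eight bars, two channels.
const channels = [
    [0, 0, 0, 0, 1, 1, 1, 1], // drums
    [1, 1, 1, 1, 1, 1, 1, 1]  // guitar
];

// Turn every bar of one channel on (1) or off (0). This is safe to call
// at any time because the update function reads the flags afresh for
// every sample it processes.
function setChannel(channelIndex, flag) {
    const bars = channels[channelIndex];
    for (let i = 0; i < bars.length; i++) {
        bars[i] = flag;
    }
}

setChannel(0, 1); // e.g. bring the drums in when combat starts
```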

Audio Processing

Most audio APIs will either broadcast an event to an event handler function, or invoke a function directly, whenever more audio samples need to be pushed to the hardware. This function is invoked continually, much like the main update loop of a game (although usually not as frequently), so time should be spent optimizing it.

Basically what happens in this function is:

  1. Multiple audio samples are pulled from the input audio stream.
  2. Those samples are added together to form a single audio sample.
  3. That audio sample is pushed into the output audio stream.
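
Those three steps, for a single output sample, can be sketched like this (a toy per-sample version of the full update function shown later in this section; mixSample is a hypothetical helper):

```javascript
// Toy per-sample mix: "input" holds the blocks back to back, each block
// BLOCK_SIZE samples long, and "position" is the playhead in samples.
function mixSample(input, channels, position, BLOCK_SIZE) {
    const barIndex = Math.floor(position / BLOCK_SIZE); // current bar
    let sample = 0.0;
    for (let chn = 0; chn < channels.length; chn++) {
        if (channels[chn][barIndex] === 1) {
            // pull the matching sample from this channel's block and add it
            sample += input[BLOCK_SIZE * chn + (position % BLOCK_SIZE)];
        }
    }
    return sample; // the single mixed sample for the output stream
}
```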

Before we get to the guts of the function, we need to define a couple more variables in the code:

playing  = true // indicates whether the music (the sequencer) is playing
position = 0    // the position of the sequencer playhead, in samples

The playing Boolean simply lets us know whether the music is playing; if it is not, we need to push silent audio samples into the output audio stream. The position value keeps track of where the playhead is within the music, so it's a bit like the scrubber on a typical music or video player.

Now for the guts of the function:

function update() {
    outputIndex = 0
    outputCount = output.length

    if( playing == false ) {
        // silent samples need to be pushed to the output stream
        while( outputIndex < outputCount ) {
            output[ outputIndex++ ] = 0.0
        }
        // the remainder of the function should not be executed
        return
    }

    chnCount = channels.length

    // the length of the music, in samples
    musicLength = BLOCK_SIZE * channels[ 0 ].length

    while( outputIndex < outputCount ) {
        chnIndex = 0

        // the bar of music that the sequencer playhead is pointing at
        barIndex = floor( position / BLOCK_SIZE )

        // set the output sample value to zero (silent)
        output[ outputIndex ] = 0.0

        while( chnIndex < chnCount ) {
            // check the channel flag to see if the block should be played
            if( channels[ chnIndex ][ barIndex ] == 1 ) {
                // the position of the block in the "input" stream
                inputOffset = BLOCK_SIZE * chnIndex

                // index into the "input" stream
                inputIndex = inputOffset + ( position % BLOCK_SIZE )

                // add the block sample to the output sample
                output[ outputIndex ] += input[ inputIndex ]
            }
            chnIndex++
        }

        // advance the playhead position
        position++

        if( position >= musicLength ) {
            // reset the playhead position to loop the music
            position = 0
        }

        outputIndex++
    }
}

As you can see, the code required to process the audio samples is fairly simple, but as this code will be run numerous times a second you should look at ways to optimize the code within the function and pre-calculate as many values as possible. The optimizations that you can apply to the code depend solely on the programming language you use.
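
One possible optimization (a sketch, not the attached source) is to track the bar index and the position within the bar directly, instead of deriving them with floor() and the modulo operator for every single sample:

```javascript
// Hypothetical optimization: keep running counters instead of computing
// barIndex = floor(position / BLOCK_SIZE) per sample.
let barIndex = 0;    // the bar the playhead is currently in
let barPosition = 0; // sample offset within the current bar

// Call once per output sample, after the sample has been mixed.
function advance(channels, BLOCK_SIZE) {
    barPosition++;
    if (barPosition >= BLOCK_SIZE) {
        barPosition = 0;
        barIndex++;
        if (barIndex >= channels[0].length) {
            barIndex = 0; // reset to loop the music
        }
    }
}
```

The inner channel loop can then read input[BLOCK_SIZE * chnIndex + barPosition] directly.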

Don't forget you can download the source files attached to this tutorial if you want to look at one way of implementing a basic music sequencer in JavaScript using the Web Audio API.


Notes

The format of the audio file you use must allow the audio to loop seamlessly. In other words, the encoder used to generate the audio file should not inject any padding (silent chunks of audio) into the audio file. Unfortunately, MP3 and MP4 files cannot be used for that reason. OGG files (used by the JavaScript demonstration) can be used. You could also use WAV files if you wanted to, but they are not a sensible choice for web-based games or applications due to their size.

If you are programming a game, and the programming language you are using supports concurrency (threads or workers), then you may want to consider running the audio processing code in its own thread or worker. Doing so relieves the game's main update loop of any audio processing overhead.


Dynamic Music in Popular Games

The following is a small selection of popular games that take advantage of dynamic music in one way or another. The implementation these games use for their dynamic music may vary, but the end result is the same: the game's players have a more immersive gaming experience.


Conclusion

So, there you go - a simple implementation of dynamic sequential music that can really enhance the emotive nature of a game. How you decide to use this technique, and how complex the sequencer becomes, is entirely up to you. There are a lot of directions that this simple implementation can take and we will cover some of those directions in a future tutorial.

If you have any questions, please feel free to post them in the comments below and I will get back to you as soon as possible.
