With the proliferation of personal computers into the homes of consumers, media activities formerly reserved to professional studios have migrated into the household of the common computer user. One such media activity is the creation and/or modification of audio files (i.e., sound files). For example, sound recordings or synthesized sounds may be combined and altered as desired to create standalone audio performances, soundtracks for movies, voiceovers, special effects, etc.
To synchronize stored sounds, including music audio, with other sounds or with visual media, it is often necessary to alter the tempo (i.e., playback speed) of one or more sounds. Changes in tempo may also need to be made dynamically, during playback, to achieve the desired listening experience. Unfortunately, straightforward approaches to implementing tempo changes, including merely playing the given sound at a faster or slower rate, result in undesired audible side effects such as pitch variation (e.g., the “chipmunk” effect of playing a sound faster) and clicks and pops caused by skips in data as the tempo is changed. These problems may be better understood in the context of an audio file example.
An audio file generally contains a sequence (herein referred to as an “audio sequence”) of digital audio data samples that represent measurements of amplitude taken at constant intervals (the interval being determined by the sample rate). In a computer system, this audio sequence is often represented as an array of data like the following:
SourceAudioData[]={0.0, 0.2, 0.4, 0.3, 0.2, −0.04, −0.15, −0.2, −0.15, −0.05, 0.1, . . . }
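The relationship between a sample's position in such an array and its position in time can be sketched as follows. This is a minimal illustration only: the sample rate of 44,100 Hz and the helper names `sample_time_ms` and `duration_ms` are assumptions introduced here and do not come from the description above.

```python
# Assumed sample rate, for illustration only; the text does not specify one.
SAMPLE_RATE_HZ = 44_100

# The first few sample values from the example array above.
source_audio_data = [0.0, 0.2, 0.4, 0.3, 0.2, -0.04, -0.15, -0.2, -0.15, -0.05, 0.1]


def sample_time_ms(index, sample_rate_hz=SAMPLE_RATE_HZ):
    """Time (in milliseconds) at which the sample at `index` was measured."""
    return 1000.0 * index / sample_rate_hz


def duration_ms(samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Playback duration (in milliseconds) of a sequence at the given rate."""
    return 1000.0 * len(samples) / sample_rate_hz


if __name__ == "__main__":
    for i, value in enumerate(source_audio_data):
        print(f"sample {i}: t = {sample_time_ms(i):.4f} ms, amplitude = {value}")
```

Under this assumed rate, a 2000 millisecond waveform would occupy 88,200 array entries, and a 10 millisecond view would span 441 of them.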
FIGS. 1A–1C show a sound waveform example as might be stored in an audio file. FIG. 1A represents 2000 milliseconds of audio in waveform 100. FIG. 1B represents 200 milliseconds of audio taken from the beginning of waveform 100 and shown in expanded view. FIG. 1C shows 10 milliseconds of audio in an even greater expanded view, showing individual samples associated with waveform 100.
In FIG. 1A, waveform 100 contains ten occurrences of sharp rises in signal value that taper over time. These occurrences are referred to herein as transients and represent distinct sound events, such as the beat of a drum, a note played on a piano, a footstep, or a syllable of a vocalized word. FIG. 1C illustrates how these sound events, or transients, are represented by the sequence of samples stored in an audio file. It should be clear that modifying the sample values or the time-spacing of the samples in FIG. 1C will result in a change in the transient behavior at the level of FIG. 1A, and a corresponding change in the associated sound during playback of the audio sequence.
The resolution of FIG. 1B highlights the periodic nature of waveform 100 during the first transient. The frequency of this periodicity influences the pitch of the sound resulting from that transient. A faster oscillation provides a higher pitched sound, and a slower oscillation provides a lower pitched sound. Also clear from FIG. 1B is the continuous nature of waveform 100. Discontinuities in waveform 100 would be audible on playback as clicks and pops in the audio.
Assuming that waveform 100 represents an adult speaking, if an audio enthusiast attempts to fit the audio sequence into a 1500 millisecond timeslot (e.g., to synchronize the audio sequence with another musical audio sequence) by simply playing back the samples at 4/3 speed, then the result will sound like a child's voice. This occurs because the frequency behavior of the transients speeds up with the playback rate, causing an increase in pitch. This same phenomenon occurs when the incorrect playback speed is selected on a dual-speed tape recorder.
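The pitch side effect described above can be reproduced with a short sketch. It assumes a pure 220 Hz test tone sampled at 8000 Hz in place of a recorded voice, and the function names (`make_tone`, `naive_speed_change`, `estimate_freq_hz`) are hypothetical helpers written for this illustration. The speed change is done by naive resampling with linear interpolation, which is the straightforward approach the text warns about, not a proposed solution.

```python
import math


def make_tone(freq_hz, duration_ms, sample_rate_hz):
    """Generate a sine tone as a list of samples (stand-in for recorded audio)."""
    n = int(sample_rate_hz * duration_ms / 1000)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]


def naive_speed_change(samples, speed):
    """Read the source `speed` samples per output sample, interpolating linearly.

    The output is meant to be played at the original sample rate, so it is
    shorter (speed > 1) or longer (speed < 1) than the input.
    """
    out = []
    k = 0
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        out.append((1 - frac) * samples[i] + frac * samples[i + 1])
        k += 1
        pos = k * speed  # recompute from k to avoid accumulating rounding error
    return out


def estimate_freq_hz(samples, sample_rate_hz):
    """Rough frequency estimate from the count of upward zero crossings."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * sample_rate_hz / len(samples)


if __name__ == "__main__":
    rate = 8000
    original = make_tone(220.0, 2000, rate)
    faster = naive_speed_change(original, 4 / 3)
    print("duration (ms):", 1000 * len(original) / rate, "->", 1000 * len(faster) / rate)
    print("frequency (Hz):", estimate_freq_hz(original, rate), "->", estimate_freq_hz(faster, rate))
```

In this sketch the 2000 millisecond tone is shortened to 1500 milliseconds, and its estimated frequency rises by roughly the same 4/3 factor, which is the pitch increase heard as a child-like voice.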
Now suppose that the audio enthusiast wishes to speed up only a portion of the audio file. Not only will the pitch change when the speed is changed, but the speed transition will also be marked by a click as the continuity of the waveform is temporarily disrupted by the output waveform skipping forward. Neither the pitch change nor the audible clicking is desirable from a listening standpoint, particularly if the audio is to be of professional quality. Clearly, a mechanism is needed for providing tempo (i.e., speed) control without the undesired side effects of pitch variations and audible clicks or pops.
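The discontinuity behind such a click can be made visible numerically. The sketch below again assumes a 220 Hz test tone at 8000 Hz, and `max_step` and `skip_forward` are hypothetical helpers introduced for illustration. It compares the largest sample-to-sample change in a continuous tone with that of the same tone after the output abruptly skips ahead by about half a period.

```python
import math


def max_step(samples):
    """Largest absolute change between adjacent samples."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))


def skip_forward(samples, at_index, skip):
    """Drop `skip` samples starting at `at_index`, as an abrupt jump ahead would."""
    return samples[:at_index] + samples[at_index + skip:]


if __name__ == "__main__":
    rate = 8000
    tone = [math.sin(2 * math.pi * 220.0 * i / rate) for i in range(rate)]

    # Skip roughly half a period (about 18 samples) near a positive peak,
    # so the waveform jumps from near +1 to near -1 in a single step.
    skipped = skip_forward(tone, at_index=9, skip=18)

    print("largest step, continuous tone:", max_step(tone))
    print("largest step, after skip:     ", max_step(skipped))
```

The continuous tone never changes by more than a small fraction of full scale between adjacent samples, whereas the spliced version contains a single step spanning nearly the full amplitude range. A step of that kind is what is heard as a click or pop.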