Artisans with skill in the area of audio data processing utilize a number of existing techniques to modify audio data. Such techniques are used, for example, to introduce sound effects (e.g., adding echoes to a sound track), correct distortions due to faulty recording instruments (e.g., digitally remastering audio data recorded on old analog recording media), or enhance an audio track by removing noise.
One method to enhance an audio file involves lengthening the audio data. The process of lengthening, or time stretching, audio data allows users to extend the data to fill a duration of which it would otherwise fall short. For example, if a movie scene requires that an audio track be of a certain duration to fit a timing requirement and the audio track is initially too short, the audio data would need to be lengthened in a way that does not radically distort the sound of that data. Time stretching also provides a way to conceal errors in an audio signal, for example by replacing missing or corrupted data with an extension of the audio signal that precedes (or follows) the gap.
One way to make an audio track occupy a longer or shorter duration of time is to change the speed of playback. However, because sound carries information in the frequency domain, slowing down a waveform lengthens the wavelength of the sound and therefore lowers its frequency, while speeding up the waveform has the opposite effect. The human ear perceives such wavelength changes as a change in pitch. To a listener, that change in pitch is generally unacceptable.
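The relationship between playback speed and pitch can be illustrated with a minimal sketch. The sample rate, the function names, the use of linear interpolation for resampling, and the zero-crossing frequency estimate are all assumptions made for illustration; they are not taken from any particular prior art system.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed value for illustration only


def sine(freq_hz, duration_s, rate=SAMPLE_RATE):
    """Generate a pure tone as a list of samples."""
    n = int(duration_s * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def change_speed(samples, factor):
    """Resample by linear interpolation.

    factor < 1 slows playback (more output samples),
    factor > 1 speeds it up (fewer output samples).
    """
    out_len = int((len(samples) - 1) / factor) + 1
    out = []
    for j in range(out_len):
        pos = j * factor          # fractional read position in the input
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out


def estimate_freq(samples, rate=SAMPLE_RATE):
    """Rough frequency estimate: upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a <= 0 < b)
    return crossings * rate / len(samples)


tone = sine(440, 1.0)
slowed = change_speed(tone, 0.5)  # half speed: about twice as long
print(len(tone), len(slowed))
print(estimate_freq(tone), estimate_freq(slowed))
```

In this sketch, halving the playback speed roughly doubles the number of samples, and the estimated frequency of the slowed tone drops to about half that of the original, which a listener would hear as a drop of one octave.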
Existing solutions for lengthening audio data without modifying the pitch take segments from within the audio data and insert copies of those segments repeatedly to create new, longer audio data.
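The segment repetition approach described above might be sketched as follows. The function name, the fixed segment length, and the rule of repeating every `every`-th segment are hypothetical choices made for illustration; existing solutions may select and place segments differently.

```python
def stretch_by_repetition(samples, segment_len, every=2):
    """Lengthen audio by inserting a copy of every `every`-th segment.

    Returns the stretched samples and the indices in the output at which
    a repeated copy begins (the splice points).
    """
    if segment_len <= 0 or every <= 0:
        raise ValueError("segment_len and every must be positive")
    out = []
    splices = []
    for n, start in enumerate(range(0, len(samples), segment_len)):
        segment = samples[start:start + segment_len]
        out.extend(segment)
        if n % every == 0:
            splices.append(len(out))  # the copy starts here
            out.extend(segment)       # insert the repeated copy
    return out, splices


# Toy data: integers stand in for audio samples so the copies are easy to see.
stretched, splice_points = stretch_by_repetition(list(range(10)), 5, every=2)
print(stretched)
print(splice_points)
```

Because samples are copied rather than resampled, the local waveform shape, and hence the pitch, is unchanged within each segment. Each splice point, however, joins the end of a segment to its own beginning, and those two samples generally do not match.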
There are at least two drawbacks to this prior art lengthening approach: 1) the human ear is very sensitive to such audio manipulations, so the outcome is perceived as having audible artifacts; and 2) the insertion of segments in the audio data frequently produces discontinuities that generate high-frequency waveforms, which are not adequately filtered by the low-pass filter that is, in one form or another, present in playback devices. The human ear perceives these high-frequency artifacts as clicks. Furthermore, existing techniques require additional manipulations to mask the artifacts introduced by the insertion/repetition techniques. Some of these masking techniques attempt to hide the artifacts by fading the end of the inserted segments. Often, however, the human ear can perceive imperfections even when masking techniques are applied. A solution that aims at time stretching audio data while preserving the pitch should avoid introducing artifacts through numerical manipulation of the audio data (e.g., numerical filters), so as to minimize any imperfections perceivable by the human ear.
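The discontinuity problem and the fading-based masking described above can be illustrated with a short sketch. The sample rate, the test tone, the segment positions, and the linear crossfade are all assumptions chosen for illustration; the crossfade shown is one common way of fading a splice, not necessarily the masking technique used by any particular existing system.

```python
import math

RATE = 8000  # Hz; assumed value for illustration only


def tone(freq_hz, n, rate=RATE):
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def max_step(samples):
    """Largest sample-to-sample change anywhere in the signal."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))


def jump_at(samples, index):
    """Sample-to-sample change across a given splice index."""
    return abs(samples[index] - samples[index - 1])


def repeat_segment(samples, start, length):
    """Insert a copy of samples[start:start+length] right after itself."""
    end = start + length
    return samples[:end] + samples[start:end] + samples[end:], end


def crossfade_repeat(samples, start, length, fade):
    """Same repetition, but the last `fade` samples before the splice are
    blended linearly into the material that precedes the segment start."""
    end = start + length
    if fade > start or fade > length:
        raise ValueError("fade must fit before the segment and inside it")
    blended = []
    for k in range(fade):
        w = (k + 1) / (fade + 1)  # weight ramps from near 0 to near 1
        blended.append((1 - w) * samples[end - fade + k]
                       + w * samples[start - fade + k])
    return samples[:end - fade] + blended + samples[start:end] + samples[end:]


sig = tone(440, 800)
hard, splice = repeat_segment(sig, 100, 150)
soft = crossfade_repeat(sig, 100, 150, 32)
print(max_step(sig), jump_at(hard, splice), max_step(soft))
```

In this sketch the abrupt splice produces a sample-to-sample jump well above the largest step found in the original tone, which corresponds to the high-frequency content heard as a click. The crossfade brings the largest step back toward the natural range, but it does so by numerically altering the samples around the splice, and the blended region no longer matches the original waveform, which is consistent with the observation that masking does not always hide the imperfection.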
There is a need for a method and apparatus for modifying the length of an audio track while preserving its audible qualities. Embodiments of the invention provide a method for “time stretching” an audio signal while keeping the pitch unchanged and optimizing the audible qualities.