This invention relates to the field of audio signal processing and more specifically, musical signal processing. Time-scaling consists of shortening or lengthening an audio signal while keeping its pitch unchanged. Time-scaling is crucial in many audio applications (e.g. video/audio post-synchronization), and has found its way into several consumer products such as answering systems or voice mail systems. Because they require much less computation power, time-domain techniques are often preferred over frequency-domain techniques, see for example J. Laroche, xe2x80x9cTime and pitch scale modification of audio signalsxe2x80x9d in Applications of Digital Signal Processing to Audio and Acoustics, M. Kahrs and K. Brandenburg, editors, Kluwer, Norwell, Mass., 1998.
For time-domain time scaling techniques, one problem that needed to be solved is the following: time-domain time-scaling systems rely on the very simple idea of repeating (respectively, discarding) segments of the original audio to increase (respectively, decrease) its duration without altering its pitch, a process known as xe2x80x9csplicing.xe2x80x9d When the segments are of an appropriate duration and the splice points are appropriately chosen, the operation of repeating or discarding audio segments can be made relatively inconspicuous, at least for moderate (15%) modification factors. However, two kinds of artifacts are particularly troublesome and difficult to avoid: tempo-modulation and transient-repeating/discarding.
The first artifact, tempo-modulation, comes from the fact that, as the length of the repeated/discarded segments grows larger, the uniformity of tempo in the unmodified signal is lost in the time-scaled signal. For example, a series of metronome clicks becomes irregular after time-scaling, an artifact particularly undesirable for rhythmic music, where tempo accuracy is essential. Reducing the duration of the repeated/discarded segments helps reduce this problem. Unfortunately, as the duration of the repeated/discarded segments becomes smaller, other types of artifacts come into play, such as warbling (an undesirable tremolo heard in sustained pitched sounds). Moreover, for pitched sounds, the length of the repeated/discarded segments should ideally be a multiple of the pitch period (to avoid warbling artifacts), which makes it impossible to make the segments arbitrarily small, and therefore prevents us from reducing tempo-modulation to an acceptable level.
The second artifact, transient-repeating/discarding, comes from the fact that some repeated/discarded segments might fall in the vicinity of a transient (a piano onset or a drum hit) in the original signal. As a result, this transient will be heard as a pair of closely spaced transients if the signal is time-stretched, a very undesirable artifact, or might altogether disappear if the signal is time-compressed. Using short segment durations helps reduce this problem, but cannot entirely avoid it.
By comparison, frequency-domain techniques do not exhibit the problem of tempo-modulation because the time-scaling operation is uniformly distributed along the duration of the signal (as opposed to lumped at certain splicing-instants in time-domain techniques). However, they exhibit a problem similar to transient-repeating/discarding, usually referred to as xe2x80x9ctransient-smearing.xe2x80x9d Percussive transients in frequency-domain time-scaled signals become smeared in time and lose their original sharpness.
According to one aspect of the invention, it possible to perform time-scaling on an audio signal while alleviating most of the artifacts encountered in standard time-scaling techniques. The process according to one aspect is based on a preliminary transient-detection stage and solves all the above problems at the same time. Because the transient locations are known in advance, it becomes possible to control with an arbitrary degree of accuracy where the transients will fall in the time-scaled signal, thus entirely avoiding the problem of tempo-modulation. Furthermore, it becomes possible to xe2x80x9cprotectxe2x80x9d the transients by defining a small area around each transient and making sure that repeated/discarded segments will not overlap with these protected areas in time-domain techniques, or that no time-scaling is performed on the protected areas in frequency-domain techniques.
According to a further aspect of the invention, transients in an audio signal are determined by comparing frequency characteristic energy for different windows of the audio signal. A level curve has values indicating increasing energy in succeeding windows. Peaks on the level curve indicate transients.
According to another aspect of the invention, time scaling is performed only on intervals located between transients. This time scaling may be performed in the time or frequency domains.
According to a further aspect of the invention, in time-domain processing splicing is performed on an interval between transients to modify the length of the interval.
According to a further aspect of the invention, in frequency-domain processing protected areas around each transient are subtracted from an interval between transients and a modified scaling factor is calculated to be used during frequency-domain processing.
Other features and advantages will be apparent in view of the following detailed description and appended claims.