This invention relates to a method and apparatus for changing the speed of playback of a digitised audio signal.
Speech falls within a frequency range between 20 Hz and 4 kHz. According to Nyquist's theorem, an analog signal must be sampled at a rate at least twice that of the highest frequency component of the signal in order to preserve the information in the signal. Accordingly, to digitise speech, the analog speech signal is conventionally sampled at a rate of 8 kHz, that is, twice the 4 kHz upper limit of the speech band. The resulting samples are typically digitally encoded using pulse code modulation (PCM).
Because humans are often able to comprehend speech at a rate faster than normal human speech, it may be desired to speed up recorded speech during playback. This could be accomplished by simply increasing the rate of playback of PCM samples; however, this would raise the pitch of the played-back speech. To avoid raising the pitch, it is known to drop groups of PCM samples from a sample stream and play back the remaining samples at the normal rate of 8 kHz. However, this results in clicks in the playback due to the discontinuities between the speech samples preceding and following the dropped speech samples.
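The click problem described above can be illustrated with a minimal sketch. It is not taken from any cited system: the function names (`make_tone`, `drop_groups`, `max_step`), the 200 Hz test tone and the 170-sample group size are all assumptions chosen for illustration. The sketch drops alternate groups of samples from a sine wave sampled at 8 kHz and compares the largest sample-to-sample jump before and after.

```python
import math

SAMPLE_RATE = 8000  # Hz, the conventional speech sampling rate


def make_tone(freq_hz, n_samples, rate=SAMPLE_RATE):
    """Return n_samples of a unit-amplitude sine wave as a list of floats."""
    return [math.sin(2 * math.pi * freq_hz * n / rate) for n in range(n_samples)]


def drop_groups(samples, keep, drop):
    """Keep `keep` samples, discard the next `drop` samples, and repeat.

    The kept samples are simply butted together, with no smoothing
    at the cut points (hypothetical helper, for illustration only).
    """
    out = []
    period = keep + drop
    for start in range(0, len(samples), period):
        out.extend(samples[start:start + keep])
    return out


def max_step(samples):
    """Largest absolute difference between consecutive samples."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))


tone = make_tone(200, 8000)                   # one second of a 200 Hz tone
fast = drop_groups(tone, keep=170, drop=170)  # roughly double-speed playback

print(len(tone), len(fast))
print(max_step(tone), max_step(fast))
```

Under these assumptions the shortened stream is about half as long, and the largest jump between adjacent samples is several times greater than in the original tone. Jumps of that kind at each cut point are what a listener would hear as clicks.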
In U.S. Pat. No. 5,386,493 issued Jan. 31, 1995 to Degen, periodic groups of samples are dropped from a digital sample stream and the resulting gaps removed. Discontinuities at the cut points are avoided by filtering the digital sample stream with an equal-powered cross-fade amplifier/filter. This filter fades out the old segment of samples utilizing a parabolic function while fading in the new segment. With cross-fade, the parabolic functions for each pair of adjacent segments cross at the segment junction (resulting in a cross-over region). This approach requires processing power to speed up the speech playback beyond that required to play back the signal at its normal (non-sped up) rate. The amount of additional processing power required becomes significant when the playback speedup is performed as part of a system which plays back speech that was previously compressed (i.e., stored at a lower bit rate than the original). In this type of system, it is necessary to expand not only the speech samples in the segments being played, but also the samples in the cross-over region and, for some types of coders which are adaptive and/or differential, the samples in the segments that are dropped. As a result, doubling the playback speed can require more than twice the processing power of normal speed playback.
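A cross-fade of the general kind described above might be sketched as follows. The exact fade curves of the cited patent are not given in this text, so the parabolic gains used here (`1 - t*t` for the outgoing segment and `1 - (1 - t)**2` for the incoming segment) are assumptions, as are the function name `crossfade_join` and the `overlap` parameter. The two gains are equal at the midpoint of the overlap, which corresponds to the crossing at the segment junction, and their sum is not constant across the overlap.

```python
def crossfade_join(old_seg, new_seg, overlap):
    """Join two sample segments with a parabolic cross-fade.

    The last `overlap` samples of old_seg are blended with the first
    `overlap` samples of new_seg. The gain curves are hypothetical
    parabolas, not the curves of any particular prior system.
    """
    if overlap < 1 or overlap > min(len(old_seg), len(new_seg)):
        raise ValueError("overlap must fit inside both segments")

    head = old_seg[:len(old_seg) - overlap]  # untouched part of old segment
    tail = new_seg[overlap:]                 # untouched part of new segment

    blended = []
    for i in range(overlap):
        t = (i + 0.5) / overlap              # position in cross-over region, 0..1
        gain_out = 1.0 - t * t               # parabolic fade-out of old segment
        gain_in = 1.0 - (1.0 - t) ** 2       # parabolic fade-in of new segment
        old_sample = old_seg[len(old_seg) - overlap + i]
        blended.append(gain_out * old_sample + gain_in * new_seg[i])

    return head + blended + tail


# Usage: blend a segment of ones into a segment of zeros over four samples.
joined = crossfade_join([1.0] * 10, [0.0] * 10, overlap=4)
print(joined)
```

Note that every sample in the cross-over region is needed from both segments. In a system that stores compressed speech, both sets of samples would have to be expanded before blending, which is one source of the extra processing load discussed above.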
This invention seeks to overcome drawbacks of prior systems to change the speed of audio playback, especially where there is a need to store the audio to be played back in a compressed format.