The present disclosure relates to audio playback devices and systems, and to audio playback algorithms employed in conjunction with such devices.
Audio information can be detected as an analog signal, which can take on any of an infinite number of electrical signal values. An analog audio signal is subject to electrical signal impairments, however, that can negatively affect the quality of the recorded information. For example, any change to an analog signal value can result in a defect, such as distortion or noise, that will be noticeable when the audio signal is played back. Moreover, because an analog audio signal can take on an infinite number of values, it is difficult to detect and correct defects, such as those that occur during transmission. Many of the problems associated with the use of analog audio signals can be overcome, without a significant loss of information, simply by digitizing the audio signals.
FIG. 1 presents a portion of an analog audio signal 10. The amplitude of the analog audio signal 10 is shown with respect to the vertical axis 12, and the horizontal axis 14 indicates time. In order to digitize the analog audio signal 10, the waveform 16 is sampled at periodic intervals, such as at a first sample point 18 and a second sample point 20. A sample value representing the amplitude of the waveform 16 is recorded for each sample point. The waveform 16 must be sampled at an appropriate rate to avoid losing information that is needed to represent the analog audio signal 10 with adequate precision. Specifically, the waveform 16 must be sampled at a rate that is greater than twice the highest frequency present in the analog audio signal 10. Twice the highest frequency is known as the Nyquist rate.
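As an illustration only, the short Python sketch below shows what goes wrong when the sampling criterion is violated. The helper names (`nyquist_rate`, `sample_cosine`, `max_difference`) are hypothetical. A 3 kHz tone sampled at 4 kHz, which is below twice its frequency, yields exactly the same samples as a 1 kHz tone (aliasing), so the original frequency cannot be recovered; sampling at 8 kHz keeps the two tones distinct.

```python
import math

def nyquist_rate(f_max_hz):
    """Twice the highest frequency; the sampling rate must exceed this value."""
    return 2.0 * f_max_hz

def sample_cosine(freq_hz, fs_hz, num_samples):
    """Sample cos(2*pi*f*t) at the periodic instants t = n / fs."""
    return [math.cos(2 * math.pi * freq_hz * n / fs_hz) for n in range(num_samples)]

def max_difference(a, b):
    """Largest absolute sample-by-sample difference between two sequences."""
    return max(abs(p - q) for p, q in zip(a, b))

# 3 kHz sampled at 4 kHz: 4 kHz < nyquist_rate(3000) = 6 kHz, so the samples
# coincide with those of a 1 kHz tone (aliasing) and the 3 kHz content is lost.
print(max_difference(sample_cosine(3000, 4000, 16), sample_cosine(1000, 4000, 16)) < 1e-9)  # True

# At 8 kHz (> 6 kHz) the two tones produce clearly different samples.
print(max_difference(sample_cosine(3000, 8000, 16), sample_cosine(1000, 8000, 16)) > 0.5)  # True
```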
The human ear generally cannot detect frequencies greater than about 16 to 20 kHz, so the sampling rate used to create an accurate representation of an acoustic signal should be greater than 32 kHz to capture frequencies up to 16 kHz, and greater than 40 kHz to capture frequencies up to 20 kHz. For example, compact disc quality audio signals are generated using a sampling rate of 44.1 kHz. Once the sample value associated with a sample point has been determined, it is represented using a fixed number of binary digits. Encoding the infinitely many possible values of an analog audio signal using a finite number of binary digits almost always results in the loss of some information. Because high-quality audio is encoded using up to 24 bits per sample, however, the digitized values closely approximate the analog values. The digitized values of the samples comprising the audio signal are then stored electronically using a digital-audio file format.
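A minimal sketch of the quantization step follows, assuming samples normalized to the range [-1.0, 1.0] and simple rounding to a signed integer code; the function names are hypothetical. It shows that the rounding error is bounded by half of one quantization step, which shrinks by a factor of two with each added bit. The bit-rate example uses the standard compact disc parameters of 16 bits per sample and two channels, which are not stated in the text above.

```python
import math

def quantize(x, bits):
    """Round a sample in [-1.0, 1.0] to a signed integer code of `bits` bits."""
    levels = 2 ** (bits - 1)
    code = round(x * levels)
    return max(-levels, min(levels - 1, code))  # clip to the representable range

def dequantize(code, bits):
    """Map an integer code back to the normalized range."""
    return code / 2 ** (bits - 1)

def pcm_bit_rate(fs_hz, bits, channels):
    """Uncompressed PCM data rate in bits per second."""
    return fs_hz * bits * channels

# Away from clipping, the rounding error is at most half a step: 1 / 2**bits.
samples = [0.9 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(1000)]
for bits in (8, 16, 24):
    worst = max(abs(s - dequantize(quantize(s, bits), bits)) for s in samples)
    print(bits, worst <= 1 / 2 ** bits + 1e-12)  # prints True for each bit depth

print(pcm_bit_rate(44100, 16, 2))  # 1411200 (16-bit stereo at 44.1 kHz)
```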
The acceptance of digital audio has increased dramatically as the amount of information that is shared electronically has grown. Digital-audio file formats, such as MP3 (MPEG Audio Layer III) and WAV, that can be transferred between a wide variety of hardware devices are now widely used. In addition to music and soundtracks associated with video information, digital audio is also being used to store information such as voice-mail messages, audio books, speeches, lectures, and instructions.
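To make the storage step concrete, the sketch below writes digitized samples to a WAV file and reads them back using Python's standard-library `wave` module. The choice of mono, 16-bit PCM and an in-memory buffer is an assumption made for brevity, and the two helper functions are hypothetical names.

```python
import io
import math
import struct
import wave

def write_wav_16bit_mono(samples, fs_hz):
    """Encode float samples in [-1.0, 1.0] as a 16-bit PCM WAV file held in memory."""
    codes = [max(-32768, min(32767, round(s * 32768))) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16 bits per sample
        w.setframerate(fs_hz)
        w.writeframes(struct.pack("<%dh" % len(codes), *codes))  # little-endian PCM
    return buf.getvalue()

def read_wav_16bit_mono(data):
    """Decode bytes produced by write_wav_16bit_mono back to (rate, float samples)."""
    with wave.open(io.BytesIO(data), "rb") as r:
        n = r.getnframes()
        codes = struct.unpack("<%dh" % n, r.readframes(n))
        return r.getframerate(), [c / 32768 for c in codes]

tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
data = write_wav_16bit_mono(tone, 8000)
fs, decoded = read_wav_16bit_mono(data)
print(data[:4], fs, len(decoded))  # b'RIFF' 8000 800
```

Because the file is a byte-exact container, copying `data` produces an identical duplicate of the recording.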
The characteristics of digital-audio and the associated file formats also can be used to provide greater functionality in manipulating audio signals than was previously available with analog formats. For example, exact duplicates of a digital-audio file can be produced almost instantaneously. Further, any point in a digital-audio signal can be randomly accessed during playback, permitting the implementation of functions such as seek and random play.
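Random access is simple for uncompressed PCM because every frame (one sample per channel) occupies the same number of bytes. The hypothetical helper below, offered as a sketch under that assumption, converts a seek time into a byte offset within the sample data with a single multiplication; compressed formats generally need additional bookkeeping to locate a given time.

```python
def seek_byte_offset(t_seconds, fs_hz, channels, bytes_per_sample):
    """Byte offset of the frame at time t within an uncompressed PCM data block.

    Every frame occupies channels * bytes_per_sample bytes, so the offset
    is computed directly; no scanning of earlier data is needed.
    """
    frame_index = round(t_seconds * fs_hz)
    return frame_index * channels * bytes_per_sample

# Jump to 1.5 s in 16-bit stereo audio sampled at 44.1 kHz.
print(seek_byte_offset(1.5, 44100, 2, 2))  # 264600
```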
It is also much easier to process digital information and thus to manipulate one or more characteristics associated with a digital audio signal. For example, an audio signal is associated with a first rate when it is recorded. This rate can be represented numerically as 1.0. The audio signal, however, can be played back at a second rate that is faster than the first rate. The faster rate can be represented using a number with a higher value, such as 1.1. The audio signal also can be played back at a third rate that is slower than the first rate and can be represented using a number with a lower value, such as 0.9. Playback at a rate other than the first rate is referred to as rate modified playback.
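The numeric rate convention described above implies that playback duration is the recorded duration divided by the rate. A small illustrative sketch, with a hypothetical function name and a 60-second recording chosen as an example:

```python
def playback_duration(recorded_seconds, rate):
    """Duration when content recorded at rate 1.0 is played back at `rate`."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return recorded_seconds / rate

for rate in (0.9, 1.0, 1.1):
    print(rate, round(playback_duration(60.0, rate), 2))
# 0.9 66.67
# 1.0 60.0
# 1.1 54.55
```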
If an audio signal is not processed appropriately, rate modified playback can affect the pitch of the audio signal and thereby make its content less intelligible. For example, if an audio signal is simply played back at a faster rate, every frequency in the signal, and therefore its pitch, is scaled up by the same factor; at a rate of 1.1, the pitch rises by about 10 percent. Conversely, if an audio signal is simply played back at a slower rate, the pitch of the audio signal decreases. With proper processing, however, it is possible to perform rate modified playback at a rate-invariant, or constant, pitch. During rate modified playback at a constant pitch, the pitch of the rate modified audio signal will be perceived by a listener to be substantially similar to the pitch of the audio signal played back at the first rate. Therefore, the intelligibility of the audio signal can be retained across a variety of playback rates.
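The pitch shift caused by unprocessed rate changes can be reproduced with a short NumPy sketch. It assumes that "simply played back at a faster rate" means reading the stored samples at fractional positions n × rate (here with linear interpolation) and playing the result at the original sampling rate; the function names are hypothetical, and the zero-crossing estimator is only suitable for a pure tone like the 440 Hz test signal used here.

```python
import numpy as np

def naive_rate_change(x, rate):
    """Change playback rate without pitch correction: read the input at
    fractional positions n * rate (linear interpolation) and play the result
    at the original sampling rate."""
    positions = np.arange(0, len(x) - 1, rate)
    return np.interp(positions, np.arange(len(x)), x)

def estimate_frequency(x, fs_hz):
    """Rough pitch estimate for a pure tone from its upward zero crossings."""
    up = np.nonzero((x[:-1] < 0) & (x[1:] >= 0))[0]
    return float(fs_hz * (len(up) - 1) / (up[-1] - up[0]))

fs = 8000
tone = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # 1 s at 440 Hz
for rate in (0.9, 1.0, 1.1):
    print(rate, round(estimate_frequency(naive_rate_change(tone, rate), fs)))
# 0.9 396
# 1.0 440
# 1.1 484
```

The output duration changes as intended, but the perceived pitch moves with the rate, which is the defect that constant-pitch processing is meant to avoid.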
One general method for performing rate modified playback of a digital audio signal containing speech is known as time-scale modification (TSM). In TSM, the spectral envelope and the pitch of an unmodified digital audio signal are measured at a plurality of discrete time points. A digital audio signal is then synthesized such that it has approximately the same spectral envelope and pitch at the corresponding time points when played back at the desired modified rate. In order to synthesize the rate modified digital audio signal, an initial estimate is first chosen and then iteratively refined to approach the required spectral envelope and pitch. Depending upon the quality of the initial estimate, fifty to one hundred iterations may be required to achieve an acceptable sound quality. As a result of the iterative processing used to develop the synthesized digital audio signal, the TSM approach is computationally intensive.
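The text does not specify how the envelope and pitch are measured or how the estimate is refined. As an assumption, the NumPy sketch below substitutes a Griffin-Lim style iteration: it "measures" the original as short-time magnitude spectra taken every `ana_hop` samples (these spectra carry both the spectral envelope and the harmonic pitch structure), asks for the same spectra every `syn_hop` samples in the output, starts from a random initial estimate, and repeatedly keeps the estimate's phase while imposing the target magnitudes. All function and parameter names are hypothetical, and the window size, hops, and iteration count are illustrative choices.

```python
import numpy as np

def stft(x, win, hop):
    """Frames of `x` (windowed, `hop` samples apart) -> one-sided spectra."""
    n = len(win)
    starts = range(0, len(x) - n + 1, hop)
    return np.fft.rfft(np.array([x[s:s + n] * win for s in starts]), axis=1)

def istft(spec, win, hop):
    """Least-squares overlap-add inverse of a (possibly inconsistent) STFT."""
    n = len(win)
    frames = np.fft.irfft(spec, n=n, axis=1)
    length = (len(frames) - 1) * hop + n
    num = np.zeros(length)
    den = np.zeros(length)
    for k, frame in enumerate(frames):
        num[k * hop:k * hop + n] += frame * win
        den[k * hop:k * hop + n] += win ** 2
    # Samples no window covers (den == 0) are left at zero.
    return np.divide(num, den, out=np.zeros(length), where=den > 0)

def iterative_tsm(x, rate, n_fft=512, syn_hop=128, n_iter=50, seed=0):
    """Hypothetical Griffin-Lim style TSM sketch; returns (signal, error history)."""
    win = np.hanning(n_fft)
    ana_hop = int(round(syn_hop * rate))
    # "Measure" the original: magnitude spectra taken every ana_hop samples.
    target = np.abs(stft(x, win, ana_hop))
    # The same spectra are wanted every syn_hop samples in the output.
    length = (len(target) - 1) * syn_hop + n_fft
    y = np.random.default_rng(seed).standard_normal(length)  # initial estimate
    errors = []
    for _ in range(n_iter):
        spec = stft(y, win, syn_hop)
        errors.append(np.linalg.norm(np.abs(spec) - target) / np.linalg.norm(target))
        # Keep the current phase, impose the target magnitude, resynthesize.
        y = istft(target * np.exp(1j * np.angle(spec)), win, syn_hop)
    return y, errors

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)
y, errors = iterative_tsm(x, rate=1.25)
print(len(x), len(y))  # 8000 6400
```

Each iteration costs one forward and one inverse short-time transform over the whole signal, so the fifty to one hundred iterations mentioned above multiply the processing load accordingly. Samples near the two ends are covered by a single tapered window and can be poorly conditioned; a practical version would pad or trim them.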