Digital audio systems are well known in the prior art. Presently, two types of digital audio systems, the compact disc (CD) and the digital audio tape (DAT), are enjoying commercial success as mass production audio reproduction systems. While the benefits of digital recordings over conventional analog recordings are also well-known, digital audio systems have failed to attract critical listeners of professional or high-end audio systems. Such listeners are accustomed to enjoying immaculately precise and realistic music reproduction currently possible with professional or high-end analog systems. Because of deficient stereophonic imaging of current digital audio systems, digital audio technology has been almost universally rejected by the professional and high-end audio markets.
The goal of any digital audio system is to sample and reconstruct an analog audio signal without noticeable changes to the signal so as to recreate authentic sounding music. If, for example, the audio signal is sampled at a recording studio and the digital samples are stored on a CD, then the CD player must retrieve the digital samples and reconstruct the waveform of the audio signal as close as possible to the waveform of the original analog signal.
In theory, any analog signal can be reconstructed if an infinite number of digital samples are taken of the analog signal. In practice, the sampling rate of a digital audio system is governed by the Nyquist Theorem, which states that any signal may be sampled and reconstructed provided the sampling rate is at least twice the highest frequency component of the original analog signal. An insufficiently high sampling rate tends to create an overlap in the reconstructed signal that gives rise to a special form of distortion known as aliasing. When the sampling rate is too low, the frequency domain images of the reconstructed signal overlap with the baseband and corrupt the higher frequency components of the baseband. Avoidance of aliasing is a primary goal of the sampling process of a digital audio system.
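By way of illustration only, the aliasing effect described above can be sketched numerically. The sketch below assumes a 44.1 KHz sampling rate and a 30 KHz test tone; the function names `sample_sine` and `alias_frequency` are hypothetical and are introduced only for this example.

```python
import math


def sample_sine(freq_hz, rate_hz, count):
    """Return `count` samples of a unit sine at freq_hz taken at rate_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]


def alias_frequency(freq_hz, rate_hz):
    """Baseband frequency (0 to rate/2) that a sampled tone is indistinguishable from."""
    folded = freq_hz % rate_hz
    return min(folded, rate_hz - folded)


RATE_HZ = 44_100   # assumed sampling rate for the illustration
TONE_HZ = 30_000   # assumed tone above half the sampling rate

tone = sample_sine(TONE_HZ, RATE_HZ, 64)
alias = sample_sine(alias_frequency(TONE_HZ, RATE_HZ), RATE_HZ, 64)

# The two sample sequences differ only in sign, so the largest value of
# (tone + alias) is zero to within floating point rounding.
max_difference = max(abs(a + b) for a, b in zip(tone, alias))
```

Under these assumptions the 30 KHz tone folds down to 14.1 KHz, and once sampled it cannot be told apart from an inverted 14.1 KHz tone that lies inside the audio band.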
Because human hearing is usually considered to be bandlimited to 20 KHz, some prior art systems have proposed that a 20 KHz bandwidth is sufficient for high quality audio reproduction systems. The 20 KHz figure is based partly on tests in which a subject is instructed to listen to a sinusoidal waveform that continuously increases in frequency and to determine when the signal becomes inaudible. Most people will not be able to detect such a signal once it reaches 20 KHz. The audio bandwidth of current CD systems is 20 KHz and the guard band is 2 KHz. Therefore, the digital sampling rate, in accordance with the Nyquist Theorem, is 44.1 KHz, approximately twice the 22 KHz total. The audio bandwidth of current DAT systems is 20 KHz, and the guard band is 4 KHz, yielding a digital sampling rate of 48 KHz.
Although the human ear is incapable of detecting steady frequencies above 20 KHz, this does not mean that audio signals can be routinely bandlimited to this amount and still achieve high quality audio reproduction. In fact, studies have indicated that the human ear can perceive sonic effects of transient components of audio signals up to frequencies as high as 100 KHz. When an audio signal comprised of many transient pieces of high frequency sinusoids is passed through a digital audio system limited to a 20 KHz bandwidth, the transients will be spread out and will lose their transient nature, thereby degrading the quality of the audio reproduction.
Transients are necessary for professional and high-end audio reproduction because they are important to human hearing in the reconstruction of wavefronts that yield the three-dimensional ambience associated with stereophonic signals. To most listeners of professional or high-end audio systems, it is critical that the reproduced music possess this three-dimensional ambience where each individual sound source is perceived as being located on an imaginary sound stage. Indeed, the illusion of a stable three-dimensional sound image is the fundamental feature on which stereo sound is predicated.
Transients are also important in the resolution of the individual nuances of each of the sound sources. Natural music consists of characteristic noises and momentary silences between notes or overtone oscillations. It is important to prevent sonic blurring of these subtle nuances in the program material. Such details are easily destroyed by audio systems with poor transient response or excessive thermal noise and distortion, with the reproduced music sounding muddy and devoid of fine detail.
The presence of many transient pieces of high frequency sinusoids in audio signals requires a higher sampling rate for exact reproduction of those transient signals. For example, a 20 KHz sinusoid signal will be reproduced exactly by an audio system having a 20 KHz bandwidth only if the signal is turned on at a time of minus infinity and is never turned off. Once a signal is turned on and then turned off after a given number of cycles (i.e., a transient signal is created), a higher bandwidth is required in order to exactly reproduce that signal. In general, the required bandwidth $BW$ to pass a finite number of cycles $N$ of a sinusoidal signal of frequency $F_S$ is: $BW = F_S \cdot (1 + 1/N)$. For example, the required bandwidth to pass one cycle of a 15 KHz sinusoidal signal would be 30 KHz, a frequency much higher than the 20 KHz bandwidth limit of current digital audio systems. Unfortunately, it is not practical to digitally sample audio signals to preserve frequencies up to 100 KHz because to do so would greatly increase the amount of digital information to be stored.
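The bandwidth relationship stated above can be evaluated directly. The following is a minimal sketch; the function name `required_bandwidth_hz` and the choice of burst lengths are assumptions made only for illustration.

```python
def required_bandwidth_hz(signal_hz, cycles):
    """Bandwidth needed to pass `cycles` cycles of a sinusoid at signal_hz.

    Implements BW = F_S * (1 + 1 / N) as given in the text.
    """
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return signal_hz * (1 + 1 / cycles)


# Bandwidth needed for bursts of a 15 KHz tone of various (assumed) lengths.
burst_bandwidths = {
    cycles: required_bandwidth_hz(15_000, cycles) for cycles in (1, 2, 4, 8)
}
```

For a single cycle the result is the 30 KHz figure given above. The requirement falls back toward the tone frequency itself only as the number of cycles grows large.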
The problem of an insufficient sampling rate to reproduce high frequency transients in current digital audio systems is further compounded by the use of frequency domain brickwall filters to smooth the digital samples during reconstruction of the analog audio signal. Early digital audio systems utilized an analog brickwall low pass filter in the digital-to-analog conversion to extract the baseband frequencies and reject the sampled harmonics above the bandwidth of the system. The analog brickwall filter fills in and smooths the signal between the points in the step function output created by averaging the samples together. In essence, the brickwall filter rounds off the edges of the signal output to create a smooth analog signal output.
Theoretically, a frequency domain method of digital audio signal reconstruction should work if the low pass brickwall filter could ideally pass all signals below its threshold or roll-off frequency at unitary gain and reject all signals above its roll-off frequency, and if the distance between the digital sample points is small enough that information is not lost during the sampling process. Unfortunately, an ideal low pass filter cannot be realized. While it is possible to create a low pass brickwall filter that has excellent frequency domain specifications when driven by constant-energy-envelope sinusoids, when this brickwall or taut filter is driven by the transients and impulses of dynamic music material it generates overshoot, ripple and ringing. Because the sampling rates for CD and DAT systems are close to the minimum allowed Nyquist rate (40 KHz), most of the quantization noise generated by the sampling process will be concentrated in the base band audio range. In addition, the image frequencies, which begin close to the base band and extend upward, fall within the frequency region in which audio amplifiers are most susceptible to nonlinearities (100 KHz-2 MHz).
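The overshoot and ringing described above can be reproduced with a simple numerical sketch. The example below assumes a digital stand-in for the brickwall filter: an ideal low pass (sinc) impulse response truncated to 201 taps, with a 20 KHz cutoff at an assumed 176.4 KHz sample rate. It is not the filter of any particular prior art system, and the function names are hypothetical.

```python
import math


def truncated_sinc_lowpass(cutoff, num_taps):
    """Ideal low pass impulse response truncated to num_taps.

    `cutoff` is normalized to the sample rate (cycles per sample).
    The taps are scaled so that they sum to one (unity gain at DC).
    """
    centre = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - centre
        if t == 0:
            taps.append(2 * cutoff)
        else:
            taps.append(math.sin(2 * math.pi * cutoff * t) / (math.pi * t))
    total = sum(taps)
    return [tap / total for tap in taps]


def step_response(taps):
    """Filter output for a unit step input: the running sum of the taps."""
    output, running = [], 0.0
    for tap in taps:
        running += tap
        output.append(running)
    return output


taps = truncated_sinc_lowpass(cutoff=20.0 / 176.4, num_taps=201)  # assumed values
response = step_response(taps)

overshoot = max(response) - 1.0   # amount by which the output exceeds the step
undershoot = -min(response)       # ringing below zero before the step arrives
```

Although the input is a clean step, the output both dips below zero before the transition and rises above the final value after it. This is the behavior the text attributes to brickwall filters driven by transients.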
The process in current digital audio systems is therefore non-optimal as designed in the frequency domain, both because of the inadequate sampling rate and because of the imperfect brickwall filter.
In an attempt to solve these problems, a method known as "oversampling" is used by some prior art digital audio systems to increase the sampling rate to a rate typically four times the original sample rate (e.g., 176 KHz for CDs). The basic idea of the prior art oversampling techniques is to implement a digital low pass filter to carry out the function of the analog brickwall smoothing filters, with samples retrieved from the digital low pass filter at the higher oversampling rate. This is made possible by adding zero magnitude (trivial) samples between each of the original samples to effectively increase the sampling rate of the system, although the trivial samples add no new information to the signal. For a more detailed explanation and critique of the prior art oversampling techniques, reference is made to Moses, R., "Improved Signal Processing for Compact Disc Audio System", Proceedings; Montech '87 IEEE Conference on Communications, Nov. 9-11, 1987, pp. 203-211, which is fully incorporated by reference herein.
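The zero-insertion step described above, followed by a digital low pass filter, can be sketched as follows. This is an illustration under stated assumptions, not the filter of any prior art system: a 7-tap triangular kernel, which performs simple linear interpolation, stands in for the much longer low pass FIR filter, and the names `zero_stuff`, `fir_filter` and `KERNEL` are hypothetical.

```python
def zero_stuff(samples, factor):
    """Insert factor - 1 zero magnitude (trivial) samples after each original sample."""
    stuffed = []
    for value in samples:
        stuffed.append(float(value))
        stuffed.extend([0.0] * (factor - 1))
    return stuffed


def fir_filter(signal, taps):
    """Direct-form FIR filter: output[i] = sum over k of taps[k] * signal[i - k]."""
    output = []
    for i in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if i - k >= 0:
                acc += tap * signal[i - k]
        output.append(acc)
    return output


FACTOR = 4  # four times oversampling, as in the text
# Assumed stand-in kernel: a triangle, i.e. linear interpolation.
KERNEL = [0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25]

original = [0.0, 4.0, 8.0]
stuffed = zero_stuff(original, FACTOR)
interpolated = fir_filter(stuffed, KERNEL)
```

In this sketch the three original samples become twelve samples at the higher rate, and the filter replaces the inserted zeros with values lying on a straight line between the originals, delayed by three output samples. The inserted zeros themselves carry no new information; every interpolated value is computed from the original samples alone.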
The problem with current frequency domain oversampling techniques is that the digital filter, sometimes referred to as a Finite Impulse Response (FIR) filter, must meet the same stringent ideal demands as the analog brickwall filter it replaces. Any deviation from an ideal low pass filter will cause a corresponding alteration of the output signal. The design of the digital filters for current oversampling techniques is accomplished by normalizing the frequency parameters to the sampling rate. For example, if the sample rate is 44 KHz and the filter roll-off frequency is 20 KHz, the design frequency parameter will be 20 KHz/44 KHz=0.4545. In the case of an oversampling FIR filter, the final sampling rate must be used as the design parameter. If a four times oversampling FIR filter is desired, the design parameter will be 20 KHz/176 KHz=0.1136. The digital audio system must also include a transition band that spans the transition region between 20 KHz and 22 KHz, or a 2 KHz bandwidth. If a sixteen times oversampling FIR filter is desired, the design parameter for the transition band will be 2 KHz/704 KHz=0.0028. Such normalized frequency parameters are too small for the calculations required to derive the associated filter because the numbers do not contain enough significant digits. Without a sufficient number of significant digits in the calculation, these parameters introduce deviation from the desired response. As a result, the frequency domain design method for the digital FIR oversampling filters is unable to accommodate high oversampling rates.
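The normalized design parameters quoted above, and the loss of significant digits at high oversampling rates, can be checked with a short sketch. The four decimal place precision used here is an assumption chosen only to make the effect visible, and the function names are hypothetical.

```python
def normalized_frequency(freq_khz, base_rate_khz, oversampling=1):
    """Frequency normalized to the final (oversampled) sampling rate."""
    return freq_khz / (base_rate_khz * oversampling)


def relative_rounding_error(value, decimals):
    """Relative error introduced by keeping only `decimals` decimal places."""
    return abs(round(value, decimals) - value) / value


passband_1x = normalized_frequency(20, 44)         # roll-off at the base rate
passband_4x = normalized_frequency(20, 44, 4)      # four times oversampling
transition_16x = normalized_frequency(2, 44, 16)   # 2 KHz transition band, sixteen times

# Assumed fixed precision of four decimal places for the design calculation.
error_1x = relative_rounding_error(passband_1x, 4)
error_16x = relative_rounding_error(transition_16x, 4)
```

At the same fixed precision, the relative error in the sixteen times transition band parameter is far larger than in the base rate parameter, because fewer significant digits remain once the value shrinks toward zero.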
Another limitation of current frequency domain oversampling techniques lies in the alteration of the filter coefficients. It is desirable to maintain a constant gain through the filter as the input signals are passed through it. By adding zero magnitude samples between the original samples, the number of samples weighted by the filter at any instant of time is reduced in proportion to the number of trivial samples added. Because not all of the coefficients of the original samples are now used in the calculation of the output, the gain will vary as samples are shifted through the filter. This causes a corresponding deviation in the magnitude of the output signal that the listener may hear as a small degree of noise.
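The gain variation described above can be made concrete. With trivial samples inserted at an oversampling factor of four, only every fourth filter coefficient multiplies a nonzero sample at any instant, so the gain at that instant is the sum of that subset of coefficients. The sketch below uses two small hypothetical integer tap sets chosen purely for illustration; the function name `phase_gains` is likewise an assumption.

```python
def phase_gains(taps, factor):
    """Sum of the coefficients that meet nonzero samples at each output phase.

    With factor - 1 zeros inserted after every original sample, output phase p
    uses only taps p, p + factor, p + 2 * factor, and so on.
    """
    return [sum(taps[phase::factor]) for phase in range(factor)]


FACTOR = 4

# Hypothetical tap sets chosen for illustration only.
MATCHED = [1, 2, 3, 4, 3, 2, 1]   # every phase sums to the same value
MISMATCHED = [1, 2, 3, 2, 1]      # phases sum to different values

matched_gains = phase_gains(MATCHED, FACTOR)
mismatched_gains = phase_gains(MISMATCHED, FACTOR)

# Spread between the largest and smallest instantaneous gain.
gain_ripple = max(mismatched_gains) - min(mismatched_gains)
```

When the coefficient subsets do not sum to the same value, as with the second tap set, a constant input produces an output whose magnitude changes from sample to sample. This corresponds to the magnitude deviation described above.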
A further problem in the design of the FIR digital filters in the frequency domain is the arbitrary nature of choosing the appropriate frequency domain parameters. For example, with a given FIR filter order (typically 100 taps), parameters for each of the pass band, transition band, and stop band characteristics must be weighed in the specification of the filter. Without knowing reliable, acceptable figures for these parameters, the designer is effectively guessing at appropriate values for the filter.
Primarily because of the problems outlined above, current frequency domain oversampling techniques are not capable of producing sufficiently high oversampling rates. Consequently, the image frequencies represented by the high frequency transients still fall in the nonlinear range of most amplifiers, and, as a result, these systems do not sufficiently overcome the discussed problems.
Although the present designs for processes of current digital audio systems are adequate for reproducing musical sound, it would be advantageous to have a method and system for interpolating digital audio signals that can reconstruct the high frequency and transient characteristics of the signals and enable the reproduction of high-quality musical sound in a professional or high-end digital audio system such that there will be no perceptible difference between the reconstructed signal and the original signal.