This invention is in the field of digital audio systems, and is more specifically directed to sample rate conversion in such digital audio systems.
Digital signal processing techniques are implemented in a wide range of modern-day electronic systems. Tremendous increases in the switching speed of digital circuits have enabled digital signal processing to replace, in large part, analog circuits in many applications. Modern digital signal processors are sufficiently capable that digital techniques have become widely implemented in audio electronic applications, indeed to such an extent that audio-visual receivers can now be realized nearly entirely in the digital domain.
Modern digital audio-visual receivers and digital audio receivers are typically required to be capable of receiving audio input from a wide variety of sources. For example, a typical digital audio amplifier can receive, process, and amplify audio signals from an AM/FM radio tuner (which may be built into the receiver), analog line-in inputs receiving analog audio from an external source, optical or other digital line-in inputs receiving audio signal from a satellite or cable television source, digital audio signals from a CD player, uncompressed and encrypted digital streams from a set-top box or computer over a High Definition Multimedia Interface (HDMI), and still other sources. Because these digital receivers process the audio signals in the digital domain, the incoming audio signals must be either received in digital form, or converted from analog signals to digital signals. In either case, the resulting digital representation of the audio signals to be processed is in the form of a datastream of digital words (typically sixteen, twenty, or twenty-four bits in width), each digital word having a value corresponding to the amplitude of a sample of the audio signal at a sample point in time. To satisfy the Nyquist theorem, the sample rate of the digital audio sample stream must be at least twice the highest frequency of interest (the Nyquist frequency) of the represented audio signal.
As is well known in the art, the sample rates of signals presented by these various digital audio signal sources vary widely. For example, CD players typically generate 44.1 kHz datastreams and MP3 players use multiple sample rates ranging from 22.05 kHz up to 48 kHz (and can jump from one sample rate to another on a track-by-track basis), while DVD players typically generate digital audio at a 48 kHz sample rate. Digital audio communicated according to the HDMI standard is in the form of up to eight channels of 24-bit audio at sample rates of up to 192 kHz. In general, modern digital audio systems and digital audio-visual receivers are expected to be capable of receiving input digital audio signals at sampling rates ranging from as low as 8 kHz to as high as about 200 kHz.
The power output stages of modern digital audio receivers are implemented as “class D” digital amplifiers, which respond to pulse-width modulated binary drive signals. These pulse-width modulated signals are typically at very high frequencies, on the order of hundreds of kHz, to provide a wide dynamic range in the audio output driving the loudspeakers. Modern digital audio receivers are thus faced with the task of processing input audio signals, from any of these various signal sources, into pulse-width modulated output signals. The tasks involved in this processing are typically quite substantial, including digital filtering and other digital functions such as equalization, phase compensation, and the like, as well as the pulse-width modulation of the output signals.
Conventional digital audio receivers dealt with the varying input sample rates by processing the digital audio signal at its received sample rate, whatever it may be depending on the selected input signal source. However, the digital signal processing techniques now used in many audio receivers have become more capable and thus more complex over time. The computational complexity required to perform this highly complex digital signal processing, at any one of a number of varying sample rates, has therefore exploded. This trend in computational complexity has become compounded by the higher sample rates such as those for HDMI audio signals, especially if the complex digital signal processing is to also be carried out at those high sample rates. As such, the limiting factor in modern high performance digital audio systems is the processor cycle count required for this complex processing, considering that the sample rates can be as high as 192 kHz.
Therefore, rather than perform this complex digital signal processing at the sample rate of the input signal, which varies from source to source, modern digital audio receiver systems now perform sample rate conversion to convert a discrete sampled input signal at one sample rate to a sampled signal at another sample rate. Through the use of sample rate conversion, the input signals from various audio sources, at varying sample rates, can all be converted to a common sample rate at which the digital signal processing is performed. This greatly reduces the computational complexity and hardware cost of the digital audio processor functions in the receiver.
For the special cases in which the ratio of the input and output sample rates to and from the sample rate conversion can be expressed as a ratio of relatively small integers, sample rate conversion can be carried out by a relatively simple circuit or digital signal processor (DSP) software routine. Sample rate converters of this type apply a sequence of upsampling and interpolation of the input signal, creating an intermediate signal at a sample rate that is a common multiple of the input and output sample rates, followed by downsampling of the interpolated intermediate signal at the desired output sample rate. To prevent aliasing or images in the converted output signal, the interpolation is implemented by way of a digital filter that band-limits the result of the upsampling, or band-limits the signal that is to be downsampled. In general, the cutoff frequency of the interpolation filter depends on the lower of the input sample rate and the output sample rate; more precisely, this cutoff frequency is preferably at the lower of the input and output “folding” frequencies (i.e., ½ of the lower one of the input and output sample rates; the folding frequency is also often referred to, confusingly, as the “Nyquist frequency” in the digital domain).
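The integer-ratio case described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the construction of any particular converter: the function names, the Hann-windowed sinc filter design, and the taps_per_phase parameter are all hypothetical choices made for the example. The cutoff is placed at the lower of the two folding frequencies, expressed relative to the intermediate rate L·fin.

```python
from fractions import Fraction
import math

def rational_ratio(f_in, f_out):
    """Return (L, M) such that f_in * L / M == f_out (integer rates in Hz)."""
    r = Fraction(f_out, f_in)
    return r.numerator, r.denominator

def windowed_sinc(num_taps, cutoff):
    """Hann-windowed sinc low-pass; cutoff in cycles per sample (below 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for k in range(num_taps):
        t = k - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * k / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def resample_rational(x, L, M, taps_per_phase=16):
    """Conceptually zero-stuff by L, low-pass filter, then keep every M-th sample."""
    # Lower of the input and output folding frequencies, relative to rate L * f_in.
    cutoff = 0.5 / max(L, M)
    h = windowed_sinc(taps_per_phase * max(L, M) + 1, cutoff)
    n_out = (len(x) * L) // M
    y = []
    for m in range(n_out):
        pos = m * M                 # output instant on the intermediate-rate grid
        acc = 0.0
        k = pos % L                 # first tap aligned with a non-zero (original) sample
        while k < len(h):
            n = (pos - k) // L      # index of the original input sample under tap k
            if 0 <= n < len(x):
                acc += h[k] * x[n]
            k += L                  # taps in between multiply stuffed zeros; skip them
        y.append(L * acc)           # gain L compensates for the inserted zeros
    return y

# 44.1 kHz to 48 kHz: upsample by 160, downsample by 147.
L, M = rational_ratio(44100, 48000)
```

Because the filter has linear phase, the output is delayed by half the filter length at the intermediate rate; samples near the start and end of the block are affected by the missing history.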
However, the input frequency does not always have such a convenient relationship to the digital signal processing or output frequency. This is increasingly the case for modern digital audio receivers that receive input digital audio signals over a wide range of sample rates, as discussed above. In addition, the input sample rate can vary over time, either by way of drift or because of a change in the source of the input sample stream; in these cases, the sample rate conversion ratio also varies with time. To address these factors, sample rate conversion functions referred to as “asynchronous” sample rate converters are known in the art, and are capable of converting the sample rate of the signal over a wide range of ratios, and of tracking changes in the conversion ratio over time. Conceptually, asynchronous sample rate conversion can be considered as filtering the input signal with a continuous-time low-pass filter to form, in effect, an analog representation of the input sample stream. This conceptual analog signal can then be resampled at the desired output sample rate. Again, the cutoff frequency of the low-pass filter of an asynchronous sample rate converter depends on the lower of the folding frequencies of the input and output sample rates.
In carrying out this function, modern conventional asynchronous sample rate converters reject jitter in the input data stream. By constructing the asynchronous sample rate converter so that it changes its conversion ratio very slowly (e.g., by implementing a low-pass filter with an extremely low cutoff frequency in the ratio-tracker algorithm), higher-frequency jitter in the input sample stream will effectively be rejected. This eliminates the need to obtain jitter reduction through a complex phase-locked loop with low bandwidth (which would necessitate large external components and other complexities). Asynchronous sample rate converters of this type are thus quite beneficial in the receipt and processing of high-jitter sample data, such as are often involved in HDMI signals, which are based on video clocks, and in packet-based audio streaming over USB, IEEE 1394, Ethernet, etc., in which the packet data is buffered in a FIFO. The ratio tracking provided by asynchronous sample rate conversion can also perform the difficult task of reconstructing an audio clock from the single-signal data stream (clock and data on the same physical input line) in SPDIF (SONY-Philips Digital Interface Format) audio receivers.
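The jitter rejection described above can be illustrated with a very simple ratio tracker. The following Python sketch is hypothetical: the class name, the one-pole smoothing filter, and the value of alpha are assumptions chosen for illustration, and practical ratio trackers are generally more elaborate. It shows only the principle that a slowly varying ratio estimate averages out rapid fluctuations in the measured ratio.

```python
class RatioTracker:
    """One-pole low-pass filter on the measured f_in/f_out ratio (illustrative only)."""

    def __init__(self, initial_ratio, alpha=1e-3):
        self.ratio = initial_ratio
        # Smaller alpha means a lower cutoff: more jitter rejection, slower tracking.
        self.alpha = alpha

    def update(self, measured_ratio):
        # Move the estimate a small fraction of the way toward each new measurement.
        self.ratio += self.alpha * (measured_ratio - self.ratio)
        return self.ratio

# Measurements that alternate +/-1% around the true ratio (assumed jitter pattern).
true_ratio = 44100 / 48000
tracker = RatioTracker(true_ratio)
for i in range(10000):
    jitter = 0.01 if i % 2 == 0 else -0.01
    tracker.update(true_ratio * (1 + jitter))
```

With these assumed values, the tracked ratio stays far closer to the true ratio than any single measurement, at the cost of responding slowly to a genuine change in the input sample rate.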
FIG. 1a illustrates the concept of asynchronous sample rate conversion. In this example, input discrete-time sample stream x[n] has a sample rate fin, which is at least twice the Nyquist frequency of the information conveyed by that sample stream. This sample stream x[n] corresponds to a band-width limited continuous-time signal x(t) that is sampled at a sample rate fin. Conceptual asynchronous sample rate converter (ASRC) 2 includes continuous-time filter 3, which applies an ideal transfer function H(f) to sample stream x[n], producing a continuous-time intermediate signal based on input sample stream x[n]. Resampler 4 samples the continuous-time signal output from filter 3 at the desired output sample rate fout, to produce the output discrete-time signal y[m] at sample rate fout. In effect, this conceptual ASRC operates as a digital-to-analog converter, running at the input rate and applying an ideal low-pass filter H(f), that feeds an analog-to-digital converter that is sampling at the output rate.
Ideally, the transfer function H(f) of filter 3 will interpolate between the samples x[n], passing information within the sample stream up to a maximum frequency of interest fa, while suppressing aliasing within critical bands at the output sample frequency fout and its multiples. In this ideal case, the amplitude characteristic |H(f)| will pass frequency components from DC to the maximum frequency of interest fa, and will suppress frequency components in aliasing bands of width 2fa centered at output sample frequency fout and its multiples. Such a transfer function H(f) of filter 3 corresponds to a low-pass filter with a cut-off frequency fco lower than one-half of the input sample rate fin in order to avoid aliasing in the sample. In addition, this cut-off frequency fco must be lower than one-half of the output sample rate fout to avoid image components from leaking into the output signal. Accordingly, transfer function H(f) must have a cut-off frequency fco below the lesser of fin/2 and fout/2, specifically the lower of the two Nyquist frequencies for the input and output signals. Assuming proper implementation of its transfer function H(f), in the case in which the desired output sample rate fout is higher than the input sample rate fin (i.e., ASRC 2 is upsampling), filter 3 enables a perfect reconstruction of the original continuous-time signal x(t), having a maximum frequency of interest fa, at the output of filter 3. If the input sample rate fin is higher than the desired output sample rate fout (downsampling), filter 3 will remove high-frequency information in the input sample stream x[n] that would otherwise be reflected as aliasing in the eventual discrete-time signal y[m].
The conceptual approach of FIG. 1a is conventionally implemented by constructing a digital filter that performs the function of continuous-time filter 3 and of resampler 4. In practice, modern asynchronous sample rate converters following the concept of FIG. 1a typically do not reconstruct the entire continuous-time intermediate signal, but instead calculate only those sample values sufficient to construct the discrete-time signal y[m]. As known in the art, this is accomplished with reference to an impulse response h(t) that has the desired corresponding transfer function H(f). The digital filter realizing this impulse response in combination with the resampling function can be realized by evaluating the filter coefficients h at times depending on the output sample times m in the output sample sequence y[m]:
$$y[m] = \sum_{n} x[n]\, h[mT_{out} - nT_{in}]$$

where the periods Tin and Tout correspond to the input and output sample periods, respectively. From the above equation, this filter has variable coefficients bn,m that depend on the particular output sample m:

$$b_{n,m} = h[mT_{out} - nT_{in}]$$

Given the above, the combination of filter 3 and resampler 4 can be readily realized with a conventional digital signal processor or other logic, although because the filter coefficients vary from output sample to output sample, these coefficients must either be calculated on-the-fly, be retrieved from a look-up table, or a combination of the two. Also, in practice, filter 3 is realized as a finite impulse response (FIR) filter, with a unit impulse response h[m] that is of finite length.
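The sum over n with output-dependent coefficients bn,m can be evaluated directly, as in the following Python sketch. It is offered only as an illustration under stated assumptions: the Hann-windowed sinc used for h(t), the zero_crossings parameter, and the function names are hypothetical, and the factor Tin applied to each output is a gain normalization added here so that the passband gain is near unity.

```python
import math

def h(t, f_co, half_width):
    """Hann-windowed sinc impulse response with cutoff f_co (Hz), zero for |t| >= half_width (s)."""
    if abs(t) >= half_width:
        return 0.0
    ideal = 2 * f_co if t == 0 else math.sin(2 * math.pi * f_co * t) / (math.pi * t)
    window = 0.5 + 0.5 * math.cos(math.pi * t / half_width)
    return ideal * window

def asrc_direct(x, f_in, f_out, zero_crossings=16):
    """Evaluate y[m] = sum_n x[n] * h(m*T_out - n*T_in) for each output sample m."""
    T_in, T_out = 1.0 / f_in, 1.0 / f_out
    f_co = 0.5 * min(f_in, f_out)            # lower of the two folding frequencies
    half_width = zero_crossings / (2 * f_co)  # finite (FIR) support of h(t)
    n_out = int(len(x) * f_out / f_in)
    y = []
    for m in range(n_out):
        t_m = m * T_out
        # Only input samples within the support of h contribute to this output.
        n_lo = max(0, math.ceil((t_m - half_width) / T_in))
        n_hi = min(len(x) - 1, math.floor((t_m + half_width) / T_in))
        acc = 0.0
        for n in range(n_lo, n_hi + 1):
            acc += x[n] * h(t_m - n * T_in, f_co, half_width)  # coefficient b[n, m]
        y.append(T_in * acc)                  # assumed gain normalization
    return y
```

Each output sample requires a fresh set of coefficients, which is the cost that the look-up-table and on-the-fly approaches mentioned above are meant to manage.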
While this conceptual approach is effective in performing asynchronous sample rate conversion, its direct implementation is not practical, because the filter transfer function H(f) must necessarily be very sharp. This sharp characteristic, in frequency, of course results in a very long impulse response h(t) that, in the digital realization, requires a large number of taps to be calculated for each output sample.
FIG. 1b illustrates the construction of a popular conventional asynchronous sample rate converter that addresses the problem by way of a fully digital interpolator based on a very simple continuous time filter, namely a zero-order hold network. In this construction, filter 3 of the conceptual ASRC 2 of FIG. 1a is realized by an interpolator 3′, which generates an intermediate continuous-time signal from the input signal x[n]. Resampler 4 then re-samples this intermediate signal at the desired output sample rate fout. In this approach, interpolator 3′ is realized by the functional sequence of upsampler 5, discrete-time low-pass filter 6, and zero-order hold network (or filter) 7. Upsampler 5 upsamples the input sample stream x[n] by inserting L−1 zeroes (“zero-padding”) between each input sample, producing a discrete time signal at a sample rate of L times the input sample rate fin. This zero-padded signal is filtered by low-pass filter 6, typically a digital filter operating at a sample rate corresponding to the upsampled sample rate Lfin and having a cutoff frequency at one-half of the lower of the input sample rate fin and the output sample rate fout. Zero-order hold function 7 effectively converts the interpolated signal from filter 6 into a “stair-case” signal, which is then sampled at the desired output sample rate fout by resampler 4, to produce output discrete-time signal y[m]. In practice, the arrangement of FIG. 1b is functionally equivalent to selecting the most recent sample value output by filter 6 as the current output sample y[m] in the sequence.
Typically, the upsampling factor L of conventional asynchronous sample rate converters is an extremely high number, for example on the order of one million, resulting in an effective sample rate on the order of 50 GHz or more. Of course, such an extremely high sample rate is impractical to realize. Accordingly, conventional asynchronous sample rate conversion functions do not actually upsample, but instead perform the calculations necessary to “conceptually” upsample and interpolate the incoming signal stream, and select the relevant samples, which are the samples nearest in time to the desired sample points at the output sample rate. While the number of taps in filter 6 increases proportionally with upsampling factor L, only a relatively small subset of these taps is active for any given output sample, because of the abundance of zero-valued input samples in the upsampled zero-padded input stream. Filters applying such subsets of taps are typically referred to in the art as “polyphase” filters, in which the number of active taps is roughly independent of factor L (the number of subset filters increases with increasing L, but the length remains approximately constant). In practice, these filters operate by performing the tap calculation at the lower input sample rate fin, and applying the relevant polyphase subfilter for the desired output sample.
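The polyphase selection described above can be sketched in Python as follows. The function names are hypothetical, and integer sample rates are assumed so that the position on the conceptual L·fin grid can be computed exactly; an actual asynchronous converter would obtain this position from its ratio tracker instead. The sketch selects the most recent conceptual upsampled sample for each output (the zero-order hold behavior) and evaluates only the taps that line up with real input samples.

```python
def split_polyphase(h, L):
    """Split prototype filter h into L sub-filters; sub-filter p holds taps p, p+L, p+2L, ..."""
    return [h[p::L] for p in range(L)]

def locate(m, f_in, f_out, L):
    """Return (n, phase) for output sample m.

    k is the index of the most recent sample on the conceptual L*f_in grid
    at output time m/f_out; n is the latest input sample, phase the sub-filter.
    """
    k = (m * L * f_in) // f_out
    return divmod(k, L)

def output_sample(x, phases, n, phase):
    """Apply one sub-filter to the input samples x[n], x[n-1], ... at the input rate."""
    sub = phases[phase]
    return sum(c * x[n - j] for j, c in enumerate(sub) if 0 <= n - j < len(x))
```

The number of multiply-accumulate operations per output is the sub-filter length, len(h) / L, which does not grow as L grows for a fixed number of taps per phase; only the number of stored or computed sub-filters does.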
While physically realizable, however, the number of potential taps of digital filter 6 necessary for modern asynchronous sample rate conversion requires a huge memory (typically read-only memory) to store the filter coefficients, substantial computational capacity to calculate these coefficients “on the fly”, or some combination of the two. This complexity is, of course, expensive to implement. Furthermore, the time-domain error, or “jitter” resulting from the selection of the closest sample, which in effect is “rounding off” in time, can adversely affect the system performance. For example, a jitter of about 20 psec is inherent in asynchronous sample rate conversion at an effective sample rate of 50 GHz. For the example of a full scale 96 kHz audio input signal (such as is possible in a modern high-performance system capable of sampling at 192 kHz), this 20 psec error in sampling time is only about 98 dB down from full scale. Unfortunately, 24-bit resolution requires sampling time error (i.e., jitter) to be at a level of no more than −144 dB from full scale; this requires the jitter to be less than about 0.1 psec for a 96 kHz full scale signal. To achieve this jitter level, the effective sample rate must be on the order of 10 THz, which necessitates a prohibitively high upsampling factor L.
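The jitter figures quoted above can be checked with a short calculation. The Python sketch below assumes that the error is bounded by the maximum slope of a full-scale sine wave multiplied by the timing error, which gives a relative error of 2π·f·Δt; this bound and the function names are assumptions made for illustration, although they are consistent with the numbers stated in the text.

```python
import math

def jitter_error_db(f_signal_hz, jitter_s):
    """Worst-case error, in dB relative to full scale, for a full-scale sine
    at f_signal_hz sampled with a timing error of jitter_s seconds."""
    return 20 * math.log10(2 * math.pi * f_signal_hz * jitter_s)

def max_jitter_for_bits(f_signal_hz, bits):
    """Largest timing error (seconds) keeping the error below 2**-bits of full scale."""
    return 2.0 ** (-bits) / (2 * math.pi * f_signal_hz)

# Time quantization at an effective sample rate of 50 GHz is 1 / 50e9 = 20 ps.
error_db = jitter_error_db(96e3, 1 / 50e9)       # roughly -98 dB
limit_s = max_jitter_for_bits(96e3, 24)          # roughly 0.1 ps
required_rate_hz = 1 / limit_s                   # on the order of 10 THz
```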
Upsampling factor L can be lowered by replacing the zero-order hold network 7 with a more advanced, higher-order, continuous-time filter. As is fundamental in the art, hold network 7 of interpolator 3′ in FIG. 1b produces a staircase type output signal. In the time-domain, the impulse response of zero-order hold network 7 is a rectangular window function that spans 1/(L·fin) in time. In the frequency domain, the transfer function of hold network 7 is a sinc function with an infinite number of transmission zeros, located at the nonzero multiples of L·fin. These zeros (i.e., notches) are centered directly at the images (spectral repetitions) of the output of filter 6. FIG. 1c illustrates an example of an ASRC in which a higher-order hold filter 7′ is used in place of a zero-order hold network in interpolator 3″. For example, the next-most advanced filter 7′ is a linear interpolator, which will connect the upsampled sample values output from filter 6 with straight lines in the time domain. As known in the art, a linear interpolator can be achieved by filtering the output of a first sinc filter with another sinc filter; this gives a triangular impulse response that spans 2/(L·fin). The linear interpolator thus gives far more attenuation of the images of the output of filter 6 about its notches in the frequency domain; as a result, the upsampling factor L can be reduced from that used with the zero-order hold filter, to achieve the same performance. This approach can be further extended by realizing filter 7′ by even higher order filters, providing more and more bell-shaped impulse responses and deeper and deeper notches with increasing order, such deeper notches enabling further reduction in the upsampling factor L, for a given level of performance. The class of nth order hold filters is referred to as B-spline interpolators.
The impulse response of an nth order B-spline interpolator is a piece-wise polynomial, with the nth order impulse response consisting of n+1 polynomial sections, each of order n (the zero-order hold being a single rectangular section, and the linear interpolator a two-section triangle). Evaluation of an output sample by way of a B-spline interpolator is obtained by way of an FIR filter operation with n+1 taps, with each filter coefficient calculated by evaluation of a polynomial. Many other interpolation functions other than B-splines are known, including the well known Lagrange interpolation function, and the polynomial interpolation filter described in commonly assigned U.S. application Ser. No. 12/210,794, filed Sep. 15, 2008. Fractional delay filters are also known in the art, as a form of digital filter that is equivalent to some continuous time filter, and that interpolates between samples using a tunable delay.
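The B-spline hold filters discussed above can be sketched in Python using the standard recursion that builds each order from the previous one, which mirrors the repeated filtering by a rectangular (zero-order hold) response. This is an illustrative sketch only: the function names are hypothetical, time is measured in units of the upsampled sample period 1/(L·fin), and the causal form is assumed, so order 0 returns the most recent sample and order 1 interpolates linearly with a delay of one sample.

```python
import math

def bspline(order, t):
    """Causal B-spline impulse response of the given order, non-zero on 0 <= t < order + 1."""
    if order == 0:
        return 1.0 if 0.0 <= t < 1.0 else 0.0      # rectangular zero-order hold
    return (t * bspline(order - 1, t)
            + (order + 1 - t) * bspline(order - 1, t - 1.0)) / order

def hold_taps(order, mu):
    """The order + 1 FIR coefficients for fractional position mu (0 <= mu < 1).

    Coefficient j weights the sample j periods before the most recent one.
    """
    return [bspline(order, mu + j) for j in range(order + 1)]

def interpolate(v, pos, order):
    """Evaluate the held signal at position pos (requires pos >= order)."""
    base = math.floor(pos)
    taps = hold_taps(order, pos - base)
    return sum(w * v[base - j] for j, w in enumerate(taps))
```

For example, with v = [0.0, 10.0, 20.0] and pos = 1.25, order 0 returns the most recent sample v[1], while order 1 returns a value one quarter of the way from v[0] to v[1]. In every case the coefficients sum to one, so a constant input is reproduced exactly.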
The conventional asynchronous sample rate converter approaches shown in FIGS. 1b and 1c perform reasonably well for those cases in which the ratio R of input sample rate fin to output sample rate fout is below unity (i.e., in the upsampling case). In the upsampling case, the cutoff frequency of filter 6 follows the input sample rate fin, as it is a lower frequency than the output sample rate fout. However, it has been observed, according to this invention, that this conventional arrangement requires substantial computational complexity if the ratio R exceeds unity (i.e. in the downsampling case), primarily because the cutoff frequency of filter 6, relative to its sample rate of L·fin, must follow the output sample rate fout, which requires its cutoff frequency to change with changes in the output sample rate. In contrast, the filter cutoff frequency remains constant, relative to the filter sample rate, in the up-sampling case. In order for the filter cutoff frequency to follow the output sample rate in the downsampling case, a more complex (i.e., more poles and zeroes) digital filter is required, with a corresponding increase in the number of taps (if realized as an FIR filter), in order to realize the smaller relative transition band necessary for changes in cutoff frequency. Not only does the number of taps in the FIR realization increase, but because the sampling rate of this filter increases proportionally with the ratio R, the computational complexity per output sample of the filter itself increases with the square of the increase in the ratio R. In addition, this computational complexity is further increased by the necessity to recalculate filter coefficients with changes in the output sample rate, or to store those coefficient values in memory in advance.
Another known approach to asynchronous sample rate conversion includes an interpolation filter, a resampler, and a decimator in series. In this approach, the interpolation filter receives, upsamples, and low-pass-filters the input signal, producing a result that is buffered in a FIFO at the upsampled sample rate. This intermediate continuous-time signal from the FIFO of the upsampler is resampled by way of an interpolation filter that has stopbands at multiples of a resampling frequency L·fout. The resampled signal is decimated by a factor L to produce the sample rate converted signal at the desired output sample rate. The decimator is realized by a steep digital low-pass filter with a cut-off frequency at the lower of the Nyquist rates of the input and output signals, to remove aliasing and images, and which outputs every Lth sample of the filtered signal. The frequency response of the digital decimation filter is periodic with the resampling rate L·fout, with stop bands at multiples of L·fin. While this approach enables the interpolation filters to be relatively simple linear interpolation filters, a very high decimation factor is required at the output of the sample rate converter. In addition, the decimation filter of this approach becomes extremely complex if the input sample rate is low relative to the output sample rate (i.e., in the case of strong upsampling). This complexity is because the stopbands of the decimation filter must block those images of the input signal that are at a lower rate than the eventual output sample rate (L·fin<L·fout). Reduction of the cut-off frequency of this filter, relative to the filter sampling rate, in the upsampling case is reflected in a smaller relative transition band and thus a more complex filter (i.e., more taps, if realized as an FIR filter). Moreover, the filter coefficients must be recalculated each time that the ratio R changes.
By way of further background, the use of fractional delay filters in sample rate conversion from an input sample rate to a higher output sample rate (i.e., upsampling) is described in Rajamani, et al., “An Efficient Algorithm for Sample Rate Conversion from CD to DAT”, Signal Processing Letters, Vol. 7, No. 10 (IEEE, October 2000), pp. 288-290.