1. Field of Invention
The invention relates to audio signal transmission, and more particularly to varying the sample-rate to improve coding gain for audio signals.
2. Description of Related Art
There are a number of decisions which must be made in setting up an audio compression system. Among the most important variables that affect audio quality during encoding are the sampling rate, bit rate, and the frequencies that will be encoded, such as 20 Hz-20 KHz or some lesser range, for example. For a given level of distortion and a given algorithm, more bits are required to transmit more signal frequencies. Therefore, there is a optimal match between bit rate and frequency range such that if the bit rate is specified, distortion will increase if more frequencies are encoded then is optimal for that bit rate.
Most high-quality audio algorithms, such as MPEG AAC (MPEG Advanced Audio Coder), PAC (Perceptual Audio Coder), MPEG layer3, Dolby AC3 (Advanced Coder 3), and NTT""s TwinVQ, encode a fixed number of samples into each frame which then represent a unit of time for a particular algorithm. Each audio frame carries side information. The number of bits needed to encode the side information per frame is roughly constant. This side information imposes a per-frame overhead.
The frame frequency (i.e., the number of frames per second) used by an audio algorithm is proportional to the sampling rate because each frame encodes a constant number of samples.
Decreasing the sampling rate decreases the number of frames-per-second, which in turn decreases the number of bits diverted for overhead, allowing more bits to be used for audio coding. Thus, lowering the sampling rate results in more bits being available for audio coding which results in a higher quality signal as long as sufficient frequency range is preserved.
To a similar end, the statistical properties of music indicate that an optimal frame duration is about 40 ms. For AAC and PAC at sampling rates of 44100 sps (samples per second) (i.e., the CD sample rate) the frame duration is about 23 ms; at 22050 sps, the frame duration is 46 ms.
The lower the sampling rate, the lower the frequency range that can be transmitted, as described by the Nyquist rule, which limits the maximum frequency range to half of the sampling rate. In practical implementations a xe2x80x9cguard bandxe2x80x9d is needed which further lowers the achievable maximum frequency range. For example, for any algorithm (e.g. AAC), at a sampling rate of 22050 sps, the maximum frequency range is 8 to 10 KHz.
Thus, for a given algorithm, and for a given bit rate b0 that is not sufficient for encoding the entire human-audible frequency range in a transparent manner without audible distortion, and for a specified acceptable level of distortion, there is a maximum frequency range f0 that one can encode, and that maximum will be associated with a sample rate fs0.
If there were no outside constraints, then one would use fs0 as the sampling rate. However, several outside constraints exist. For example, PCs and Macintoshes work mostly at 44100, 22050 and 11025 sps. Some PCs work at one or more of the rates 48000, 32000, 24000, 16000 and 8000 sps, but very few PCs will work at all of these sample rates. In fact, Macintosh audio hardware will not work at all at these latter sample rates, so a user is constrained to a small set of sample rates if he or she want to interact with PCs and an even smaller set of sample rates if one wants to interact transparently with Macs without involving potentially inferior resampling in the PC or Mac.
The invention relates to a method and apparatus for achieving maximal coding gain for audio coding and reproduction. More particularly, at a chosen sample rate and frequency range value, an audio input signal is transduced, sampled, downsampled to the encoding sample rate, encoded and transmitted at a given bit rate. At the receiving end, the downsampled signal is decoded and upsampled to the original or other suitable sample rate. The upsampled signal is then audibly output.
Resampling using xe2x80x9csmall-integerxe2x80x9d ratios (e.g. 11:8) is computationally more efficient than using arbitrary resampling ratios. This method and apparatus support both arbitrary and small-integer ratio resampling. The use of small-integer resampling frequently implies the use of non-standard sampling rates in the transmitted channel, for example 32073 sps rather than 32000 sps.
These and other features and advantages of this invention are described in or are apparent from the following detailed description of the preferred embodiments.