1. Field of the Invention
This invention relates generally to improvements in digital audio processing and specifically to a system and method for efficiently implementing a masking function in a psycho-acoustic modeler in digital audio encoding.
2. Description of the Background Art
Digital audio is now in widespread use in audio and audiovisual systems. Digital audio is used in compact disk (CD) players, digital video disk (DVD) players, digital video broadcast (DVB), and many other current and planned systems. The ability of all these systems to present large amounts of audio is limited by either storage capacity or bandwidth, which may be viewed as two aspects of a common problem. In order to fit more digital audio in a storage device of limited storage capacity, or to transmit digital audio over a channel of limited bandwidth, some form of digital audio compression is required.
Due to the structure of audio signals and the human ear's sensitivity to sound, many of the usual data compression schemes have been shown to yield poor results when applied to digital audio. An exception to this is perceptive encoding, which uses experimentally determined information about human hearing from what is called psycho-acoustic theory. The human ear does not perceive sound frequencies evenly. Research has determined that there are 25 non-linearly spaced frequency bands, called critical bands, to which the ear responds. Furthermore, this research shows experimentally that the human ear cannot perceive tones whose amplitude is below a frequency-dependent threshold, or tones that are near in frequency to another, stronger tone. Perceptive encoding exploits these effects by first converting digital audio from the time-sampled domain to the frequency-sampled domain, and then by choosing not to allocate data to those sounds which would not be perceived by the human ear. In this manner, digital audio may be compressed without the listener being aware of the compression. The system component that determines which sounds in the incoming digital audio stream may be safely ignored is called a psycho-acoustic modeler.
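By way of illustration only, the non-linear spacing of the critical bands may be sketched with a commonly used approximation that maps frequency in hertz to critical-band rate (the Bark scale). The formula below is the approximation usually attributed to Zwicker and Terhardt; it is an assumption introduced here for illustration and does not appear in the present description. The function names hz_to_bark and critical_band_index are likewise hypothetical.

```python
import math


def hz_to_bark(frequency_hz):
    """Approximate critical-band rate (Bark) for a frequency in Hz.

    Assumption: the Zwicker-Terhardt style approximation, used here only
    to illustrate that critical bands are not evenly spaced in frequency.
    """
    return (13.0 * math.atan(0.00076 * frequency_hz)
            + 3.5 * math.atan((frequency_hz / 7500.0) ** 2))


def critical_band_index(frequency_hz):
    """Zero-based critical band containing the frequency (hypothetical helper)."""
    return int(hz_to_bark(frequency_hz))


if __name__ == "__main__":
    # Equal steps in Hz do not give equal steps in critical-band rate.
    for f in (100.0, 1000.0, 5000.0, 10000.0, 20000.0):
        print(f, round(hz_to_bark(f), 2), critical_band_index(f))
```

Under this approximation the audible range up to about 20 kHz falls into band indices 0 through 24, which is consistent with the 25 critical bands mentioned above.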
Two examples of applications of perceptive encoding of digital audio are those given by the Moving Picture Experts Group (MPEG) in their audio and video specifications, and by Dolby Labs in their Audio Compression 3 (AC-3) specification. The MPEG specification will be examined in detail, although much of the discussion could also apply to AC-3. A standard decoder design for digital audio is given in the MPEG specifications, which allows all MPEG encoded digital audio to be reproduced by differing vendors' equipment. Certain parts of the encoder design must also be standard in order that the encoded digital audio may be reproduced with the standard decoder design. However, the psycho-acoustic modeler, and its method of calculating individual masking functions, may be changed without affecting the ability of the resulting encoded digital audio to be reproduced with the standard decoder design.
In some implementations, the psycho-acoustic modeler calculates the individual masking functions by adding together psycho-acoustic model components expressed in decibels (dB). These psycho-acoustic model components, expressed in dB, are logarithmic quantities, and therefore the logarithm of each newly measured quantity must be derived before it can be combined with them. Derivation of the logarithms of measured quantities may be performed by using a look-up table, or, alternatively, by direct calculation. Neither of these methods is well suited to the preferred data processing equipment: a digital signal processor (DSP) microprocessor executing code written in assembly language. A look-up table covering the broad range of signal values anticipated would be excessively large. Similarly, the calculation of transcendental functions such as logarithms is inconvenient to code in assembly language. Therefore, there exists a need for an efficient implementation of a masking function in a psycho-acoustic modeler for use in consumer digital audio products.
The present invention includes a system and method for a refined psycho-acoustic modeler in digital audio perceptive encoding. Perceptive encoding uses experimentally derived knowledge of human hearing to compress audio by deleting data corresponding to sounds which will not be perceived by the human ear. A psycho-acoustic modeler produces masking information that is used in the perceptive encoding system to specify which amplitudes and frequencies may be safely ignored without compromising sound fidelity. In the preferred embodiment, the present invention comprises a system and method for efficiently implementing a masking function in a psycho-acoustic modeler in digital audio encoding.
The present invention includes a refined approximation to the experimentally derived individual masking spread function, which allows superior performance when used to calculate the overall amplitudes and frequencies which may be ignored during compression. The present invention may be used whether the maskers are tones or noise. In the preferred embodiment of the present invention, the parameters of the individual masking functions are expressed and stored in linear representations, rather than expressed in decibels and stored in logarithmic representations. In order to calculate the individual masking functions more efficiently, some of these parameters are stored in look-up tables. This eliminates the necessity of extracting the logarithms of masker amplitudes and thus enhances performance when programming in assembly language for a digital signal processor (DSP) microprocessor.
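The benefit of a linear representation may be sketched as follows. Adding an offset in dB is equivalent to multiplying by a linear factor, so if the factor is computed once in advance, no logarithm of the measured masker power is needed at run time. The sketch below is a minimal illustration under stated assumptions, not the implementation of the preferred embodiment: the offset value of -14.5 dB, the masker power, and all function and variable names are hypothetical.

```python
import math

# Hypothetical model parameter: an offset of -14.5 dB relative to the masker.
OFFSET_DB = -14.5

# The same parameter as a linear power ratio, computed once in advance
# (for example, when a table is built), not per audio frame.
OFFSET_FACTOR = 10.0 ** (OFFSET_DB / 10.0)


def threshold_db_domain(masker_power, offset_db):
    """Logarithmic approach: requires a logarithm of the measured power."""
    masker_db = 10.0 * math.log10(masker_power)  # run-time logarithm
    return masker_db + offset_db                 # result in dB


def threshold_linear_domain(masker_power, offset_factor):
    """Linear approach: a single multiplication, no run-time logarithm."""
    return masker_power * offset_factor          # result as linear power


masker_power = 2500.0  # hypothetical measured masker power (linear units)
db_result = threshold_db_domain(masker_power, OFFSET_DB)
linear_result = threshold_linear_domain(masker_power, OFFSET_FACTOR)

if __name__ == "__main__":
    # Converting the linear result to dB, for comparison only, shows that
    # the two approaches describe the same threshold.
    print(db_result, 10.0 * math.log10(linear_result))
```

The logarithm in the final comparison is used only to check agreement between the two approaches; a linear-domain encoder would keep the threshold in linear units throughout.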
In the preferred embodiment, the initial offsets from the signal strength, called mask index functions, are directly stored in look-up tables. The dependencies of the individual masking functions at frequencies away from the masker central frequency, called spread functions, are calculated from components stored in look-up tables.
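One possible arrangement of such look-up tables is sketched below. All table contents are hypothetical placeholders chosen only to make the example run: the mask index values, the slopes of 25 dB and 10 dB per critical band, and the names MASK_INDEX, SPREAD_BELOW, SPREAD_ABOVE, and individual_masking_function are assumptions and are not taken from the present description. For simplicity the sketch stores the spread function values directly, whereas the description above states that the spread functions are calculated from stored components.

```python
NUM_BANDS = 25  # number of critical bands stated in the background section

# Hypothetical mask index per critical band, defined in dB for readability
# and converted once to linear power ratios when the table is built.
MASK_INDEX_DB = [-(6.0 + 0.3 * z) for z in range(NUM_BANDS)]
MASK_INDEX = [10.0 ** (d / 10.0) for d in MASK_INDEX_DB]

# Hypothetical spread function: attenuation per critical band of distance
# from the masker, steeper below the masker than above it.
LOWER_SLOPE_DB = 25.0
UPPER_SLOPE_DB = 10.0
SPREAD_BELOW = [10.0 ** (-LOWER_SLOPE_DB * dz / 10.0) for dz in range(NUM_BANDS)]
SPREAD_ABOVE = [10.0 ** (-UPPER_SLOPE_DB * dz / 10.0) for dz in range(NUM_BANDS)]


def individual_masking_function(masker_power, masker_band):
    """Masking threshold per critical band for one masker, in linear power.

    Only table look-ups and multiplications are used at run time; no
    logarithm of the masker power is taken.
    """
    peak = masker_power * MASK_INDEX[masker_band]  # initial offset from the signal
    thresholds = []
    for band in range(NUM_BANDS):
        distance = band - masker_band
        if distance >= 0:
            spread = SPREAD_ABOVE[distance]
        else:
            spread = SPREAD_BELOW[-distance]
        thresholds.append(peak * spread)
    return thresholds


if __name__ == "__main__":
    mask = individual_masking_function(1000.0, 10)
    print(mask[9], mask[10], mask[11])
```

In this arrangement the logarithmic definitions are evaluated only when the tables are built, which could be done offline, so the per-frame work reduces to operations that are straightforward to code in DSP assembly language.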