The present invention relates to audio encoder systems, and in particular to an enhanced psycho-acoustic modeler for efficient perceptive encoding compression of digital audio data.
Digital audio is now in widespread use in audio and audiovisual systems. Digital audio is used in compact disc (CD) players, digital video disk (DVD) players, digital video broadcast (DVB), and many other current and planned systems. A problem of all these systems is the limitation of either storage capacity or bandwidth, which may be viewed as two aspects of a common problem. In order to fit more digital audio in a storage device of limited storage capacity, or to transmit digital audio over a channel of limited bandwidth, some form of digital audio compression is required.
Because of the structure of digital audio data, many of the traditional data compression schemes have been shown to yield poor results. One data compression method that does work well with digital audio is perceptive encoding. Perceptive encoding uses experimentally determined information about human hearing from what is called psycho-acoustic theory. The human ear does not perceive sound frequencies evenly. It has been determined that there are 25 non-linearly spaced frequency bands, called critical bands, to which the ear responds. Furthermore, it has been shown experimentally that the human ear cannot perceive tones whose amplitude is below a frequency-dependent threshold, or tones which are near in frequency to another, stronger tone. Perceptive encoding exploits these effects by first converting digital audio from the time-sampled domain to the frequency-sampled domain, and then by not allocating data to those sounds which would not be perceived by the human ear. In this manner, digital audio may be compressed without the listener being aware of the compression. The system component which determines which sounds in the incoming digital audio stream may be safely ignored is called a psycho-acoustic modeler.
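By way of illustration only, the frequency-dependent threshold described above may be sketched as follows. The sketch assumes a widely used closed-form approximation of the threshold in quiet (commonly attributed to Terhardt); that formula and the function names `threshold_in_quiet_db` and `audible_components` are assumptions introduced here and do not come from this document. Masking by a nearby stronger tone is not modeled.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate threshold of hearing in quiet, in dB SPL.

    Assumption: closed-form approximation commonly attributed to Terhardt;
    it is not taken from this document.
    """
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def audible_components(components):
    """Keep only (freq_hz, level_db) pairs at or above the threshold in quiet.

    Components below the threshold are the ones an encoder could decline
    to allocate data to.
    """
    return [(f, lvl) for f, lvl in components
            if lvl >= threshold_in_quiet_db(f)]
```

Under this approximation the ear is most sensitive near 3 kHz, so a 10 dB tone at 1 kHz is retained while a 10 dB tone at 100 Hz falls below the threshold and is dropped.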
A common example of perceptive encoding of digital audio data is that given by the Moving Picture Experts Group (MPEG) in their audio and video specifications. A standard decoder design for digital audio is given in the MPEG specifications, which allows all MPEG encoded digital audio data to be reproduced by differing vendors' equipment. Certain parts of the encoder design must also be standard in order that the encoded digital audio may be reproduced with the standard decoder design. However, the psycho-acoustic modeler may be changed without affecting the ability of the resulting encoded digital audio to be reproduced with the standard decoder design.
Early consumer products using MPEG standards, such as DVD players, were play-back only devices. The encoding was left to professional studio mastering facilities, where shortcomings in the psycho-acoustic modeler could be overcome by making numerous attempts at encoding and adjusting the equipment until the resulting encoded digital audio was satisfactory. Moreover, the cost of encoding equipment to a recording studio was not a substantial issue. These factors will no longer be true when newer consumer products, such as recordable DVD players and DVD camcorders, become available. The consumer will want to make a satisfactory recording with a single attempt, and the cost of the encoding equipment will be a substantial issue. Therefore there exists a need for a refined psycho-acoustic modeler for use in consumer digital audio products.
The present invention includes a system and method by which the criteria used by a data compression apparatus can be further refined. A threshold is established for the bit rate of the input data. A determination is made whether the bit rate is above or below the established threshold. A masking index is calculated for the input data according to a first formula if the input data is being transmitted at a rate at or below the threshold. A second formula is used to calculate the masking index if the input data is being transmitted at a rate above the threshold. The masking index is used to generate a masking threshold, and data deemed insignificant relative to the masking threshold is ignored.
In the preferred embodiment of the present invention, a psycho-acoustic modeler, which is included in the encoding section of an encoding/decoding (CODEC) circuit, is used to determine a masking index. The masking index is then used to generate a masking threshold. A masking threshold is an information curve generated for and unique to each piece of audio data which enters the CODEC circuit. The psycho-acoustic modeler uses experimentally determined information about human hearing and, through a process called perceptive encoding, determines which parts of the input audio data will not be perceived by the human ear. The masking threshold is a curve below which the human ear cannot perceive sounds. The psycho-acoustic modeler compares the masking threshold, uniquely generated for the specific piece of input audio data, to that input audio data. This comparison dictates to the encoding section of the CODEC circuit which of the tones and noises contained within the input audio data can be ignored without sacrificing sound quality.
The preferred embodiment of the present invention includes a refined method and system by which the masking thresholds for each piece of audio data are determined. The psycho-acoustic modeler must be able to differentiate between data traveling at or below 192 kbits/sec and data traveling above 192 kbits/sec. In the preferred embodiment of the present invention, the psycho-acoustic modeler uses one set of coefficients when the audio data is traveling at a bit rate above 192 kbits/sec. When the audio data is traveling at a bit rate at or below 192 kbits/sec, a second set of coefficients is used. Using different coefficients depending on the bit rate of the input data allows the psycho-acoustic modeler to predict more accurately which data may be safely ignored without affecting the perceived quality of the audio provided.
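The bit-rate-dependent choice of coefficients may be sketched as follows, as a minimal illustration only. The 192 kbits/sec boundary is taken from the text; the linear form of the masking index (an offset plus a slope times the critical-band rate in Bark) and every numeric coefficient below are hypothetical placeholders chosen for illustration, since this section does not state the actual coefficients. The names `select_coefficients` and `masking_index_db` are likewise assumptions.

```python
BITRATE_SPLIT_KBPS = 192  # boundary stated in the text

# Hypothetical placeholder coefficients: (offset in dB, slope in dB per Bark).
# These values are NOT from this document; they only show the mechanism.
COEFFS_AT_OR_BELOW = (-6.0, -0.275)
COEFFS_ABOVE = (-4.5, -0.20)


def select_coefficients(bitrate_kbps: float):
    """Return the coefficient set for the given bit rate.

    Rates above the boundary use one set; rates at or below it use the other.
    """
    if bitrate_kbps > BITRATE_SPLIT_KBPS:
        return COEFFS_ABOVE
    return COEFFS_AT_OR_BELOW


def masking_index_db(bark: float, bitrate_kbps: float) -> float:
    """Masking index in dB at critical-band rate `bark` (assumed linear form)."""
    offset_db, slope_db_per_bark = select_coefficients(bitrate_kbps)
    return offset_db + slope_db_per_bark * bark
```

Note that a rate of exactly 192 kbits/sec falls in the "at or below" case, matching the wording of the text.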
In another embodiment the invention provides a method for refining encoding criteria for input data in a data compression apparatus. The method comprises establishing a threshold for the bit rate of the input data; determining whether the input data is being transmitted at a bit rate above or below the established threshold; calculating a masking index for the input data according to a first formula if the input data is being transmitted at a rate at or below the threshold and according to a second formula if the input data is being transmitted at a rate above the threshold; using the masking index to generate a masking threshold; and ignoring data which is deemed insignificant relative to the masking threshold.
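The steps of the method may be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the two formulas are passed in as arbitrary callables, and the masking threshold is simplified to a single masker level plus the masking index, a simplification introduced here. The function name `refine_and_filter` and its parameters are hypothetical.

```python
from typing import Callable, List, Tuple

# (critical-band rate in Bark, level in dB); representation is an assumption.
Component = Tuple[float, float]


def refine_and_filter(
    components: List[Component],
    masker_level_db: float,
    bitrate_kbps: float,
    rate_threshold_kbps: float,
    first_formula: Callable[[float], float],
    second_formula: Callable[[float], float],
) -> List[Component]:
    """Return the components that are NOT ignored by the encoder."""
    # Determine which side of the established bit-rate threshold applies.
    if bitrate_kbps <= rate_threshold_kbps:
        formula = first_formula   # at or below the threshold
    else:
        formula = second_formula  # above the threshold

    kept = []
    for bark, level_db in components:
        index_db = formula(bark)  # masking index for this component
        # Simplified masking threshold: masker level plus masking index.
        masking_threshold_db = masker_level_db + index_db
        # Data below the masking threshold is deemed insignificant.
        if level_db >= masking_threshold_db:
            kept.append((bark, level_db))
    return kept
```

For example, with a 60 dB masker, a first formula returning -10 dB and a second formula returning -20 dB, a 45 dB component is ignored at 128 kbits/sec but retained at 256 kbits/sec when the bit-rate threshold is 192 kbits/sec.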
The novel features which are characteristic of the invention, as to organization and method of operation, together with further objects and advantages thereof, will be better understood from the following description considered in connection with the accompanying drawings, in which a preferred embodiment of the invention is illustrated by way of example. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the invention.