The amount of information available via computers has increased dramatically with the widespread proliferation of computer networks, the Internet, and digital storage. With this increase has come the need to transmit information quickly and to store it efficiently. Data compression is a technology that facilitates the effective transmission and storage of information.
Data compression reduces the amount of space necessary to represent information and can be used for many information types. The demand for compression of digital information, including images, text, audio, and video, has been ever increasing. Typically, data compression is used with standard computer systems; however, other technologies also make use of data compression, such as, but not limited to, digital and satellite television and cellular/digital phones.
As the demand for handling, transmitting, and processing large amounts of information increases, the demand for compression of such data increases as well. Although storage device capacity has increased significantly, the demand for information has outpaced capacity advancements. For example, an uncompressed digital music source can require 5 megabytes of space, whereas the same music can be compressed without loss and require only 2.5 megabytes of space. Thus, data compression facilitates transferring larger amounts of information. Even with the increase of transmission rates, such as broadband, DSL, cable modem Internet and the like, transmission limits are easily reached with uncompressed information. For example, transmitting an uncompressed piece of music over a DSL line can take ten minutes. The same music, when compressed, can be transmitted in about one minute, a ten-fold gain in data throughput.
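The arithmetic behind these figures can be sketched as follows. This is a minimal illustration, assuming an idealized link with no protocol overhead; the function names and the 1 Mbit/s link rate are hypothetical and are not taken from the text. It shows that, for a fixed link rate, the gain in transfer time equals the compression ratio.

```python
def compression_ratio(original_bytes, compressed_bytes):
    """Ratio of original size to compressed size (2.0 means half the size)."""
    return original_bytes / compressed_bytes


def transfer_seconds(size_bytes, link_bits_per_second):
    """Idealized transfer time: payload bits divided by the link rate."""
    return size_bytes * 8 / link_bits_per_second


MB = 1_000_000                  # decimal megabyte, assumed for simplicity
original_size = 5 * MB          # uncompressed music, figure from the text
lossless_size = 2.5 * MB        # losslessly compressed, figure from the text
LINK_RATE = 1_000_000           # hypothetical 1 Mbit/s link

ratio = compression_ratio(original_size, lossless_size)
time_original = transfer_seconds(original_size, LINK_RATE)
time_compressed = transfer_seconds(lossless_size, LINK_RATE)

# On a fixed-rate link, the speed-up equals the compression ratio.
speed_up = time_original / time_compressed
print(ratio, time_original, time_compressed, speed_up)
```

By the same reasoning, the ten-fold throughput gain in the DSL example corresponds to a compression ratio of about 10:1.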
In general, there are two types of compression: lossless and lossy. Lossless compression allows the exact original data to be recovered after compression, while lossy compression allows the data recovered after compression to differ from the original data. A tradeoff exists between the two compression modes in that lossy compression provides a better compression ratio than lossless compression because some degree of data integrity compromise is tolerated. Lossless compression may be used, for example, when compressing critical audio recordings, because failure to reconstruct the data exactly can dramatically affect the quality and analysis of the audio content. Lossy compression can be used with consumer music or non-critical audio recordings where a certain amount of distortion or noise is tolerable to human senses.
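The distinction can be illustrated with a short sketch. It assumes a handful of made-up 8-bit sample values, uses Python's standard `zlib` module as a stand-in for a lossless coder, and uses plain uniform quantization as a stand-in for a lossy coder. None of these choices reflects a particular audio codec.

```python
import zlib

# Hypothetical 8-bit sample values, chosen only for illustration.
samples = [0, 3, 7, 12, 18, 25, 31, 36, 40, 41, 39, 35]

# Lossless path: the decompressed bytes are identical to the input.
restored = list(zlib.decompress(zlib.compress(bytes(samples))))

# Lossy path: uniform quantization with a fixed step size.
STEP = 4


def quantize(values, step):
    """Map each non-negative integer to the index of its nearest multiple of step."""
    return [(v + step // 2) // step for v in values]


def dequantize(indices, step):
    """Reconstruct approximate values from quantization indices."""
    return [q * step for q in indices]


lossy = dequantize(quantize(samples, STEP), STEP)
max_error = max(abs(a - b) for a, b in zip(samples, lossy))

print(restored == samples)  # exact recovery
print(lossy == samples)     # values differ from the original
print(max_error)            # bounded by half the step size
```

The lossy path needs fewer distinct symbols (the indices span a smaller range than the samples), which is where the better compression ratio comes from, at the cost of a reconstruction error of at most half a step.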
Audio compression is an important technical problem. Many Web pages today host digital music, and digital music playing devices have become increasingly popular.
Further, there are many existing schemes for encoding audio files. Several such schemes attempt to achieve higher compression ratios by using known human psychoacoustic characteristics to mask the audio file. A psychoacoustic coder is an audio encoder designed to take advantage of human auditory masking by dividing the audio spectrum of one or more audio channels into narrow frequency bands of different sizes optimized with respect to the frequency selectivity of human hearing. This makes it possible to sharply filter coding noise so that it is forced to stay very close in frequency to the frequency components of the audio signal being coded. By reducing the level of coding noise wherever there are no audio signals to mask it, the sound quality of the original signal can be subjectively preserved.
In fact, virtually all state-of-the-art audio coders, including the G.722.1 coder, the MPEG-1 Layer 3 coder, the MPEG-2 AAC coder, and the MPEG-4 T/F coder, recognize the importance of psychoacoustic characteristics and adopt auditory masking techniques in coding audio files. In particular, using human psychoacoustic hearing characteristics in audio file compression allows fewer bits to be used to encode audio components that are less audible to the human ear. Conversely, more bits can then be used to encode any psychoacoustic components of the audio file to which the human ear is more sensitive. Such psychoacoustic coding makes it possible to greatly improve the quality of encoded audio at a given bit rate.
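How "fewer bits for less audible components" might translate into numbers can be sketched with a common rule of thumb: each additional bit of uniform quantization resolution lowers the quantization noise by roughly 6.02 dB. The sketch below is an assumption-laden illustration, not the allocation rule of any of the coders named above. The function name `bits_for_band` and the signal-to-mask ratios are hypothetical.

```python
import math

DB_PER_BIT = 6.02  # approximate noise reduction per extra bit of resolution


def bits_for_band(signal_to_mask_db):
    """Bits needed to push quantization noise below the masking threshold.

    A band whose signal is already below the mask (ratio <= 0 dB) gets no bits.
    """
    return max(0, math.ceil(signal_to_mask_db / DB_PER_BIT))


# Hypothetical signal-to-mask ratios (dB) for four bands, from most to
# least audible.
smr_per_band = [24.0, 12.0, 3.0, -5.0]
allocation = [bits_for_band(smr) for smr in smr_per_band]
print(allocation)
```

Bands in which the signal stands well above the masking threshold receive the most bits, while a band that is fully masked receives none.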
Psychoacoustic characteristics are typically incorporated into an audio coding scheme in the following way. First, the encoder explicitly computes auditory masking thresholds of a group of audio coefficients, usually a “critical band,” to generate an “auditory mask.” These thresholds are then transmitted to the decoder in certain forms, such as, for example, the quantization step size of the coefficients. Next, the encoder quantizes the audio coefficients according to the auditory mask. For auditory sensitive coefficients, i.e., those to which the human ear is more sensitive, a smaller quantization step size is typically used. For auditory insensitive coefficients, i.e., those to which the human ear is less sensitive, a larger quantization step size is typically used. The quantized audio coefficients are then typically entropy encoded, either through a Huffman coder such as the MPEG-4 AAC quantization & coding, a vector quantizer such as the MPEG-4 TwinVQ, or a scalable bitplane coder such as the MPEG-4 BSAC coder.
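The quantization step of this procedure can be sketched as follows. This is a minimal illustration, assuming two bands with hand-picked step sizes. The coefficient values, band boundaries, and function names are hypothetical, and the computation of the masking thresholds and the entropy coding stage are omitted.

```python
def quantize_bands(coeffs, band_edges, step_sizes):
    """Quantize each band's coefficients with that band's step size."""
    indices = []
    for (start, end), step in zip(band_edges, step_sizes):
        indices.extend(round(c / step) for c in coeffs[start:end])
    return indices


def dequantize_bands(indices, band_edges, step_sizes):
    """Decoder side: rebuild coefficients from indices and step sizes."""
    coeffs = []
    for (start, end), step in zip(band_edges, step_sizes):
        coeffs.extend(q * step for q in indices[start:end])
    return coeffs


# Hypothetical transform coefficients split into two bands.
coeffs = [10.2, -7.9, 3.1, 0.6, -0.4, 0.2]
band_edges = [(0, 3), (3, 6)]   # half-open index ranges, one per band
step_sizes = [0.5, 2.0]         # small step: sensitive band; large step: insensitive band

indices = quantize_bands(coeffs, band_edges, step_sizes)
reconstructed = dequantize_bands(indices, band_edges, step_sizes)
print(indices)
print(reconstructed)
```

Note that the decoder needs `step_sizes` to invert the quantization, which is why the thresholds must be transmitted alongside the quantized coefficients in such a scheme.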
In each of the aforementioned conventional audio coding schemes, the auditory masking is applied before the process of entropy coding. Consequently, the masking threshold is transmitted to the decoder as overhead information. As a result, the quality of the encoded audio at a given bit rate is reduced to the extent of the bits required to encode the auditory masking threshold information.
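The cost of this overhead is simple bookkeeping, sketched below. All figures (frame budget, number of bands, bits per threshold) are hypothetical and chosen only to make the arithmetic easy to follow. They do not describe any specific coder.

```python
def coefficient_budget(frame_bits, num_bands, bits_per_threshold):
    """Split a frame's bit budget into side information and coefficient bits."""
    side_info_bits = num_bands * bits_per_threshold
    return side_info_bits, frame_bits - side_info_bits


FRAME_BITS = 1500          # hypothetical bits available for one frame
NUM_BANDS = 25             # hypothetical number of critical bands
BITS_PER_THRESHOLD = 6     # hypothetical cost of one transmitted threshold

side_info_bits, coefficient_bits = coefficient_budget(
    FRAME_BITS, NUM_BANDS, BITS_PER_THRESHOLD
)
overhead_fraction = side_info_bits / FRAME_BITS
print(side_info_bits, coefficient_bits, overhead_fraction)
```

Under these assumed figures, a tenth of every frame is spent describing the mask rather than the audio coefficients themselves.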
High-performance audio codecs have made digital music a reality. Popular audio compression technologies, such as MP3, MPEG-4 audio, Real™ and Windows Media Audio (WMA™), are usually lossy in nature: the audio waveform is distorted in exchange for a higher compression ratio. In quality-critical applications such as a recording/editing studio, it is imperative to maintain the best sound quality possible, i.e., the audio should be compressed in a lossless fashion. Since the lossless compression ratio is usually limited, it is desirable that the losslessly compressed bitstream be scalable to a lossy bitstream with a high compression ratio.
Most lossless audio coding approaches simply build upon a lossy audio coder and further encode the residue. The compression ratio of such approaches is often affected by the underlying lossy coder. Since the quantization noise in the lossy coder is difficult to model, these approaches usually lead to inefficiency in the lossless audio coding. Moreover, they are more complex, as they require both a base coder and a residue coder. Some other approaches build the lossless audio coder directly through a predictive filter and then encode the prediction residue. These approaches may achieve a good compression ratio; however, they are not compatible with existing lossy audio coding frameworks. Since the compression ratio of a lossless coder is rather limited, usually 2-3:1, the ability to scale a lossless bitstream is very useful. The bitstream generated by a predictive-filter-based lossless coder cannot be scaled. A lossy/residue coder can generate a bitstream with two layers, a lossy base layer and a lossless enhancement layer. However, the scaling cannot go beyond the lossy base layer. If further scaling in the lossless enhancement layer is required, the design of the residue coder must be matched with that of the lossy coder, which causes significant complications.
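The two families of lossless approaches described above can be contrasted in a short sketch. It is a toy illustration under stated assumptions: uniform quantization stands in for the lossy base coder, a first-order difference stands in for the predictive filter, the sample values are made up, and entropy coding of the layers is omitted. The function names are hypothetical.

```python
def encode_layers(samples, step):
    """Lossy/residue approach: a quantized base layer plus an exact residue."""
    base = [(s + step // 2) // step for s in samples]         # lossy base layer
    residue = [s - q * step for s, q in zip(samples, base)]   # enhancement layer
    return base, residue


def decode_layers(base, step, residue=None):
    """Decode the base layer alone (lossy) or with the residue (lossless)."""
    approx = [q * step for q in base]
    if residue is None:
        return approx
    return [a + r for a, r in zip(approx, residue)]


def predict_encode(samples):
    """Predictive approach: code each sample as its difference from the previous one."""
    previous, residue = 0, []
    for s in samples:
        residue.append(s - previous)
        previous = s
    return residue


def predict_decode(residue):
    """Invert the first-order predictor by accumulating the differences."""
    previous, samples = 0, []
    for r in residue:
        previous += r
        samples.append(previous)
    return samples


samples = [12, -7, 30, 5, -18, 22]   # hypothetical integer audio samples
STEP = 8

base, residue = encode_layers(samples, STEP)
lossy = decode_layers(base, STEP)               # base layer only
lossless = decode_layers(base, STEP, residue)   # base plus enhancement layer
round_trip = predict_decode(predict_encode(samples))
print(lossy, lossless == samples, round_trip == samples)
```

In the layered sketch, dropping the residue yields exactly one lossy operating point, which mirrors the observation that scaling cannot go beyond the base layer. The predictive sketch produces a single residue stream that must be decoded in full, so it offers no lossy operating point at all.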