1. Field of the Invention
The present invention relates to an audio data processing (compression and decompression) system, method, and implementation for providing a high-speed, high-compression, high-quality, multiple-resolution, versatile, and controllable audio signal communication system. Specifically, the present invention is directed to a wavelet transform (WT) system for digital data compression in audio signal processing. To meet the considerations and requirements of audio communication devices and systems, the present invention provides highly efficient audio compression schemes, such as a segment-based channel splitting scheme or a non-segment-based no-latency scheme, for local area multiple-point to multiple-point audio communication.
2. Description of the Related Art
Musical compact discs became popular and widespread during the 1990s. Compact discs store music digitally at a sampling frequency of 44.1 kHz, i.e., taking 16-bit samples 44,100 times per second for each of the two stereo channels. Unfortunately, such a scheme involves a large amount of data, about 10 MB per minute of audio, which makes it difficult and inefficient to distribute music over the Internet. Audio compression thus becomes necessary to reduce the amount of audio data while retaining acceptable quality. Lossless compression (reducing information redundancy) is used by audio professionals for further processing (e.g., later work on the samples). People who trade live recordings also often use lossless formats. While lossless compression, which recovers all of the original audio signal, guarantees music quality, the amount of data involved remains large, typically 70% of the original format.
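The figures above follow directly from the CD format parameters; a minimal sketch of the arithmetic (variable names are illustrative only):

```python
# CD audio data rate: 44,100 samples/s x 16 bits/sample x 2 stereo channels.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

bytes_per_second = SAMPLE_RATE_HZ * (BITS_PER_SAMPLE // 8) * CHANNELS
megabytes_per_minute = bytes_per_second * 60 / (1024 * 1024)

print(bytes_per_second)                # 176400 bytes per second
print(round(megabytes_per_minute, 1))  # 10.1, i.e., about 10 MB per minute
```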
On the other hand, lossy compression is not flawless (i.e., the redundancy reduction is not reversible); it performs irrelevance coding (i.e., irrelevance reduction). Lossy compression removes irrelevant information from the input to save storage and bandwidth, so that much smaller music files can be stored and transferred. In other words, sounds considered perceptually irrelevant are coded with decreased accuracy or not coded at all. This is done at the cost of losing some irrelevant data while maintaining the audible quality of the music. The nature of audio waveforms makes them generally difficult to simplify without a (necessarily lossy) conversion to frequency information, as performed by the human ear. Because the values of audio samples change very quickly, generic data compression algorithms without spectrum analysis do not work well for audio, and strings of repeated consecutive bytes do not generally appear very often. Common lossy compression standards include MP3, VQF, OGG and MPC. Sony MiniDiscs use a standard named ATRAC (Adaptive TRansform Acoustic Coding).
Compression efficiency of lossy data compression encoders is typically defined by the bitrate, because the compression ratio depends on the bit depth and sampling rate of the input signal. Nevertheless, compression ratios are often published using the CD parameters (44.1 kHz, 2×16 bit) as a reference. Sometimes the DAT SP parameters (48 kHz, 2×16 bit) are used instead. The compression ratio for this second reference is higher, which demonstrates the problem with the term compression ratio for lossy encoders.
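The dependence on the reference format can be made concrete; a sketch, with the function name chosen only for illustration, comparing the same 128 kbit/s stream against the CD and DAT SP references:

```python
# Compression ratio of a lossy encoder depends on the reference format,
# not only on the encoder's bitrate (here a fixed 128 kbit/s stream).
def compression_ratio(sample_rate_hz, bits, channels, bitrate_bps):
    source_bps = sample_rate_hz * bits * channels
    return source_bps / bitrate_bps

cd_ratio = compression_ratio(44_100, 16, 2, 128_000)   # CD reference
dat_ratio = compression_ratio(48_000, 16, 2, 128_000)  # DAT SP reference

print(cd_ratio)   # 11.025, the familiar "approximately 11:1" figure
print(dat_ratio)  # 12.0, a higher ratio for the very same encoded stream
```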
The focus in audio signal processing is most typically an analysis of which parts of the signal are audible. Which parts of the signal are heard and which are not is decided not merely by the physiology of the human hearing system, but very much by psychological properties. These properties are analyzed within the field of psychoacoustics. It is necessary to exploit psychoacoustic effects to determine how to reduce the amount of data required for faithful reproduction of the original uncompressed audio to most listeners. This is done by conducting hearing tests on subjects to determine how much distortion of the music is tolerable before it becomes audible. Another technique is to break the music's frequency spectrum into smaller sections known as subbands. Different resolutions can then be used in each subband to suit the respective requirements. However, the computational complexity of these compression methods is extremely high, making them costly and difficult to implement.
MP3 enjoys very significant and extremely wide popularity and support, not just by end-users and software, but also by hardware such as DVD players. The bit rate, i.e., the number of binary digits streamed per second, is variable for MP3 files. The general rule is that the higher the bitrate, the more information is included from the original sound file, and thus the higher the quality of the played-back audio. Bit rates available in MPEG-1 Layer 3 are 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 Kbit/s, and the available sampling frequencies are 32, 44.1 and 48 KHz. 44.1 KHz is the sampling frequency of the audio CD, and 128 Kbit/s has become the de facto "good enough" standard. Many listeners accept the MP3 bitrate of 128 kilobits per second (Kbit/s) as faithful enough to original CDs, which provides a compression ratio of approximately 11:1. However, listening tests show that, with a bit of practice, many listeners can reliably distinguish 128 Kbit/s MP3s from CD originals, and to some listeners, 128 Kbit/s provides unacceptable quality.
The MPEG-1 standard does not include a precise specification for an MP3 encoder. The decoding algorithm and file format, by contrast, are well defined. As a result, there are many different MP3 encoders available, each producing files of differing quality. Most lossy compression algorithms use transforms such as the modified discrete cosine transform (MDCT) to convert sampled waveforms into a transform domain. Once transformed, typically into the frequency domain, component frequencies can be allocated bits according to how audible they are. Audibility of spectral components is determined by first calculating a masking threshold, below which it is estimated that sounds will be beyond the limits of human perception.
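The bit-allocation idea can be sketched as follows. This is a deliberately simplified illustration, not the MP3 psychoacoustic model: the per-band energies, thresholds, and the one-bit-per-6-dB rule are all hypothetical values chosen for the example.

```python
# Toy bit allocation: bands whose energy falls below the masking threshold
# get no bits; audible bands get bits in proportion to how far they exceed it.
import math

band_energy_db =    [60.0, 42.0, 18.0, 30.0, 8.0]   # hypothetical band energies
masking_threshold = [20.0, 25.0, 22.0, 15.0, 12.0]  # hypothetical thresholds

def allocate_bits(energies, thresholds):
    bits = []
    for e, t in zip(energies, thresholds):
        if e <= t:
            bits.append(0)  # below the masking threshold: not coded at all
        else:
            # roughly one bit per 6 dB by which the band exceeds its threshold
            bits.append(math.ceil((e - t) / 6.0))
    return bits

print(allocate_bits(band_energy_db, masking_threshold))  # [7, 3, 0, 3, 0]
```

Bands 3 and 5 are judged inaudible and dropped entirely, which is exactly the "coded with decreased accuracy or not coded at all" behavior described above.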
As in the example depicted in FIG. 1, taken from the paper titled "Lossless Wideband Audio Compression: Prediction and Transform" by Jong-Hwa Kim, MP3 uses a hybrid transform scheme to transform a time domain signal into a frequency domain signal using a 32-band polyphase quadrature filter, a 36- or 12-tap MDCT (the size selected independently for subbands 0 . . . 1 and 2 . . . 31), and alias reduction post-processing. The MDCT is a Fourier-related transform based on the type-IV discrete cosine transform (DCT-IV), with the additional property of being lapped so as to be performed on consecutive blocks of a larger dataset, where subsequent blocks are overlapped so that the last half of one block coincides with the first half of the next block. This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications, since it helps to avoid artifacts stemming from the block boundaries. However, a direct Fourier transform requires O(n²) operations (where n is the data size). Even when deploying the preferred butterfly structure of the fast Fourier transform (FFT), the computational complexity is still as high as O(n log n).
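The lapped property can be demonstrated directly. The following is a minimal sketch, not MP3's hybrid filter bank: it implements the MDCT by direct matrix multiplication with a sine window (which satisfies the Princen-Bradley condition), and shows that overlap-adding the inverse transforms of consecutive 50%-overlapped blocks cancels the time-domain aliasing for the interior samples.

```python
# Minimal MDCT/IMDCT sketch (sine window, 50% overlap) illustrating the
# "lapped" property: overlap-adding the inverse transforms of consecutive
# blocks cancels time-domain aliasing and recovers the interior samples.
import numpy as np

def mdct_basis(N):
    n = np.arange(2 * N)[:, None]  # 2N time samples per block
    k = np.arange(N)[None, :]      # N output coefficients
    return np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def sine_window(N):
    n = np.arange(2 * N)
    return np.sin(np.pi / (2 * N) * (n + 0.5))  # Princen-Bradley compliant

def mdct(block, C, w):
    return (w * block) @ C                        # 2N samples -> N coefficients

def imdct(coeffs, C, w):
    return (2.0 / C.shape[1]) * w * (C @ coeffs)  # N coefficients -> 2N samples

N = 8
C, w = mdct_basis(N), sine_window(N)
x = np.random.default_rng(0).standard_normal(4 * N)

# Three blocks hopping by N samples (50% overlap), then overlap-add.
out = np.zeros_like(x)
for start in (0, N, 2 * N):
    out[start:start + 2 * N] += imdct(mdct(x[start:start + 2 * N], C, w), C, w)

# Interior samples (covered by two overlapping blocks) come back exactly.
print(np.allclose(out[N:3 * N], x[N:3 * N]))  # True
```

Note that each block of 2N samples yields only N coefficients, so the 50% overlap does not increase the total number of coefficients.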
In MP3, the MDCT is not applied to the audio signal directly, but rather to the output of a 32-band polyphase quadrature filter (PQF) bank. The output of this MDCT is post-processed by an alias reduction formula to reduce the typical aliasing of the PQF filter bank. Such a combination of a filter bank with an MDCT is called a hybrid filter bank or a subband MDCT.
Another prior art problem is latency. Most audio compression standards, e.g., MP3, require frequency analysis to ensure that the parts they remove cannot be detected by human listeners, by modeling characteristics of human hearing such as noise masking. This is important to gain huge savings in storage space with reasonable and acceptable (although detectable) losses in fidelity. The FFT frequency analysis is necessary for determining which subbands are more important than others, so that more data can be removed therefrom. However, frequency analysis using the FFT takes time to accumulate audio samples before the frequency spectrum can be obtained, the importance of the different subbands determined, and each subband treated accordingly. This approach is extremely time consuming and counterproductive to real-time audio processing.
Data sets, e.g., audio data, without obviously periodic components cannot be processed well using Fourier techniques. One feature of wavelets that is critical in areas like signal processing and compression is what is referred to in the wavelet literature as perfect reconstruction. A wavelet algorithm has perfect reconstruction when the inverse wavelet transform of the result of the wavelet transform yields exactly the original data set. Wavelets allow complex filters to be constructed for this kind of data, which can remove or enhance selected parts of the signal. The wavelet transform (WT), also known as subband coding or multiresolution analysis, has a huge number of applications in science, engineering, mathematics and information technology. All wavelet transforms consider a function (taken to be a function of time) in terms of oscillations that are localized in both time and frequency. All wavelet transforms may be considered to be forms of time-frequency representation and are, therefore, related to the subject of harmonic analysis. An article titled "Wavelets for Kids—A Tutorial Introduction" by Brani Vidakovic and Peter Mueller pointed out important differences between Fourier analysis and wavelets, including frequency/time localization and the ability to represent many classes of functions in a more compact way. While Fourier basis functions are localized in frequency but not in time, wavelets are local in both frequency/scale (via dilations) and in time (via translations). For example, functions with discontinuities and functions with sharp spikes usually take substantially fewer wavelet basis functions than sine-cosine basis functions to achieve a comparable approximation. Wavelets' sparse coding characteristic makes them excellent tools for data compression.
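The perfect-reconstruction property can be seen with the simplest wavelet, the Haar transform; a minimal sketch (function names are illustrative only):

```python
# One level of the Haar wavelet transform: pairwise averages form the
# lower-resolution reference signal, pairwise differences the detail signal.
def haar_forward(x):
    avg = [(a + b) / 2 for a, b in zip(x[0::2], x[1::2])]
    det = [(a - b) / 2 for a, b in zip(x[0::2], x[1::2])]
    return avg, det

def haar_inverse(avg, det):
    x = []
    for s, d in zip(avg, det):
        x += [s + d, s - d]  # exactly undoes the forward step
    return x

signal = [9.0, 7.0, 3.0, 5.0, 6.0, 10.0, 2.0, 6.0]
avg, det = haar_forward(signal)
print(avg)                                # [8.0, 4.0, 8.0, 4.0]
print(det)                                # [1.0, -1.0, -2.0, -2.0]
print(haar_inverse(avg, det) == signal)   # True: perfect reconstruction
```

No information is lost: the averages and differences together carry exactly the content of the original samples.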
In numerical analysis and functional analysis, the discrete wavelet transform (DWT) refers to wavelet transforms for which the wavelets are discretely sampled. The DWT is a form of finite impulse response filter. Most notably, the DWT is used for signal coding, where the properties of the transform are exploited to represent a discrete signal in a more redundant form, such as a Laplace-like distribution, often as a preconditioning for data compression. The DWT is widely used in video/image compression to faithfully recreate the original images under high compression ratios due to its lossless nature. The DWT produces as many coefficients as there are pixels in the image. These coefficients can be compressed more easily because the information is statistically concentrated in just a few coefficients. This principle is called transform coding. After that, the coefficients are quantized and the quantized values are entropy encoded and/or run-length encoded. The lossless nature of the DWT results in zero data loss or modification on decompression, so as to support better image quality under higher compression ratios at low bit rates and highly efficient hardware implementation. U.S. Pat. No. 6,570,510 illustrates an example of such an application. Extensive research in the field of visual compression has led to the development of several successful compression standards such as MPEG-4 and JPEG 2000, both of which allow for the use of wavelet-based compression schemes.
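The transform coding principle above, quantization followed by run-length coding, can be sketched as follows. The coefficient values are hypothetical, chosen only to show how a transform concentrates energy into a few large coefficients:

```python
# Transform coding sketch: quantizing small transform coefficients to zero
# yields long zero runs that run-length encoding stores compactly.
def quantize(coeffs, step):
    return [round(c / step) for c in coeffs]

def run_length_encode(values):
    encoded, prev, count = [], None, 0
    for v in values:
        if v == prev:
            count += 1
        else:
            if prev is not None:
                encoded.append((prev, count))
            prev, count = v, 1
    if prev is not None:
        encoded.append((prev, count))
    return encoded

# Hypothetical detail coefficients: most are near zero after the transform.
coeffs = [12.1, 0.3, -0.2, 0.1, 0.4, -7.8, 0.2, -0.1, 0.0, 0.3]
q = quantize(coeffs, step=1.0)
print(q)                     # [12, 0, 0, 0, 0, -8, 0, 0, 0, 0]
print(run_length_encode(q))  # [(12, 1), (0, 4), (-8, 1), (0, 4)]
```

Ten coefficients collapse to four (value, run) pairs; the quantization step is where the lossy/lossless trade-off is made.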
The principle behind the wavelet transform is to hierarchically decompose the input signals into a series of successively lower resolution reference signals and their associated detail signals. At each level, the reference signal and detail signal contain the information necessary for reconstruction back to the next higher resolution level. One-dimensional DWT (1-D DWT) processing can be described in terms of a filter bank; wavelet transforming a signal is like passing the signal through this filter bank, wherein the input signal is analyzed in both low and high frequency bands. The outputs of the different filter stages are the wavelet and scaling function transform coefficients. A separable two-dimensional DWT (2-D DWT) process is a straightforward extension of the 1-D DWT. Specifically, in the 2-D DWT image process, separable filter banks are applied first horizontally and then vertically. The decompression operation is the inverse of the compression operation. Finally, the inverse wavelet transform is applied to the de-quantized wavelet coefficients. This produces the pixel values that are used to create the image.
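The hierarchical decomposition can be sketched end to end. This is a minimal illustration using the Haar averaging/differencing filters as the low/high band pair; a practical codec would use longer filters, but the recursive structure, reference signal halved at each level plus a detail signal per level, is the same:

```python
# Hierarchical (multiresolution) decomposition: each level halves the
# reference signal and emits the detail signal needed to climb back up.
def analyze(x):
    ref = [(a + b) / 2 for a, b in zip(x[0::2], x[1::2])]  # low band
    det = [(a - b) / 2 for a, b in zip(x[0::2], x[1::2])]  # high band
    return ref, det

def decompose(x, levels):
    details = []
    for _ in range(levels):
        x, d = analyze(x)
        details.append(d)
    return x, details  # coarsest reference signal + per-level details

def reconstruct(ref, details):
    for d in reversed(details):
        ref = [v for s, w in zip(ref, d) for v in (s + w, s - w)]
    return ref

x = [4.0, 2.0, 6.0, 8.0, 10.0, 12.0, 14.0, 12.0]
coarse, details = decompose(x, levels=3)
print(coarse)                             # [8.5]: the overall average
print(reconstruct(coarse, details) == x)  # True: every level inverts exactly
```

The separable 2-D case applies the same `analyze` step along each row and then along each column of the resulting coefficient arrays.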
The DWT has been popularly applied to image and video coding applications because of the higher de-correlation of its WT coefficients and its energy compaction efficiency, in both temporal and spatial representation. In addition, the multiple resolution representation of the WT is well suited to the properties of the Human Visual System (HVS). Wavelets have been used for image data compression; for example, the United States FBI compresses its fingerprint database using wavelets. Lifting scheme wavelets also form the basis of the JPEG 2000 image compression standard. There are a number of applications using wavelet techniques for noise reduction. An article titled "Audio Analysis using the Discrete Wavelet Transform" by Tzanetakis et al. applied the DWT to extract information from non-speech audio. Another article titled "De-Noising by Soft-Thresholding" by D. L. Donoho, published in IEEE Transactions on Information Theory, vol. 41, pp. 613–627, 1995, applied the DWT with thresholding operations to de-noise audio signals.
One of the big advantages of the DWT over the MDCT is the temporal (or spatial) locality of its basis functions, together with the smaller complexity of O(n) instead of the O(n log n) of the FFT. Compared with the MDCT of MP3, the computational complexity of the DWT is only O(n), since it concerns relative frequency changes rather than absolute frequency values. Secondly, the DWT captures not only some notion of the frequency content of the input, by examining it at different scales, but also the temporal content, i.e., the times at which these frequencies occur.
There is therefore a need for a better audio compression scheme via the DWT that provides faithful reproduction of music closer to real time, with less or no latency.