This invention relates to an improved method and apparatus for compressing data for storage, transmission or other processing, and more particularly to a method and apparatus for improving the accuracy of a multistage data compression arrangement while also reducing memory and computation requirements.
Data compression has been defined as the process of reducing the number of bits necessary to transmit or store information (see Makhoul, John, et al., "Vector Quantization in Speech Coding", Proc. of the IEEE, Vol. 73, No. 11, November 1985, pp. 1551-1588). It has also been defined as the conversion of a stream of very high rate sampled analog (continuous amplitude, discrete time) data into a stream of relatively low rate digital (discrete amplitude, discrete time) data for communication over a digital communication link or storage in a digital memory (Gray, Robert M., "Vector Quantization", IEEE ASSP Magazine, April, 1984, pp. 4-29). In other words, data compression is the process of reducing the amount of data needed to represent information signals for storage, transmission, or other processing.
The need for new and more powerful methods of data compression continues to grow with the increase in the amount of information which users wish to transmit over or store in existing information transmission and storage facilities. Of course, in addition to seeking to reduce the amount of data necessary to transmit or store a given amount of information, users also desire to maintain the efficiency, security and reliability of the transmission and storage operation.
One approach to implementing data compression is known as vector quantization. Typically, in vector quantization an analog signal source is sampled to form a sequence of digital values representative of the analog signal. Portions or groups of the digital values are then represented as vectors, known as source vectors. The source vectors are compared to a set (codebook) of allowable vector patterns (code vectors) which have been previously determined to be statistically representative and stored. The code vectors all have assigned indices identifying them, and the index of the closest code vector to each source vector (the code vector that minimizes the distortion between it and the source vector) is either stored or transmitted to a receiver which also has a copy of the codebook. Such comparisons thus yield a small array of indices which identify a larger array of source vectors. This process of representing the array of source vectors with a smaller array of indices is vector quantization which, as is evident, achieves data compression.
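The encoding step described above can be illustrated with a minimal sketch. It assumes squared Euclidean error as the distortion measure, and the four-entry codebook, the sample source vectors, and the function names `squared_error` and `encode` are hypothetical values and names chosen only for illustration.

```python
def squared_error(a, b):
    """Squared Euclidean distortion between two equal-length vectors (an assumed measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(source_vectors, codebook):
    """Return, for each source vector, the index of the closest code vector."""
    indices = []
    for v in source_vectors:
        best = min(range(len(codebook)), key=lambda i: squared_error(v, codebook[i]))
        indices.append(best)
    return indices


# Hypothetical codebook of four two-dimensional code vectors.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]

# Hypothetical source vectors formed from sampled signal values.
source_vectors = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.9)]

indices = encode(source_vectors, codebook)
print(indices)  # [0, 1, 2]
```

Each two-component source vector is thereby replaced by a single index, which for a four-entry codebook requires only two bits.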
The indices derived in quantization may be stored, transmitted to another location or otherwise operated upon, after which they are used to retrieve from memory the associated code vectors which represent approximations of the original source vectors. In effect, the retrieved code vectors carry the information (with some distortion) which was to be stored, transmitted or otherwise operated upon.
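As a minimal sketch of the retrieval step, the receiver or reader of the stored indices performs a simple table lookup in its own copy of the codebook. The codebook contents, the index values, and the function name `decode` are hypothetical and serve only as an example.

```python
def decode(indices, codebook):
    """Look up the code vector associated with each stored or received index."""
    return [codebook[i] for i in indices]


# Hypothetical codebook, identical to the copy held by the encoder.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]

# Hypothetical indices as stored or received.
received = [0, 1, 2]

reconstructed = decode(received, codebook)
print(reconstructed)  # [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```

The reconstructed vectors approximate the original source vectors; the difference between each source vector and its code vector is the distortion introduced by quantization.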
In one approach to determining the closest code vector, each source vector is compared with every code vector in the codebook. With this approach, known as exhaustive search vector quantization (ESVQ), the computational and memory burden grows exponentially with the vector length when the data rate is held constant, even though the theory of data compression guarantees better performance with the use of longer vectors at a fixed compression ratio.
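The exponential growth can be made concrete with a short calculation. The sketch below assumes a fixed rate of R bits per sample, so that a codebook for vectors of length k holds 2^(kR) code vectors of k components each; the function name `esvq_cost` and the chosen rate and vector lengths are hypothetical.

```python
def esvq_cost(vector_length, bits_per_sample):
    """Codebook size and stored components for an exhaustive search at a fixed rate.

    Assumes the index for each vector uses vector_length * bits_per_sample bits,
    so the codebook holds 2 ** (vector_length * bits_per_sample) code vectors.
    """
    code_vectors = 2 ** (vector_length * bits_per_sample)
    stored_components = code_vectors * vector_length
    return code_vectors, stored_components


# Hypothetical fixed rate of 1 bit per sample, with increasing vector length.
for k in (4, 8, 16):
    code_vectors, stored_components = esvq_cost(k, 1)
    print(k, code_vectors, stored_components)
```

Under these assumptions, doubling the vector length from 8 to 16 raises the codebook from 256 to 65,536 code vectors, and an exhaustive search must compare each source vector against all of them.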
A modification of ESVQ, termed residual vector quantization or multistage vector quantization, has been proposed for reducing the memory and computation requirements of the vector quantization process. With multistage vector quantization, a series of quantization stages, each with its own codebook, is utilized to produce the array of indices for storage or transmission. The first stage operates on the source vector, and each subsequent stage operates on the error or residual vector produced in the previous stage. Although this approach does reduce memory and computation costs, it has typically also resulted in a decline in the performance and accuracy of the quantization. See, for example, the aforecited Makhoul, et al. reference, page 1576.
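The multistage arrangement can be sketched as follows. This is a generic illustration of residual quantization under assumed conditions, not the improved method of the invention: it assumes squared Euclidean error as the distortion measure, and the two stage codebooks, the source vector, and the function names are hypothetical.

```python
def nearest(v, codebook):
    """Index of the code vector with the smallest squared Euclidean error to v."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((x - c) ** 2 for x, c in zip(v, codebook[i])),
    )


def msvq_encode(v, stage_codebooks):
    """Quantize v stage by stage, passing each stage's residual to the next."""
    residual = list(v)
    indices = []
    for codebook in stage_codebooks:
        i = nearest(residual, codebook)
        indices.append(i)
        # The residual is what the chosen code vector failed to represent.
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices


def msvq_decode(indices, stage_codebooks):
    """Sum the selected code vectors from every stage to rebuild the approximation."""
    total = [0.0] * len(stage_codebooks[0][0])
    for i, codebook in zip(indices, stage_codebooks):
        total = [t + c for t, c in zip(total, codebook[i])]
    return total


# Hypothetical coarse first-stage and fine second-stage codebooks.
stage1 = [(0.0, 0.0), (4.0, 4.0), (0.0, 4.0), (4.0, 0.0)]
stage2 = [(0.0, 0.0), (1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]

indices = msvq_encode((5.2, 2.9), [stage1, stage2])
approximation = msvq_decode(indices, [stage1, stage2])
print(indices, approximation)  # [1, 3] [5.0, 3.0]
```

In this sketch the two stages together store 8 code vectors and require 8 comparisons per source vector, yet they address 16 distinct reconstruction points; a single exhaustive-search codebook with 16 entries would require 16 stored code vectors and 16 comparisons.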