Data compression is the process of representing a large amount of data with a smaller amount of data. Data compression is typically used to map a large-bandwidth signal into a smaller-bandwidth channel for transmission or storage.
One method of data compression that has gained recent interest is vector quantization. Vector quantization affords a high compression ratio without sacrificing signal quality to the degree that other data compression methods require.
Vector quantization lends itself particularly well to sound or image signals that will be observed and interpreted by the human senses. Signals intended for the human senses tolerate less accuracy than machine-interpreted data requires. For example, the data needed to store and identically reproduce a color video image, displayed on a conventional television, can be compressed significantly without losing the essence of the main image.
To compress a signal by vector quantization, the signal must be in the form of a digital representation. An analog signal is typically digitized by using analog-to-digital converters. The digital data of the analog signal can then be grouped into blocks that will henceforth be referred to as signal vectors. Vectors can be any N by M group of digitized data elements of the original signal, with N and M being any positive integers.
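The grouping of digitized data into N by M signal vectors can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function name `to_signal_vectors`, the row-major sample layout, and the requirement that the grid divide evenly into blocks are all assumptions made for the example.

```python
def to_signal_vectors(samples, rows, cols, n, m):
    """Split a rows x cols grid of digitized samples (row-major list)
    into non-overlapping n x m blocks, each flattened to a tuple."""
    if len(samples) != rows * cols:
        raise ValueError("sample count does not match grid dimensions")
    if rows % n or cols % m:
        # Assumption for this sketch: no padding of partial blocks.
        raise ValueError("grid dimensions must be multiples of the block size")
    vectors = []
    for top in range(0, rows, n):
        for left in range(0, cols, m):
            block = tuple(
                samples[(top + r) * cols + (left + c)]
                for r in range(n)
                for c in range(m)
            )
            vectors.append(block)
    return vectors


# A 4 x 4 image of 8-bit samples, split into 2 x 2 blocks.
image = [
    10, 11, 50, 51,
    12, 13, 52, 53,
    90, 91, 200, 201,
    92, 93, 202, 203,
]
blocks = to_signal_vectors(image, rows=4, cols=4, n=2, m=2)
print(blocks[0])  # (10, 11, 12, 13)
```

Each tuple in `blocks` is one signal vector of N times M data elements.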
Vector quantization requires a "code book" or its equivalent. The code book contains "code words". Code words are representative vectors which are used to approximate the signal vectors being generated from the original signal.
The representative vectors, or code words, are typically created using a probability density function. Several sample signals that can be considered typical of the types of signals that will be compressed are converted into vectors in the same manner as the signal(s) to be compressed. The probability density function is used to identify those vectors which are most likely to occur and which can be used to best represent all of the signal vectors generated from the signal(s) to be compressed.
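As a rough illustration of selecting the vectors most likely to occur, the sketch below estimates the probability of each training vector by its empirical frequency and keeps the most frequent ones as code words. This is a deliberate simplification and an assumption of the example: the name `build_code_book` is hypothetical, and practical code book designs commonly cluster the training vectors (for instance with a generalized Lloyd procedure) rather than count exact repeats.

```python
from collections import Counter


def build_code_book(training_vectors, size):
    """Keep the `size` most frequently occurring training vectors.

    Frequency of occurrence stands in for the probability density
    function; this is an illustrative simplification only.
    """
    counts = Counter(training_vectors)
    return [vector for vector, _ in counts.most_common(size)]


# Vectors taken from hypothetical sample signals.
training = [(0, 0), (0, 0), (0, 0), (255, 255), (255, 255), (0, 255)]
code_book = build_code_book(training, size=2)
print(code_book)  # [(0, 0), (255, 255)]
```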
Each code word in the code book has a unique associated code. The code has fewer binary elements than the code word itself and fewer binary elements than any signal vector represented by the code word. The smaller the code is relative to the code word, the greater the compression.
Typically the code words will be stored in an electronic memory. The code uniquely associated with each code word can be that code word's unique address in the memory.
After creating a code book of code words, each of the signal vectors generated from a signal to be compressed is compared with the code words to find the one code word that best represents the signal vector. In vector quantization, the best representation is generally the code word that, of all the code words in the code book, has the least distortion with respect to the signal vector.
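The selection of the least-distortion code word can be sketched as below. The text does not specify a distortion measure, so the squared-error measure used here is an assumption, as are the names `distortion` and `best_code`. The loop examines every code word in turn.

```python
def distortion(a, b):
    """Squared-error distortion between two equal-length vectors
    (an assumed measure; other measures could be substituted)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_code(signal_vector, code_book):
    """Return (index, distortion) of the least-distortion code word."""
    best_index = 0
    best_d = distortion(signal_vector, code_book[0])
    for index in range(1, len(code_book)):
        d = distortion(signal_vector, code_book[index])
        if d < best_d:
            best_index, best_d = index, d
    return best_index, best_d


code_book = [(0, 0, 0, 0), (128, 128, 128, 128), (255, 255, 255, 255)]
index, d = best_code((120, 130, 125, 131), code_book)
print(index, d)  # 1 86
```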
Previously, comparison for vector quantization has been done by a sequential search through the code book. The sequential search compares each signal vector to all code words in the code book. Theoretically, comparison for vector quantization can also be performed by a tree structured search. The tree structured search compares each signal vector to code words in the code book in an order that depends on the result of each comparison. In the tree structured search, comparisons with code words continue to be made along a path down a previously defined branch in the code book if a lower distortion continues to be found.
If N represents the number of code words in the code book, then the number of searches required for each signal vector is N in the sequential case and $\log_2 N$ in the tree structured case.
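The difference in search effort can be illustrated with a small, hand-built binary tree over four code words. Everything in this sketch is hypothetical: the `Node` class, the representative vectors chosen for the branches, and the squared-error distortion. Each level of the tree is counted as one search step, although two candidates are examined per level.

```python
import math


def distortion(a, b):
    """Squared-error distortion (an assumed measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class Node:
    """Tree node: a representative vector, plus either a code index
    (leaf) or child nodes (branch)."""

    def __init__(self, vector, index=None, children=()):
        self.vector = vector
        self.index = index
        self.children = children


def tree_search(signal_vector, root):
    """Descend toward the lower-distortion child at each level.
    Returns (code index, number of levels visited)."""
    node, levels = root, 0
    while node.children:
        node = min(node.children,
                   key=lambda child: distortion(signal_vector, child.vector))
        levels += 1
    return node.index, levels


# Four code words arranged under two hand-picked branch vectors.
left = Node((30, 30), children=(Node((0, 0), 0), Node((60, 60), 1)))
right = Node((220, 220), children=(Node((190, 190), 2), Node((250, 250), 3)))
root = Node(None, children=(left, right))

index, levels = tree_search((70, 65), root)
print(index, levels)         # 1 2
print(int(math.log2(4)))     # 2 levels for N = 4, versus 4 sequential steps
```

The result is only as good as the arrangement of the tree. A poorly ordered tree can lead the search down a branch that does not contain the least-distortion code word.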
While both the sequential and tree structured searches have their advantages, they also have drawbacks. The sequential search requires that every code word be examined in every case. The tree structured search requires an initial ordering of the code words that assures that the proper code word will always be found. The difficulty of ordering a tree structured search effectively eliminates its applicability to all but the simplest forms of vector quantization involving a very limited number of code words.
Once the least distorted code word is found, the signal vector is then replaced with the code associated with that code word. For example, if the code book contains 256 different code words, then each code word, and thus each signal vector, can be represented by a code consisting of a single, eight-bit binary number or byte.
This substitution of an abbreviated code for the larger, representative vector that it identifies is the quantization of the signal vector. If the original signal vector contains sixteen bytes, its representation by a one-byte code provides a sixteen-to-one compression ratio.
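The arithmetic behind these figures can be checked with a short sketch. The helper names `code_bits` and `compression_ratio` are introduced only for this illustration, and the code is assumed to be a fixed-length binary number.

```python
def code_bits(num_code_words):
    """Bits needed for a fixed-length code that can distinguish
    num_code_words code words (at least one bit)."""
    return max(1, (num_code_words - 1).bit_length())


def compression_ratio(vector_bytes, num_code_words):
    """Ratio of signal-vector size to code size, both in bits."""
    return (vector_bytes * 8) / code_bits(num_code_words)


print(code_bits(256))              # 8
print(compression_ratio(16, 256))  # 16.0
```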
A representation of the original signal can be recreated by reversing the procedure, using the codes to identify one or a series of code words which in turn represent the original signal vectors. The code word(s) may then be used to generate a decompressed signal approximating the original signal.
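Decompression amounts to a table lookup, which can be sketched as follows. The code book contents and the name `decode` are hypothetical, and each code is assumed to be the position of its code word in the code book, analogous to a memory address.

```python
def decode(codes, code_book):
    """Replace each code with the code word it identifies."""
    return [code_book[code] for code in codes]


code_book = [(0, 0, 0, 0), (128, 128, 128, 128), (255, 255, 255, 255)]
codes = bytes([0, 2, 1])  # one byte per signal vector
restored = decode(codes, code_book)
print(restored[1])  # (255, 255, 255, 255)
```

The restored vectors are the code words themselves, so the decompressed signal approximates the original rather than reproducing it exactly.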
Further details regarding the theory and practice of data processing for and compression by sequential and tree structured searching vector quantizers can be found in U.S. Pat. No. 4,560,977, incorporated by reference herein in its entirety.