1. Field
This disclosure relates to vector quantization for the compression of image, video, speech, audio, or other data types, more particularly to methods for developing codebooks for vector quantization.
2. Background
Data compression techniques attempt to reduce the amount of information used to represent an original entity, while still providing enough information to reconstruct it. For example, image compression reduces the amount of data necessary to reconstruct an original image. Speech compression reduces the amount of data needed to reconstruct speech. These are only examples, as compression can be applied to any kind of data.
Vector quantization (VQ) is a lossy compression technique that partitions the entire data space into a series of representative regions. Within each region, an approximation is designated, referred to as a codevector. The regions and codevectors are developed through a training procedure, using typical data sets, such as typical speech patterns or typical images. A typical training procedure was originally proposed in 1980 by Linde, Buzo and Gray and is therefore sometimes referred to as the LBG algorithm.
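As a rough illustration of the training procedure described above, the following Python sketch alternates the two steps at the core of LBG-style training: partitioning the training vectors by nearest codevector, then moving each codevector to the centroid of its region. It is a simplified sketch under stated assumptions, not the procedure of this disclosure. The initial codebook is simply drawn at random from the training vectors for brevity, and the function names, the fixed iteration count, and the seed are all hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(vector, codebook):
    """Index of the codevector closest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def train_codebook(training_vectors, codebook_size, iterations=20, seed=0):
    """Partition/centroid iteration (assumed random initialization)."""
    rng = random.Random(seed)
    # Assumption: start from randomly chosen training vectors.
    codebook = [list(v) for v in rng.sample(training_vectors, codebook_size)]
    for _ in range(iterations):
        # Step 1: assign every training vector to its nearest codevector.
        regions = [[] for _ in codebook]
        for v in training_vectors:
            regions[nearest_index(v, codebook)].append(v)
        # Step 2: replace each codevector with the centroid of its region.
        for i, members in enumerate(regions):
            if members:  # an empty region keeps its old codevector
                dim = len(members[0])
                codebook[i] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook
```

Because the centroids depend only on the vectors that actually appear in the training set, a region of the data space with no training vectors receives no codevector of its own, which is the weakness discussed next.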
The LBG algorithm relies on the relative occurrences of patterns in the training data. Typically, a large number of training data sets are used. Generally, this approach works well for typical patterns. However, rare data combinations may occur that are completely missed by the training set, and the resulting codebook will perform very badly when those combinations occur.
One solution is an approach referred to as Lattice VQ. This approach mathematically partitions the data space into equal regions and therefore covers the rare data patterns. Lattice VQ performs reasonably well if the source is also uniformly distributed in the vector space. However, it performs very poorly if the source has a skewed distribution.
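To make the idea of equal regions concrete, the sketch below quantizes a vector onto the simplest possible lattice, a cubic grid with a fixed step size in every dimension. The choice of lattice, the step size, and the function names are assumptions made for illustration; the text above does not specify a particular lattice.

```python
import math


def lattice_indices(vector, step):
    """Integer grid coordinates of the lattice point nearest to `vector`."""
    # floor(x / step + 0.5) rounds to the nearest grid point, ties upward.
    return tuple(math.floor(x / step + 0.5) for x in vector)


def lattice_reconstruct(indices, step):
    """Lattice point (the approximation) for the given grid coordinates."""
    return tuple(i * step for i in indices)
```

Every cell of this grid has the same size regardless of how often data falls into it, so no training is required and no region is left uncovered, but densely populated regions get no finer resolution than empty ones.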
Therefore, an approach is needed that performs well with typical data sets and also handles rare patterns, as Lattice VQ does.
One aspect of the disclosure is a method for data compression. An encoder receives data vectors from the original data to be compressed. The encoder uses a vector quantization codebook to encode the data vectors into encoded vectors. The codebook is produced using a training set having a compound data set, where the compound data set includes real data vectors and artificial data vectors. The encoded vectors are indexed in the codebook, and the indexes are transmitted across communication channels or to storage.
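The encoding step described above can be sketched as a nearest-codevector search that emits one index per data vector, with decoding reduced to a table lookup. This is a minimal sketch assuming a squared Euclidean distortion measure and an exhaustive search; the function names are hypothetical and the disclosure does not prescribe either choice.

```python
def squared_distance(a, b):
    """Squared Euclidean distance (assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(data_vectors, codebook):
    """Map each data vector to the index of its nearest codevector."""
    return [
        min(range(len(codebook)),
            key=lambda i: squared_distance(v, codebook[i]))
        for v in data_vectors
    ]


def decode(indexes, codebook):
    """Recover the approximations by looking up each index."""
    return [codebook[i] for i in indexes]


# Example with a hypothetical three-entry codebook of 2-D codevectors.
codebook = [(0, 0), (10, 10), (0, 10)]
indexes = encode([(1, 1), (9, 8), (2, 9)], codebook)
reconstructed = decode(indexes, codebook)
```

Only the indexes need to be transmitted or stored, provided the decoder holds the same codebook.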
Another aspect of the disclosure is the artificial data set, which may include a uniformly distributed data set, a diagonal data set, or both.
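One way such a compound training set might be assembled is sketched below. It rests on two loudly stated assumptions that are not taken from the text: a "uniformly distributed data set" is modeled as vectors whose components are drawn independently and uniformly from a value range, and a "diagonal data set" is interpreted as vectors whose components are all equal, spaced evenly along the main diagonal of the vector space. All function names, counts, and ranges are hypothetical.

```python
import random


def make_uniform_vectors(count, dim, low, high, seed=0):
    """Assumed uniform set: each component drawn uniformly from [low, high]."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(low, high) for _ in range(dim))
            for _ in range(count)]


def make_diagonal_vectors(count, dim, low, high):
    """Assumed diagonal set: all components equal, evenly spaced values."""
    if count == 1:
        return [tuple([low] * dim)]
    step = (high - low) / (count - 1)
    return [tuple([low + i * step] * dim) for i in range(count)]


def build_compound_training_set(real_vectors, uniform_count, diagonal_count,
                                low, high, seed=0):
    """Concatenate real vectors with both kinds of artificial vectors."""
    dim = len(real_vectors[0])
    artificial = (make_uniform_vectors(uniform_count, dim, low, high, seed)
                  + make_diagonal_vectors(diagonal_count, dim, low, high))
    return [tuple(v) for v in real_vectors] + artificial
```

The ratio of artificial to real vectors is a design choice in this sketch: more artificial vectors spread codevectors toward rare regions, at the cost of resolution for typical patterns.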