This invention relates to a vector quantization (VQ) coder for speech, video and the like, and more particularly to a system that forms the vectors from the source using distributed pixel blocks for images, or distributed samples for speech. Owing to the correlation between distributed vectors, both the encoding search time and the bit rate of the distributed vector quantization (DVQ) coder will be less than those of the conventional VQ coder.
Vector quantization (VQ) has long existed as a source coding technique. It has been used extensively for speech for some years and, more recently, for video images. In accordance with rate-distortion theory, a VQ coder quantizes vectors instead of scalars to achieve better compression.
The input vectors are usually formed from the source using spatial pixel blocks for images or a set of neighboring samples (temporal blocks) for speech, and a precomputed set of representative vectors is stored in what is called a codebook. The same precomputed set of representative vectors is entered in a decoder codebook at the same corresponding addresses as in the encoder codebook. Each block of source samples of a signal to be encoded, called an input vector, is then compared with the vectors stored in the encoder codebook to find the stored codeword which best matches the input vector. The sequence number or address of this "best-match" codeword, referred to hereinafter as the index, is then emitted as the VQ code for the input vector. At the decoder, the VQ code is simply used as an address to look up the vector codeword. An overview of vector quantization has been presented in "Vector Quantization," by R. M. Gray, IEEE ASSP Mag., Vol. 1, No. 2, April 1984, pp. 4-29.
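The full-search encoding and table-lookup decoding just described can be sketched in a few lines of Python. This is a minimal sketch, assuming squared Euclidean distance as the matching criterion (the text does not name one); the function names `vq_encode` and `vq_decode` and the toy codebook are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_error(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance, assumed here as the best-match criterion."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_encode(vector: Vector, codebook: List[Vector]) -> int:
    """Full search: return the index (address) of the best-match codeword."""
    return min(range(len(codebook)), key=lambda i: squared_error(vector, codebook[i]))


def vq_decode(index: int, codebook: List[Vector]) -> Vector:
    """The decoder uses the index as an address into its identical codebook."""
    return codebook[index]


if __name__ == "__main__":
    codebook = [[0, 0, 0, 0], [10, 10, 10, 10], [0, 10, 0, 10]]
    idx = vq_encode([1, 9, 0, 11], codebook)
    print(idx, vq_decode(idx, codebook))  # 2 [0, 10, 0, 10]
```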
Traditionally, the input vectors for VQ coding have been extracted from the source by choosing K samples (K = dimensionality of the coder) that are immediate neighbors of each other, either in a line for one-dimensional speech sources or in a rectangle for two-dimensional images, as shown in FIG. 1(a). For simplicity of illustration, FIG. 1(a) shows a voice input signal sampled 16 times to form an input vector, and an image input signal of 16 pixels arrayed in four rows and four columns to form a 4×4 input block vector, instead of the usual K=64 or more samples or pixels. This selection of immediate neighbors for the vectors is only natural: immediate neighbors show the highest correlation, so the vectors tend to cluster together, which is desirable in a VQ coder.
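As an illustration of the conventional d=1 scheme, the following sketch tiles a grayscale image, held as a list of pixel rows, into non-overlapping 4×4 blocks of immediate neighbors and flattens each block into a 16-dimensional input vector. The function name and the row-by-row flattening order are assumptions made for illustration.

```python
from typing import List

Image = List[List[int]]


def spatial_block_vectors(image: Image, n: int = 4) -> List[List[int]]:
    """Split an image into non-overlapping n x n blocks of immediate neighbors
    (the conventional d = 1 scheme) and flatten each block row by row."""
    rows, cols = len(image), len(image[0])
    if rows % n or cols % n:
        raise ValueError("image dimensions must be multiples of the block size")
    vectors = []
    for top in range(0, rows, n):
        for left in range(0, cols, n):
            vectors.append([image[top + r][left + c] for r in range(n) for c in range(n)])
    return vectors


if __name__ == "__main__":
    image = [[8 * r + c for c in range(8)] for r in range(8)]  # 8 x 8 test ramp
    vecs = spatial_block_vectors(image)
    print(len(vecs), len(vecs[0]))  # 4 16
    print(vecs[0][:4])              # [0, 1, 2, 3]
```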
In the distributed-block scheme of the present invention, the source sequence is decimated by a factor of d before the K-dimensional vectors are constructed, as shown in FIG. 1(b) for d=2 and K=4. Note that the normal spatial-block vector is the special case d=1. For d>1, the source is split into d decimated sequences (d×d subblocks when an image is decimated both horizontally and vertically, as in FIG. 1(b)), in a form which may be compared to subband coding, where the frequency spectrum of the input signal is separated by filtering into a fixed number of adjacent band channels before the separate channels are encoded in parallel. The subband coding channels are multiplexed for transmission and then demultiplexed at the receiver to reassemble the full bandwidth of the signal; see R. E. Crochiere, S. A. Webber and J. L. Flanagan, "Digital Coding of Speech in Subbands," Bell Sys. Tech. Jrnl., Vol. 55, No. 8, October 1976, pp. 1069-1085. The present invention differs from subband coding in that subband coding splits the input signal into adjacent band channels for coding (four, in a simple example), whereas the present invention decimates a block of 16 samples into four 2×2 (K=4) subblocks, as may readily be appreciated from the four decimated subblock vectors of the 4×4 image input block shown in FIG. 1(b) for simplicity of illustration. In actual practice, a larger input block, such as 64 voice samples or image pixels, would be decimated and distributed into four 4×4 subblocks.
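For a one-dimensional source, decimating by d before forming K-dimensional vectors amounts to taking every d-th sample, once for each of the d starting phases. A minimal sketch follows; the name `decimate_1d` is hypothetical, and with d=1 it reduces to the conventional block of immediate neighbors.

```python
from typing import List, Sequence


def decimate_1d(block: Sequence[float], d: int) -> List[List[float]]:
    """Split a block of d*K samples into d decimated K-dimensional vectors.

    Vector r holds samples r, r + d, r + 2d, ...; with d = 1 the single
    vector is just the original block of immediate neighbors.
    """
    if d < 1 or len(block) % d:
        raise ValueError("d must be positive and divide the block length")
    return [list(block[r::d]) for r in range(d)]


if __name__ == "__main__":
    samples = list(range(8))
    print(decimate_1d(samples, 1))  # [[0, 1, 2, 3, 4, 5, 6, 7]]
    print(decimate_1d(samples, 2))  # [[0, 2, 4, 6], [1, 3, 5, 7]]
```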
Decimation, as the term is used in this context, is carried out by selecting pixels spaced by the factor d both horizontally and vertically in a block 10, where d is 2 in this example, first for a subblock 11 and then for subblocks 12, 13 and 14 in sequence. The first decimator, for subblock 11, may be implemented by a gate which is turned on for alternate pixels of alternate rows, starting with the first pixel of the first row. The second decimator is implemented by a gate which is turned on for alternate pixels of alternate rows, starting with the second pixel of the second row. The third decimator is implemented like the first but starts with the second pixel of the first row, and the fourth decimator is implemented like the second but starts with the first pixel of the second row. The subblocks of a decimated input block are assembled in separate registers, which in practice are linear but are shown here as 4×4 blocks for ease of understanding the meaning of the term "decimation."
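The four gated decimators can be modelled in software as strided selections from the block, each with its own starting (row, column) phase in the order given above: first row and first pixel for subblock 11, second row and second pixel for subblock 12, first row and second pixel for subblock 13, and second row and first pixel for subblock 14. This is a sketch rather than the gate circuitry; `decimate_2d` and `PHASES_D2` are hypothetical names, and each subblock is returned as a linear register in row-by-row order.

```python
from typing import List, Tuple

Block = List[List[int]]

# Starting (row, column) phase of each decimator for d = 2, in the order
# subblock 11, 12, 13, 14 described in the text.
PHASES_D2: List[Tuple[int, int]] = [(0, 0), (1, 1), (0, 1), (1, 0)]


def decimate_2d(block: Block, d: int, phases: List[Tuple[int, int]]) -> List[List[int]]:
    """Select pixels spaced d apart both vertically and horizontally, once per
    phase, returning each subblock as a linear register (row-by-row order)."""
    rows, cols = len(block), len(block[0])
    return [
        [block[r][c] for r in range(r0, rows, d) for c in range(c0, cols, d)]
        for r0, c0 in phases
    ]


if __name__ == "__main__":
    block = [[4 * r + c for c in range(4)] for r in range(4)]  # pixels numbered 0..15
    subblocks = decimate_2d(block, 2, PHASES_D2)
    print(subblocks)  # [[0, 2, 8, 10], [5, 7, 13, 15], [1, 3, 9, 11], [4, 6, 12, 14]]
    # Every pixel lands in exactly one subblock, so nothing is lost.
    print(sorted(p for sub in subblocks for p in sub) == list(range(16)))  # True
```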
No information is lost in the decimating process, because each decimator processes a different delayed subset of the 4×4 source block, so every pixel is assigned to exactly one subblock. Each of these subsets (subblocks) is processed separately in the search for the best match with the vectors stored in the distributed vector quantization (DVQ) codebook. The indices of the best matches found in the codebook for the subblocks of an input block are assembled to form a DVQ code. However, as will be made clearer with reference to FIG. 2, and more particularly to FIG. 3(a), a full search is made only for the first subblock. For the next subblock, only a partial search of limited range, oriented around the index of the first subblock, is conducted, and the index of the best match found is then generated relative to that limited range and its orientation around the full-search index. Each subsequent subblock is similarly processed with only a partial search. In that manner, both the time required to encode the full block and the bit rate of the generated DVQ code are reduced as compared to conventional VQ coding.
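The encoding order can be sketched as follows. This is a minimal sketch under several assumptions not stated in this section: squared error as the match criterion; a window of `w` consecutive codebook addresses centered on the first subblock's full-search index (clamped at the ends of the codebook) and reused for every later subblock; and a codebook ordered so that nearby addresses hold similar codewords, without which a limited search around the first index would be of little use. All names and parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_error(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def window_start(center: int, n: int, w: int) -> int:
    """First address of a w-entry search window centered on `center`,
    clamped so the window stays inside a codebook of n entries."""
    if not 1 <= w <= n:
        raise ValueError("window size must be between 1 and the codebook size")
    return min(max(center - w // 2, 0), n - w)


def dvq_encode(subblocks: List[Vector], codebook: List[Vector], w: int) -> Tuple[int, List[int]]:
    """Full search for the first subblock, then partial searches of w entries
    around its index. Returns (absolute first index, relative indices)."""
    n = len(codebook)
    first = min(range(n), key=lambda i: squared_error(subblocks[0], codebook[i]))
    start = window_start(first, n, w)
    relative = []
    for sub in subblocks[1:]:
        best = min(range(start, start + w), key=lambda i: squared_error(sub, codebook[i]))
        relative.append(best - start)  # offset inside the limited range
    return first, relative


def dvq_bits(n: int, w: int, num_subblocks: int) -> int:
    """Bits per input block: log2(n) for the first index, log2(w) for each other."""
    return math.ceil(math.log2(n)) + (num_subblocks - 1) * math.ceil(math.log2(w))
```

With hypothetical parameters of a 256-entry codebook, a 16-entry window and four subblocks, `dvq_bits(256, 16, 4)` gives 8 + 3×4 = 20 bits per block, against 32 bits if each subblock carried a full 8-bit index, and the searches cost 256 + 3×16 = 304 distance computations instead of 1024.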
Upon decoding the DVQ code, the absolute full-bit indices are computed for each of the subblocks in order to retrieve from the decoder codebook the same codebook vectors that were found to be the best matches in the encoding process. The codebook vectors are then assembled by inverse decimation and distribution to restore the encoded block, with only a possible quantization error in some pixels. An object of this DVQ coding technique is thus to exploit the strong correlation between the distributed input block vectors to reduce both the output bit rate and the encoding time per block, with only some compromise in signal reproduction quality. A further object is to facilitate adjustment of the encoder for the optimum compromise between bit rate and quality of reproduction. Moreover, for any compromise adjustment, the decimation technique of the present invention has the desired effect of distributing any annoying quantization errors, thereby making the errors less perceptible, without requiring any extra codebook storage. This is due to the decimation factor: as d is increased, an encoded block appears more and more diffused in the decoded image, and owing to the integrating effect of the human eye, blockiness disappears. However, as d is increased further, the correlation between the dimensions of the vector decreases, the clusters become more spread out, and a larger codebook is required to achieve the same quality of reproduction as before. There is thus a trade-off in how these two aspects of decoded image quality depend on d, but there will always exist an optimum value of d for the best quality at a given bit rate. This optimum value is not necessarily d=1, as has been assumed in conventional VQ coding, but rather d>1, and may very well be the factor 2 chosen for the example described with reference to FIGS. 3(a) and 3(b).
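A decoder sketch follows, assuming the encoder transmitted the first subblock's absolute index plus, for each later subblock, an offset within a window of `w` codebook addresses centered on that index and clamped to the codebook, and assuming the d=2 phase order of subblocks 11 to 14 given earlier. The decoder recomputes the window from the full-search index, converts each relative index back to an absolute address, looks up the codewords, and writes each codeword's pixels back at the spaced positions its decimator took them from. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[int]


def window_start(center: int, n: int, w: int) -> int:
    """Clamped window rule assumed to match the encoder's."""
    return min(max(center - w // 2, 0), n - w)


def dvq_decode(first: int, relative: List[int], codebook: List[Vector], w: int,
               d: int, phases: List[Tuple[int, int]], size: int) -> List[List[int]]:
    """Recover absolute indices, look up codewords, and invert the decimation."""
    start = window_start(first, len(codebook), w)
    indices = [first] + [start + r for r in relative]  # absolute full-bit indices
    block = [[0] * size for _ in range(size)]
    for index, (r0, c0) in zip(indices, phases):
        positions = [(r, c) for r in range(r0, size, d) for c in range(c0, size, d)]
        for (r, c), value in zip(positions, codebook[index]):
            block[r][c] = value  # put each pixel back at its spaced position
    return block


if __name__ == "__main__":
    codebook = [[i] * 4 for i in range(16)]
    phases = [(0, 0), (1, 1), (0, 1), (1, 0)]
    for row in dvq_decode(5, [3, 1, 3], codebook, w=4, d=2, phases=phases, size=4):
        print(row)  # [5, 4, 5, 4] / [6, 6, 6, 6] / [5, 4, 5, 4] / [6, 6, 6, 6]
```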
It should be noted that although the present invention is illustrated throughout with an image block of pixels, the input vector block may readily be assembled from a voice signal and processed in the same way, with each blocked sequence of samples decimated by a factor d into d subblocks. At the decoder, the decimation and distribution process is inverted to reproduce the input blocks of samples, with only some sample quantization errors introduced by the encoding-decoding sequence. Again, the quantization errors will be dispersed when the rows of the block are strung out to restore the voice signal, and therefore the errors will not be perceptible.
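The dispersal of quantization errors for a speech block can be checked directly. In the minimal sketch below (the names and the +1 error are hypothetical), a 16-sample block is decimated by d=4 into four 4-sample vectors, one vector is "decoded" with an error in every sample, and inverse decimation restores the sample order. The corrupted samples end up at every d-th position rather than in one contiguous run of four, as they would be with the conventional d=1 blocking.

```python
from typing import List, Sequence


def interleave(subvectors: Sequence[Sequence[float]]) -> List[float]:
    """Inverse decimation for a 1-D signal: subvector r supplies samples r, r+d, r+2d, ..."""
    d = len(subvectors)
    out: List[float] = [0.0] * (d * len(subvectors[0]))
    for r, sub in enumerate(subvectors):
        out[r::d] = list(sub)
    return out


if __name__ == "__main__":
    d = 4
    original = [float(i) for i in range(16)]
    subvectors = [original[r::d] for r in range(d)]
    # Pretend the codeword chosen for subvector 1 is off by +1 in every sample.
    decoded_subvectors = [list(s) for s in subvectors]
    decoded_subvectors[1] = [x + 1.0 for x in decoded_subvectors[1]]
    decoded = interleave(decoded_subvectors)
    error_positions = [i for i, (a, b) in enumerate(zip(original, decoded)) if a != b]
    print(error_positions)  # [1, 5, 9, 13]
```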