As more and more information is being stored and transmitted through computers, satellites, faxes, and other electronic media, the need for highly efficient means of data compression has arisen.
When compressing digital data that was converted from an analog source, such as sound or images, exact mapping of the source data is not required provided that the resulting data has a high quality and fidelity. A higher level of compression results from a slight loss of information.
Vector quantizers have been successfully employed to compress digital data. A vector quantizer for compressing video images divides the image into multiple small blocks of pixels. Pixels are small subdivisions of the initial image, and thus a picture may be divided into 1000 rows and 1000 columns for a total of 1,000,000 pixels, for example. Pixels may be grouped together to form a subsection of the picture, and each such subsection would constitute a vector. For example, if a 10×10 subsection were employed, a 100 pixel vector would be the input vector to be quantized. The circuit of the present invention operates on these vectors, and the quality of the picture reproduced after being received depends directly on the size of the vector used. The larger the subsection of an image being represented by a vector of a given complexity, the lower the quality of the picture available when the picture is received.
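The grouping of pixels into vectors can be illustrated with a minimal sketch. The function name `image_to_vectors`, the row-by-row pixel ordering inside each block, and the tiny 4×4 test image are assumptions made for illustration only; the text does not prescribe a particular ordering.

```python
def image_to_vectors(image, block_rows, block_cols):
    """Split a 2-D image (a list of pixel rows) into flattened block vectors."""
    height = len(image)
    width = len(image[0])
    if height % block_rows or width % block_cols:
        raise ValueError("image dimensions must be multiples of the block size")
    vectors = []
    for top in range(0, height, block_rows):
        for left in range(0, width, block_cols):
            # Read the block row by row so every vector uses the same pixel order.
            vector = [
                image[top + r][left + c]
                for r in range(block_rows)
                for c in range(block_cols)
            ]
            vectors.append(vector)
    return vectors


# Hypothetical 4x4 image cut into 2x2 blocks: four vectors of four pixels each.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
vectors = image_to_vectors(image, 2, 2)
print(vectors)
```

With a 10×10 block the same function would produce the 100 pixel vectors mentioned above.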
Pictures transmitted using the prior art and the present invention have been in black and white, but color pictures may be transmitted if vectorized into red, green, and blue components of varying intensity or converted into luminance and chrominance.
Before the input vectors can be quantized, a set of vectors must be established which most closely approximate the range of vectors comprising the pictures transmitted. Such a group of vectors is called a "codebook" or a set of "codevectors." One codevector may be a set of all white pixels, while another may be all black pixels, and a third codevector may have the top half of the subsection white and the bottom half black. Many other variations are possible. The quantizer stores a sufficient array of codevectors in the codebook such that a close match between the vectors in the image and the codebook is attained.
The image may be reproduced by a system receiving the codebook and the set of indices corresponding to the closest codevector to the input vector. Reproduction of the image consists of replacing each index with the associated codebook vector. The reproduced image does not exactly match the original image because the codebook vectors may differ from the input vectors. The measure of the difference between the codebook vector and the input vector is called the distortion. Distortion may be decreased by using a larger codebook or smaller codebook vectors.
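The encode, reproduce, and distortion steps described above can be sketched as follows. This is an exhaustive (full) search using squared Euclidean distance as the distortion measure, which is an assumption; the three-entry codebook and the input vectors are hypothetical values chosen to mirror the white, black, and half-and-half examples in the text.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(vectors, codebook):
    """Replace each input vector with the index of its closest codevector."""
    indices = []
    for v in vectors:
        best = min(range(len(codebook)),
                   key=lambda i: squared_distance(v, codebook[i]))
        indices.append(best)
    return indices


def decode(indices, codebook):
    """Reproduce the image by replacing each index with its codevector."""
    return [list(codebook[i]) for i in indices]


def total_distortion(vectors, reconstructed):
    """Sum of squared differences between inputs and their reproductions."""
    return sum(squared_distance(v, r) for v, r in zip(vectors, reconstructed))


# Hypothetical codebook for 2x2 blocks (4-pixel vectors, 8-bit gray levels).
codebook = [
    [255, 255, 255, 255],  # all white
    [0, 0, 0, 0],          # all black
    [255, 255, 0, 0],      # top half white, bottom half black
]
inputs = [[250, 251, 3, 1], [10, 0, 5, 2], [240, 255, 250, 255]]

indices = encode(inputs, codebook)
reconstructed = decode(indices, codebook)
distortion = total_distortion(inputs, reconstructed)
print(indices, distortion)
```

Only the indices (and, once, the codebook) need to be transmitted; the nonzero distortion reflects the fact that the codevectors only approximate the inputs.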
The complexity of the encoding system becomes a major factor in coding data at a low bit rate with an acceptable level of distortion. Most implementations of vector quantization have been limited to speech coding, since image coding requires a much higher throughput rate. Previous solutions, employing one dimensional and two dimensional arrays, result in multiple chips since the implementations require a large number of processing elements. In addition, such implementations also need large input/output bandwidth with the host.
Prior tree search based architectures employ (log N) processing elements and (kN) memory, where N is the number of codevectors and k is the dimension of the codevectors. Each processing element has a pipelined multiplier to compute the L2 metric (Euclidean distance between vectors). For example, the design in Kolagotla et al., "VLSI Implementation of a Tree Searched Vector Quantizer," Manuscript, University of Maryland, 1990 has external memory to allow processing elements to be modular, while the design in W. C. Fang, et al., "Systolic Tree-Searched Vector Quantizer for Real-Time Image Compression," IEEE Workshop for VLSI Signal Processing, pp. 352-361, 1990, has local memory within each processing element to support fast access. The major deficiency of these designs is that they cannot handle large codebooks efficiently; both designs require large I/O bandwidth, and the design in Fang requires large on-chip memory (local memory) in the processing element. Each processing element requires a different amount of memory which increases exponentially, since each level of the tree is mapped onto a processing element. Thus, these designs require multiple chips for large numbers of codevectors, N.
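As a rough illustration of why a tree search needs only about log N decision steps while its per-level storage doubles, the following is a generic binary tree-searched quantizer. It is a sketch under stated assumptions, not the design of Kolagotla or Fang: the `levels` layout, the two test vectors per node, and the one-dimensional example values are all hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean (L2) distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tree_search(vector, levels):
    """Descend a binary tree; levels[d][n] is the (left, right) pair of test
    vectors stored at node n of depth d. Returns the index of the leaf reached."""
    node = 0
    for level in levels:
        left, right = level[node]
        go_right = squared_distance(vector, right) < squared_distance(vector, left)
        # Children of node n at the next depth are numbered 2n and 2n + 1.
        node = 2 * node + (1 if go_right else 0)
    return node


# Hypothetical depth-2 tree over 1-D vectors, giving N = 4 leaves.
# Depth 0 stores 1 node and depth 1 stores 2 nodes: storage doubles per level.
levels = [
    [([2.0], [6.0])],
    [([1.0], [3.0]), ([5.0], [7.0])],
]

leaf_a = tree_search([6.4], levels)
leaf_b = tree_search([0.5], levels)
print(leaf_a, leaf_b, len(levels))
```

If each depth is assigned to its own processing element, the element handling depth d must hold twice as many test vectors as the one before it, which is the exponential memory growth noted above.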
The design presented in Bi, et al., U.S. Pat. No. 4,958,225, is for a tree search algorithm which utilizes hyperplanes to partition the training vectors. For three dimensional arrays in three dimensional space, two dimensional planes may be employed to divide the arrays. For higher dimensional space, the higher dimensional arrays are divided by "hyperplanes," or multi-dimensional planes. The Bi design requires computing distances between the input vector and the reference codebook vector in determining the proper hyperplane to partition the vectors, thus making the device computationally expensive and requiring several multiplication operations and multiple processing elements. Additionally, a large amount of memory is required to store the hyperplane values and the overall bandwidth is high.
There is no known single processing element implementation in the prior art which can operate at a video rate. A fast tree search based vector quantization algorithm is required to achieve a single processing element implementation at a video rate. Also, intensive multiplication operations which compute the Euclidean distance should be eliminated wherever possible in the search, since multipliers result in high processing element area complexity.
Digital image data transmission of 512×512 images with 8 bits per pixel at 30 frames per second requires 63 megabits per second bandwidth without data compression. If vector quantization at 0.5 bit/pixel bit rate is employed, a communication channel bandwidth of 3.9 megabits per second would adequately transmit the image with little distortion. For this spatial domain picture coding, input vectors of size ranging from 16 to 36 have been employed. Each word has up to 12 bits. For coding TV signals using 512×480 images, the available time for processing the input vectors with k=16 is 1.184 μs (microseconds) and it increases to 2.368 μs with k=32.
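The two bandwidth figures quoted above follow directly from the image parameters. The short calculation below reproduces them; the variable names are illustrative only.

```python
WIDTH, HEIGHT = 512, 512      # pixels per frame
FRAMES_PER_SECOND = 30
RAW_BITS_PER_PIXEL = 8
VQ_BITS_PER_PIXEL = 0.5       # rate after vector quantization

pixels_per_second = WIDTH * HEIGHT * FRAMES_PER_SECOND

raw_bits_per_second = pixels_per_second * RAW_BITS_PER_PIXEL
compressed_bits_per_second = pixels_per_second * VQ_BITS_PER_PIXEL
compression_ratio = raw_bits_per_second / compressed_bits_per_second

print(f"uncompressed: {raw_bits_per_second / 1e6:.1f} Mbit/s")
print(f"compressed:   {compressed_bits_per_second / 1e6:.1f} Mbit/s")
print(f"ratio:        {compression_ratio:.0f}:1")
```

The uncompressed rate is about 62.9 Mbit/s and the compressed rate about 3.9 Mbit/s, a 16:1 reduction.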
Assuming a rate of r=0.5 bit/pixel, k=32, and 512×512 images with 8 bits/pixel at 30 frames per second are employed, data compression requires N = 2^(rk) = 2^16 codevectors. Assuming full search is employed, the number of processing elements needed for real-time operations is 2^21. The number of multiplication operations needed for an input vector is 2^21. The resulting architecture must handle 2^34 multiplication operations per frame. Any single processing element implementation cannot perform full search in real-time for the above image data.
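The full-search multiplication counts above can be checked with a few lines of arithmetic. The sketch assumes one multiplication per vector component per codevector (one squared difference), which is the reading that reproduces the per-vector and per-frame figures in the text.

```python
import math

r = 0.5                 # bits per pixel
k = 32                  # vector dimension (pixels per vector)
WIDTH, HEIGHT = 512, 512

codebook_size = 2 ** int(r * k)                  # N = 2^(rk)
mults_per_vector = k * codebook_size             # compare against every codevector
vectors_per_frame = (WIDTH * HEIGHT) // k
mults_per_frame = mults_per_vector * vectors_per_frame

for name, value in [
    ("N", codebook_size),
    ("multiplications per vector", mults_per_vector),
    ("vectors per frame", vectors_per_frame),
    ("multiplications per frame", mults_per_frame),
]:
    print(f"{name}: 2^{int(math.log2(value))}")
```

At 30 frames per second this amounts to roughly 5 × 10^11 multiplications per second.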
For the same image data, the number of processing elements used in prior tree search is 16, and the number of multiplications needed for an input vector is 512. The total number of multiplications per frame is 2^23, which results in over 240 MOPS (Million Operations Per Second). The ith processing element has memory size of 32×2^i words, where 0 ≤ i ≤ 15, in the prior art tree search architectures. If a single chip implementation is desired, the total size of the on-chip memory will be k×N = 2^21 words, which is currently infeasible to implement.
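The memory figure for the prior tree search architectures can be verified by summing the per-element sizes given above. This is a simple arithmetic check; the variable names are illustrative.

```python
k = 32                      # vector dimension
NUM_ELEMENTS = 16           # one processing element per tree level
N = 2 ** NUM_ELEMENTS       # number of codevectors

# Memory of the i-th processing element, in words: 32 * 2^i for i = 0..15.
memory_per_element = [k * 2 ** i for i in range(NUM_ELEMENTS)]
total_memory_words = sum(memory_per_element)

print("largest element:", memory_per_element[-1], "words")
print("total:", total_memory_words, "words")
print("k * N:", k * N, "words")
```

The sum is 32 × (2^16 − 1) words, within 32 words of k×N = 2^21, and the last processing element alone holds half of it.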
Also, if off-chip memory is employed, implementation would require more than 136 I/O pins for data communications alone, assuming that each element of the vector is represented by 8 bits. Thus, known tree search algorithms, which result in multiple chips, are not suitable for a single processing element implementation.
In image processing applications, higher computational requirements arise in order to achieve desired performance with vector quantization. The available time for encoding an input vector increases as the dimension of the codevectors increases, assuming a source with fixed scalar throughput rate.
It is therefore one object of the present invention to establish a new tree search algorithm having less computational complexity for a single processing element implementation such that the processing element can operate at the input data rate.
It is a further object of the present invention to provide a video image reproduction device which can be implemented using a single VLSI (Very Large Scale Integration) chip.
It is another object of this invention to utilize significantly less memory to store the search information compared with conventional search algorithms for vector quantization.
It is a further object of this invention to provide a hardware architecture suitable for VLSI implementation based on the proposed tree search.