1. Field of the Invention
The present invention generally relates to memory architectures for use in image processing and, more particularly, to a novel memory architecture to achieve minimal rounding and truncation errors for n-dimensional image transformation.
2. Environment
Technological advances in digital transmission networks, digital storage media, and digital signal processing, especially of video signals, now make possible the economical transmission and storage of digital video in a wide variety of applications. Because the storage and transmission of digital video signals are central to many applications, and because an uncompressed representation of a video signal requires a large amount of storage, the use of digital video signal compression techniques is vital to this advancing art. In this regard, several international standards for the compression of digital video signals have emerged, and more are currently under development. These standards apply to algorithms for the transmission and storage of compressed digital video in a variety of applications, including videotelephony and teleconferencing, high quality digital television on coaxial and fiber-optic networks as well as broadcast terrestrially and over direct broadcast satellites, and various media such as compact disk read only memory (CD-ROM), digital audio tape (DAT) and magnetic and magneto-optical disks.
Many n-dimensional image transformations, such as the Discrete Cosine Transform (DCT) and the Fast Fourier Transform (FFT), are widely used in image processing. For example, a two-dimensional DCT (2D-DCT) is widely used in the field of image compression as a method to transform the image to a more compact form. Accordingly, the Joint Photographic Experts Group (JPEG) draft International Organization for Standardization (ISO) standard 10918-1, the Comité Consultatif International Télégraphique et Téléphonique (CCITT) recommendation H.261 and the Moving Picture Experts Group (MPEG) ISO/IEC (International Organization for Standardization/International Electrotechnical Commission) standard 13818-2 each specify the DCT as the most computation-intensive core for compressing and decompressing image data. However, for applications such as MPEG, the tolerance for computation error is very low because any erroneous image data may be reused over and over again. This explains why the Institute of Electrical and Electronics Engineers (IEEE) CAS Standards Committee submitted the standard proposal P1180 to specify the error limits for implementations of the 8×8 Inverse DCT.
Because these transformations are decomposable (separable), consecutive one-dimensional (1D) transformations can be used to perform the n-dimensional transformations. The intermediate results of each 1D transformation are normally stored in temporary memory devices before the following 1D transformation starts. For example, this 1D approach can be used to compute the 2D-DCT. That is, to implement the 2D-DCT, a first 1D-DCT is performed in row/column order, the intermediate results are saved and transposed, and then a second 1D-DCT is performed, again in row/column order. All of the intermediate 1D-DCT coefficients are stored in memory. For a typical 8×8 2D-DCT, this means a storage size of 8×8×(data size per intermediate 1D-DCT result), where the data size is, e.g., 16 bits.
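The row/column decomposition described above can be illustrated with a minimal sketch. It is written in floating point and assumes an orthonormal DCT-II; the function names (`dct_1d`, `transpose`, `dct_2d`, `intermediate_storage_bits`) are hypothetical and chosen only for illustration, and the sketch does not represent any particular hardware implementation.

```python
import math


def dct_1d(x):
    """Orthonormal 1D DCT-II of a sequence (assumed normalization)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        out.append(scale * acc)
    return out


def transpose(block):
    """Swap rows and columns of a square block (the transposition memory)."""
    return [list(col) for col in zip(*block)]


def dct_2d(block):
    """2D-DCT computed as two 1D passes with a transposition in between."""
    # First pass: 1D-DCT of every row; these are the intermediate results.
    intermediate = [dct_1d(row) for row in block]
    # The intermediate results are stored and transposed.
    transposed = transpose(intermediate)
    # Second pass: 1D-DCT of every row of the transposed data,
    # i.e. of every column of the intermediate results.
    second = [dct_1d(row) for row in transposed]
    # Transpose back so the result is indexed [vertical][horizontal].
    return transpose(second)


def intermediate_storage_bits(n, bits_per_result):
    """Storage for an n x n block of intermediate results, in bits."""
    return n * n * bits_per_result
```

For an 8×8 block with 16-bit intermediate results, `intermediate_storage_bits(8, 16)` gives 1024 bits, matching the 8×8×(data size) figure in the text.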
The data size per intermediate 1D-DCT result is crucial in two ways. First, the size is directly tied to physical area if the memory is to be implemented in Very Large Scale Integrated (VLSI) circuit chips, especially if more expensive memory devices are also used to implement the transposition. Second, the size is also tied directly to the throughput, especially when Distributed Arithmetic (DA) is used to implement the inner product computation. However, it is still necessary to meet the high accuracy requirement of the Draft Standard Specification for the implementations of 8×8 Inverse DCT (P1180) proposed by the IEEE CAS Standards Committee. There is a need for a better way to satisfy the strict standard requirement without increasing hardware cost. In other words, a good memory architecture is needed that uses the intermediate computation storage effectively and avoids dropping bits.
3. Description of the Prior Art
A brief summary of published articles which describe a number of schemes of adaptive quantization and bit-rate control may be found in U.S. Pat. No. 5,231,484 to Gonzales et al., assigned to the International Business Machines Corp. The Gonzales et al. patent discloses a specific system which implements an encoder suitable for use with the proposed ISO/IEC MPEG standards. This system operates to adaptively pre-process an incoming digital motion video sequence, allocate bits to the pictures in a sequence, and adaptively quantize transform coefficients in different regions of a picture in a video sequence so as to provide optimal visual quality given the number of bits allocated to that picture. Gonzales et al., however, do not address the need for a memory architecture that uses the intermediate computation storage effectively and avoids dropping bits in a transform such as the DCT/IDCT.
The architecture proposed by Darren Slawecki and Weiping Li, "DCT/IDCT Processor Design for High Data Rate Image Coding", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 2, No. 2, June 1992, simply drops the least significant bits generated in a transform computation. This approach, however, cannot satisfy the strict standard requirements without increasing hardware cost; that is, a larger memory is required (and a longer processing time is required for the second dimension operation) in order to provide the accuracy necessary to satisfy the standards.
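The cost of simply dropping least significant bits can be illustrated with a minimal sketch. It compares plain truncation of integer intermediate values with round-half-up at the same output width. The helper names (`drop_lsbs`, `round_lsbs`, `error_stats`), the 4-bit shift and the value range are assumptions made for illustration only; they are not taken from the cited article or from any standard.

```python
def drop_lsbs(value, shift):
    """Discard the `shift` least significant bits (floor for all integers)."""
    return value >> shift


def round_lsbs(value, shift):
    """Discard `shift` bits after adding half an output LSB (round half up)."""
    return (value + (1 << (shift - 1))) >> shift


def error_stats(quantize, shift, values):
    """Return (mean error, max absolute error) in units of one output LSB."""
    errors = [quantize(v, shift) - v / (1 << shift) for v in values]
    mean_error = sum(errors) / len(errors)
    max_abs_error = max(abs(e) for e in errors)
    return mean_error, max_abs_error


# Hypothetical test range: every integer a signed 11-bit value can take.
values = range(-1024, 1024)
trunc_mean, trunc_max = error_stats(drop_lsbs, 4, values)
round_mean, round_max = error_stats(round_lsbs, 4, values)
```

Under these assumptions, truncation leaves a mean error of about -0.47 output LSB with a worst case just under one LSB, while rounding leaves a mean error of about +0.03 LSB with a worst case of half an LSB. A systematic bias of this kind in the stored intermediate results is carried into the second-dimension computation.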