1. Field of the Invention
This invention relates generally to digital signal processing for the transmission of electronic images, and more specifically, to video image compression using a Discrete Cosine Transformation (DCT) and an Inverse Discrete Cosine Transformation (IDCT) such that there is no need for a separate transpose matrix buffer element, and there is no need for space-consuming multi-ported memories.
2. Prior Art
Generally, electronic operations on video images are becoming more frequent in the modern world. A video image can represent any combination of photographs, drawings, documents, text or other such audio visual works including those which, when shown in quick succession, impart the impression of motion. Typically, video images are represented by electronic video signals that may also be processed to improve the video quality and/or to remove video defects and errors prior to storage in any of a variety of storage devices, or transmission over communication channels.
A problem exists in the industry because electronic storage of video images consumes large quantities of memory storage and the transmission of video images requires high transmission bandwidth, both of which are costly to provide. For example, to encode everything in a video signal, that is to say, to transmit totally uncompressed information for a typical image, transmission rates of 27 megabytes per second (216 megabits per second) would be required. By contrast, compressed video requires only 1.15 megabits per second to transmit. Due to the enormous amounts of data involved, what is known in the art as data "compression" is almost always used in the storage and transmission of video images. The high level of information redundancy typical in video images also lends itself well to data compression, and many software and hardware methods have been developed to take advantage of this fact. A system that compresses and decompresses video images, whether implemented in hardware or in software, is known in the art as a video codec (for compressor/decompressor).
While the art contains many compression methods, it is generally economically necessary to use only those compression methods that are recognized as standards. Conformance to standards is one of the requirements for "open" and interoperable systems. Popular compression standards come from the International Organization for Standardization (ISO), the International Telecommunication Union-Telecommunication Standardization Sector (ITU-T), the International Electrotechnical Commission (IEC), the Moving Picture Experts Group (MPEG), which is part of ISO and IEC, and the Consultative Committee on International Telegraphy and Telephony (CCITT). Some industry standards for video images are the ISO-IEC JTC1/SC2/WG11 MPEG standard for "motion video", the ISO-IEC JTC1/SC2/WG8 JPEG standard for "still images", and the P x 64 standard for video conferencing. The last of these is the CCITT H.261 Recommendation, Annex A (1990), on line transmission of non-telephone signals, which specifies a video codec for audio/visual services at P x 64 kilobits per second and is known as "H.261 Annex A".
As noted above, it is often advantageous to "compress" the video signal so that the signal requires less memory to store or takes less transmission bandwidth to broadcast. In many applications, such as in video teleconferencing, the video information must also be compressed to match the maximum number of bits per second available in the communications channel, such as a telephone line. There are many common uses of video signal compression and these include multimedia presentations, live video connections, known as "video teleconferencing", and "video-on-demand" applications. The general rule in compression technology is that the more signal compression that is done, the poorer the final image quality is after transmission or storage.
As an example of the benefit of using video image compression for electronically storing or transmitting a video image, take the case of a square checkerboard array of ten alternate light and dark squares on each side. In this case, the total image contains fifty light squares and fifty dark squares. Each of the squares in this checkerboard image will be assumed to be made up of ten minimum spots of light or picture elements (each one called a pixel) on each side. That is to say, each of the one hundred squares consists of ten by ten pixels, or one hundred total pixels per light or dark square. Thus the total picture of the square checkerboard array has a total of one hundred by one hundred pixels, which would typically, without compression, occupy either ten thousand memory locations, one location for each pixel, or ten thousand bytes of the available transmission bandwidth for each image. Compressing this checkerboard image through a reduction in temporal, spatial, or statistical redundancy would then reduce the memory space required for this simplified example to only one hundred memory locations, one for each differently shaded area. Thus using video image compression results in a reduction of required memory space (or bandwidth) by a factor of one hundred. That is to say, this is a one hundred-to-one compression ratio in this case. In practice, for teleconferencing applications using the H.261 standard, one thousand to one image compression ratios are obtained. In the MPEG motion picture standard, compression ratios of typically about one hundred to one can be expected. Thus, it is clear that there is a major advantage to be had by compressing the video image data before transmission or before storage.
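The arithmetic of the checkerboard example above can be checked with a short sketch. The one-value-per-square encoding used here is an assumption made purely for illustration and is not a scheme defined by any of the standards named above; the function and constant names are likewise hypothetical.

```python
# Illustrative sketch only: storing one value per uniformly shaded square is an
# assumed toy encoding, not a method taken from MPEG, JPEG, or H.261.

SQUARES_PER_SIDE = 10        # ten squares along each edge of the board
PIXELS_PER_SQUARE_SIDE = 10  # each square is ten by ten pixels


def make_checkerboard():
    """Return the 100 x 100 pixel image as rows of 0 (dark) or 1 (light)."""
    n = PIXELS_PER_SQUARE_SIDE
    side = SQUARES_PER_SIDE * n
    return [[((r // n) + (c // n)) % 2 for c in range(side)]
            for r in range(side)]


def compress_by_square(image):
    """Keep a single shade per square after checking the square is uniform."""
    n = PIXELS_PER_SQUARE_SIDE
    shades = []
    for r0 in range(0, len(image), n):
        for c0 in range(0, len(image[0]), n):
            shade = image[r0][c0]
            # Every pixel in the square must match, or the encoding is invalid.
            assert all(image[r][c] == shade
                       for r in range(r0, r0 + n)
                       for c in range(c0, c0 + n))
            shades.append(shade)
    return shades


image = make_checkerboard()
uncompressed_size = sum(len(row) for row in image)  # one location per pixel
compressed = compress_by_square(image)
compressed_size = len(compressed)                   # one location per square
ratio = uncompressed_size // compressed_size
print(uncompressed_size, compressed_size, ratio)    # prints: 10000 100 100
```

The sketch reproduces the figures in the text: ten thousand locations uncompressed, one hundred compressed, and a one hundred-to-one ratio.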
Most modern video codecs consist of a sequence of hardware components, each of which performs some function involved in either compressing or decompressing the video image. A codec designer chooses specific components based on the design goals of the specific system. By choosing the appropriate components, the codec design can be optimized for various factors, such as speed of compression to meet transmission rate goals, reliability of transmission to meet image quality goals, improved color reproduction, better edge definition, and higher compression ratios to achieve lower storage space requirements. Thus, there exists a problem in obtaining high quality video images that meet or exceed the industry standards while still achieving sufficient compression to permit cost effective and efficient transmission over the available transmission lines, or storage in the available storage media.
An approach known in the art to decrease the memory storage and high bandwidth requirements of video images is to use data compression techniques to compress video signals before storage or before transmission. A known data compression technique uses the Discrete Cosine Transform (DCT) along with what is known in the art as quantization to compress the image data contained in video signals. To decompress the previously compressed image data, the Inverse Discrete Cosine Transform (IDCT) and inverse quantization are used. In general, a video image or picture (known in the art as a frame) is divided into small areas called blocks. Each block is composed of a square area of the picture containing 64 picture elements (known as pixels) in an eight by eight array. The DCT hardware takes the image blocks and compresses the data contained in the pixels by transforming the data into the frequency domain. By performing this transformation, repetitive information, such as a group of uniformly colored pixels, may be identified and subsequently removed, thus requiring fewer bits of information to transfer the image data to the display screen. This process is known as spatial compression.
For an example of spatial compression, if the entire block of data were all of one shade and intensity, it would take only one single piece of data, known as a coefficient, to completely transfer the image to the receiver. By comparison, an uncompressed image would require 64 pieces of data for the same image. Typically, to meet the quality standards, only about 5 coefficients need to be transferred, thus resulting in a compression ratio of approximately 64 divided by 5, or roughly 13 to one.
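The single-coefficient case described above can be illustrated with a minimal sketch. It assumes the conventional 8 by 8 forward DCT scaling (a factor of 1/4 with a 1/sqrt(2) weight on the zero-frequency terms) and a hypothetical block in which every pixel has the value 128; neither choice is taken from this document.

```python
import math

N = 8  # block edge length in pixels


def dct_2d(block):
    """Direct 8 x 8 forward DCT with the conventional scaling (assumed here)."""
    def c(k):
        # Extra weight applied only to the zero-frequency index.
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


# Hypothetical uniform block: all 64 pixels share one shade and intensity.
uniform = [[128] * N for _ in range(N)]
coeffs = dct_2d(uniform)

# Positions whose coefficient is not (numerically) zero.
nonzero = [(u, v) for u in range(N) for v in range(N)
           if abs(coeffs[u][v]) > 1e-6]
print(nonzero)                  # only the (0, 0) coefficient remains
print(round(coeffs[0][0], 6))   # 1024.0 for this block
print(N * N / 5)                # 12.8, i.e. roughly 13 to one
```

For the uniform block, only the coefficient at position (0, 0) survives, so one value stands in for all 64 pixels; the last line is the 64 divided by 5 figure quoted in the text.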
The DCT and the IDCT are matrix operations defined by a frequency-transforming summation over all x's and y's of a series of cosine functions, each term containing the product of cos[(2x+1)uπ/16] times cos[(2y+1)vπ/16], where x and y are the pixel coordinates within the block and u and v are the frequency indices of the coefficient being computed. Since all elements of the matrix are cosine functions, the matrix is orthogonal, and the transform can therefore be performed by a sequence of row element additions and multiplications followed by a sequence of column additions and multiplications. The property of orthogonality (meaning that the functions are at "right angles" to each other) of the DCT and IDCT is important to the methods used to achieve a large compression ratio while maintaining sufficient speed of transformation. For example, a 2-Dimensional 8×8 point DCT and IDCT matrix transformation may be "decomposed" or broken down into a series of eight 1-Dimensional 1×8 point row transforms, whose results are input to a series of eight 1-Dimensional 1×8 point column transforms (for a total of sixteen 1×8 transforms). This procedure simplifies the mechanics of performing the 8×8 pixel (known in the art as a "block" of data) matrix transformation, and decreases the size of the physical circuit necessary to do the calculations. In order to simplify the control structures and overhead required to deal with a series of row transformations followed by a series of column transformations, it is typical to use what is known in the art as a "transpose buffer". The transpose buffer realigns the result of the row transforms into column formatted data appropriate for the following column DCT operations. This dedicated transpose buffer increases the speed of the DCT and IDCT operations, but results in an increase in the overall DCT circuit size and expense.
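The row-column decomposition described above can be sketched in software as follows. This is an illustration of the conventional prior-art arrangement with an explicit transpose step, not of the present invention; the orthonormal scaling, the function names, and the test block are assumptions introduced only for this sketch.

```python
import math

N = 8  # points per 1-Dimensional transform


def dct_1d(vec):
    """Forward 1 x 8 point DCT with orthonormal scaling (assumed here)."""
    out = []
    for u in range(N):
        scale = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
        out.append(scale * sum(
            vec[x] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
            for x in range(N)))
    return out


def transpose(m):
    """Software stand-in for the transpose buffer: swap rows and columns."""
    return [list(col) for col in zip(*m)]


def dct_2d_row_column(block):
    rows_done = [dct_1d(row) for row in block]         # eight row transforms
    cols_as_rows = transpose(rows_done)                # realign into columns
    cols_done = [dct_1d(col) for col in cols_as_rows]  # eight column transforms
    return transpose(cols_done)                        # restore orientation


def dct_2d_direct(block):
    """Reference: the full double summation, used only to check the result."""
    def c(k):
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0
    return [[0.25 * c(u) * c(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
                for x in range(N) for y in range(N))
             for v in range(N)]
            for u in range(N)]


# Hypothetical test block with arbitrary 8-bit pixel values.
block = [[(3 * x + 5 * y * y) % 256 for y in range(N)] for x in range(N)]
fast = dct_2d_row_column(block)
ref = dct_2d_direct(block)
max_err = max(abs(fast[u][v] - ref[u][v]) for u in range(N) for v in range(N))
print(max_err < 1e-9)
```

The sixteen 1×8 transforms produce the same coefficients as the direct double summation, to within floating-point rounding. The second call to transpose only restores the output orientation; the first call plays the role the text assigns to the transpose buffer.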
For a more complete discussion of methods to perform video compression and decompression using the DCT and IDCT see the copending patent application by the same inventor and assigned to the same assignee filed Jan. 17, 1996, application Ser. No. 08/591,204 entitled Method and Apparatus for Video Compression and Decompression using High Speed Discrete Cosine Transform.
With reference to the above discussion, it is clear that the storage and transmission of video signals will benefit from the availability of efficient and inexpensive compression methods. Accordingly, it is a purpose of this invention to provide a method and an apparatus for performing video compression and decompression that will compress the video information to meet the maximum capacity of the transmission medium while still maintaining the highest possible video image quality.