With the growth of memory densities, it is becoming more evident that the number of input/output (I/O) pins is a significant limiting factor. This is illustrated by examining the growth of video random access memories (VRAMs) from 1 Mbit per chip to 4 Mbits per chip. A typical 1 Mbit VRAM uses a 28 pin package. However, a proposed standard for a 4 Mbit VRAM employs a 64 pin package, an increase of 36 pins. Not only does this require a larger physical size for the memory chip, but it also creates problems for higher density memory chips as they are developed.
Previously, one major advantage that accrued from the use of high density memory chips was that the reduced memory chip size provided more board space. However, when the board space required for four 1 Mbit VRAMs is compared to that required for one 4 Mbit VRAM, the total required board space is approximately the same. Furthermore, with the ongoing development of memory chip technology, 16 Mbit memory chips are in sight, followed by 64 Mbit memory chips. If the trend of increasing I/O pins with memory chip size continues, then a 16 Mbit VRAM will require 128 pins just for data ports (64 for random I/O ports and 64 for serial I/O ports). The problem of increasing I/O pin requirements is even more evident when one considers a 64 Mbit memory chip, where 512 data pins will be required.
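The pin counts quoted above follow from simple scaling. The sketch below assumes, as the text implies, that a 4 Mbit VRAM has a 16 bit random port and a 16 bit serial port and that both port widths grow in proportion to density; the function name and parameters are illustrative only.

```python
def data_pins(density_mbit, base_density_mbit=4, base_width=16):
    """Data pins needed if port width scales linearly with chip density.

    Assumes one random I/O port and one serial I/O port of equal width.
    """
    width = base_width * density_mbit // base_density_mbit
    return 2 * width  # random port + serial port


for density in (4, 16, 64):
    print(f"{density} Mbit VRAM: {data_pins(density)} data pins")
```

Under these assumptions the 16 Mbit and 64 Mbit cases give the 128 and 512 data pins cited in the text.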
One solution to the problem is to limit the number of data pins, and increase the number of row and column memory modules. For example, since a 4 Mbit VRAM is configured as 512×512×16, a 16 Mbit VRAM may be configured as 1024×1024×16, not as 512×512×64. If this method is used, then the I/O pin count will remain approximately the same; however, the provision of a larger memory cell array within a memory chip inherently lessens the interleavability of the system design.
Consider a frame buffer system which has 1024×1024 resolution with 16 bits per pixel. Such a buffer requires either four 4 Mbit VRAMs (512×512×16) or one 16 Mbit VRAM (1024×1024×16). With a single 16 Mbit VRAM, the maximum throughput of the frame buffer is limited to the Fast Page Mode access bandwidth of one VRAM. If the frame buffer design is implemented using 4 Mbit VRAMs, then four such VRAMs are required, but they can be 4-way interleaved for added performance. Assuming that a local workstation can keep up with the frame buffer throughput, the maximum performance of such a frame buffer is four times the performance of each VRAM (since four VRAMs can be accessed simultaneously). If Fast Page Mode cycle timing is the same for both 16 Mbit and 4 Mbit VRAMs, then a frame buffer implemented with the smaller VRAMs has a higher performance capability than one implemented with the larger VRAM.
The serial output port of the VRAM also has a similar problem. If a single 16 Mbit VRAM is used for a 1024×1024×16 frame buffer, then its serial output throughput must be at least as great as that of the video bandwidth for the monitor of that resolution. However, typical VRAMs currently exhibit a serial bandwidth of approximately 33 MHz. A 60 Hz, 1024×1024 resolution monitor requires at least a 60 MHz video data rate. It is therefore evident that the serial output performance of a high performance VRAM must be improved.
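The 60 MHz figure can be checked with a rough calculation. The sketch below counts only visible pixels and ignores blanking intervals, which is an assumption made here for simplicity; a real monitor would need a somewhat higher rate. The function name is illustrative.

```python
def pixel_rate_mhz(width, height, refresh_hz):
    """Visible-pixel rate in MHz, ignoring horizontal and vertical blanking."""
    return width * height * refresh_hz / 1e6


SERIAL_PORT_MHZ = 33  # approximate VRAM serial bandwidth quoted in the text

rate = pixel_rate_mhz(1024, 1024, 60)
print(f"required: {rate:.1f} MHz, single serial port: {SERIAL_PORT_MHZ} MHz")
```

Even under this optimistic assumption, the required rate is nearly twice what a single serial port delivers.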
One solution to the serial output bandwidth constraint is to parallel the serial outputs on a VRAM. This, however, increases the number of I/O pins on the memory chip and is to be avoided if possible.
Image data compression/decompression has been employed to improve the performance of VRAM image buffers. An advantage of using compression and decompression of images is that the storage required to record the images at the source is reduced. In addition, the bandwidth required to transfer the images is reduced.
A favored compression algorithm is a block truncation method that is described in detail by Healy et al. in "Digital Video Bandwidth Compression Using Truncation Coding", IEEE Trans. Comm., COM-9, Dec. 1981, pp. 1809-1823. It yields high-quality decompressed text and graphic images and reasonable-quality, television-like natural images. The compression method per se is not directly relevant to this invention and only certain aspects of it will be reviewed.
The basic idea of the algorithm is to represent each 4 by 4 region of pixels (48 bytes, assuming 3 bytes per pixel) by two colors (3 bytes each) plus a 16-bit wide MASK. The two colors are calculated statistically to best represent the distribution of colors in the 4×4 pixel region. The two colors are called HI color and LO color. Each MASK bit determines whether the corresponding pixel should get the HI or the LO color. When a MASK bit is `1`, the corresponding pixel gets the HI color; and when it is `0`, the corresponding pixel gets the LO color. This is illustrated in FIG. 1, which shows the bit mapping of a 4×4 pixel region 20 to its MASK 22. Since 4×4 pixels can be represented by using HI and LO colors (3 bytes each) and a 16 bit MASK (2 bytes), the compression ratio is R_cmp = 48/(3+3+2) = 6.
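The encoding of one 4×4 region can be sketched as follows. This is a simplified illustration, not the statistical method of the cited paper: it assumes pixels are split at the mean of a crude brightness value (r + g + b), that HI and LO are the average colors of the two groups, and that the first pixel in row-major order maps to the most significant MASK bit. The actual bit mapping is defined by FIG. 1, and all names here are hypothetical.

```python
def compress_block(pixels):
    """Encode 16 (r, g, b) pixels, row-major, as (hi, lo, mask)."""
    assert len(pixels) == 16
    brightness = [sum(p) for p in pixels]       # crude brightness: r + g + b
    threshold = sum(brightness) / 16
    mask = 0
    hi_group, lo_group = [], []
    for i, (pixel, value) in enumerate(zip(pixels, brightness)):
        if value > threshold:
            mask |= 1 << (15 - i)               # assumed: bit 15 = first pixel
            hi_group.append(pixel)
        else:
            lo_group.append(pixel)

    def mean_color(group):
        return tuple(sum(p[k] for p in group) // len(group) for k in range(3))

    lo = mean_color(lo_group)                   # never empty: min <= mean
    hi = mean_color(hi_group) if hi_group else lo
    return hi, lo, mask


# 48 bytes of pixels become 3 + 3 + 2 bytes of compressed data.
compression_ratio = (16 * 3) / (3 + 3 + 2)
```

For a region whose first eight pixels are white and last eight are black, this sketch returns white as HI, black as LO, and a MASK of 0xFF00.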
The decompression mechanism is simpler than that of compression. For each 4×4 pixel matrix, a destination device receives two colors (HI and LO) and the 16 bit MASK. For each bit of the MASK, the corresponding pixel in the 4×4 pixel matrix gets either the HI color, if the MASK bit is `1`, or the LO color if the MASK bit is `0`. FIG. 2 shows the compressed data format of an arbitrary 4×4 pixel area 24, where each pixel is either one of the two colors, A or B.
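The decompression rule just described can be sketched in a few lines. The bit ordering is an assumption made for illustration (most significant MASK bit corresponds to the first pixel in row-major order); the actual mapping is the one shown in FIG. 1. The function name is hypothetical.

```python
def decompress_block(hi, lo, mask):
    """Expand (hi, lo, mask) into 16 pixels in row-major order.

    Assumes MASK bit 15 selects the color of the first pixel and
    bit 0 selects the color of the last pixel.
    """
    return [hi if (mask >> (15 - i)) & 1 else lo for i in range(16)]


A, B = (200, 0, 0), (0, 0, 200)
block = decompress_block(A, B, 0b1111_0000_1111_0000)
rows = [block[r * 4:(r + 1) * 4] for r in range(4)]  # view as a 4x4 matrix
```

With this mask, the first and third rows of the matrix are entirely color A and the second and fourth rows are entirely color B.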
In a typical system, data received over the network is temporarily buffered in a first-in, first-out (FIFO) store until it is ready to be stored in a VRAM frame buffer. Such VRAMs are operated in the Fast Page Mode, where a memory cycle is typically 50 nS.
It is known that decompression can be performed by storing the compressed data format into a frame buffer and then decompressing the pixel data at the time of video refresh. Another method is to decompress an image prior to storing it into the frame buffer. Although the first method requires less frame buffer memory than the second, it presents problems because the compressed pixel data format cannot easily be used for data manipulation, and almost any such operation requires the pixel data to be decompressed first. Also, if the frame buffer stores only a compressed data format, then another frame buffer is needed to store uncompressed images. The solution is to decompress the data prior to storing it into the frame buffer, such that the frame buffer contains only an R, G, B pixel format.
There are a number of problems associated with decompression. The first is that the decompression must be done in real time in order for the frame buffer not to be the bottleneck in the system. For example, the MICRO CHANNEL BUS used by the IBM PS/2 is capable of transferring 32 bits of data every 100 nS. It can therefore deliver one 8 byte compressed block (two 3 byte colors plus a 2 byte MASK), and thus 16 pixels of information, every 200 nS. The frame buffer consequently requires a minimum bandwidth of 80 million pixels/second (16 pixels/200 nS) in order not to be a bottleneck in the system.
A classical solution that improves a memory's bandwidth is to interleave the memory. There are two ways to interleave a memory. One is to access the interleaved memory in parallel such that, in one memory access time, there will be N operations for an N-way interleaved memory. The second is to access the interleaved memory in a time-serial overlapped manner, such that another memory access to a different module can be started 1/N of a memory cycle period later for an N-way interleaved memory. In either case, the frame buffer should be designed such that the decompression bandwidth is greater than or equal to the communication network bandwidth so that the frame buffer is not the bottleneck of the system. In order to maximize the bandwidth, each memory module should have an independent data path and separate controls such that all modules can operate in parallel. Notice that, as described before, in the case of a MICRO CHANNEL BUS, 16 pixels of information can be transferred every 200 nS (16 pixels/200 nS=80 million pixels/second). If memory chips with a 50 nS cycle time are used within the memory modules, then N must be at least 4 (4 pixels/50 nS=80 million pixels/second). If N is 16, then a maximum bandwidth of 320 million pixels per second can be achieved (16 pixels/50 nS). Although simple memory interleaving gives the best performance, that performance does not justify the complexity and cost of having multiple memory modules, each with its own separate data path and controls.
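The interleaving arithmetic above can be reproduced with a short sketch. It assumes each memory module writes one pixel per memory cycle and that all N modules operate in parallel; the function names are illustrative, and integer arithmetic is used to avoid rounding effects.

```python
def min_interleave(pixels_per_transfer, transfer_ns, cycle_ns):
    """Smallest N such that N pixels per memory cycle keeps up with the bus."""
    # Integer ceiling of pixels_per_transfer * cycle_ns / transfer_ns.
    return -(-pixels_per_transfer * cycle_ns // transfer_ns)


def bandwidth_mpixels(n_way, cycle_ns):
    """Peak bandwidth in million pixels/second for N-way interleaving."""
    return n_way * 1000 // cycle_ns


n = min_interleave(pixels_per_transfer=16, transfer_ns=200, cycle_ns=50)
print(n, bandwidth_mpixels(n, 50), bandwidth_mpixels(16, 50))
```

With the figures given in the text (16 pixels every 200 nS, 50 nS memory cycle), this yields N = 4 at 80 million pixels/second and 320 million pixels/second for N = 16.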
The second problem is that the VRAM must allow non-compressed mode access. Non-compressed mode access is important because compression/decompression is lossy, and a high quality image may be needed even at the cost of reduced performance. Furthermore, a read memory cycle is always a non-compressed mode cycle. Non-compressed mode access is also important if the decompressed data is used by the local workstation for image manipulation. The compressed mode access also allows an increase in performance of the local workstation.
The third problem is that, for a high resolution monitor, the serial output of the VRAMs must be interleaved to provide the bandwidth necessary for that monitor. Since current VRAMs have serial output bandwidths of approximately 33 MHz, a typical frame buffer design has serial output ports interleaved, depending on the attached display. For example, for a monitor resolution of 1280×1024, the video bandwidth is 110 MHz. Thus, four-way VRAM serial output interleaving is sufficient for such resolution. However, for a monitor resolution of 2048×1536, the video bandwidth is 260 MHz. This requires eight-way interleaving, since four-way interleaving only gives 4×33 MHz, or 132 MHz, but eight-way gives 264 MHz. The frame buffer design and the decompression design should be able to provide flexible video output bandwidth such that the design is not limited to a monitor's resolution.
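The required degree of serial output interleaving can be estimated as below. This is a minimal sketch that takes the 33 MHz serial port bandwidth and the video bandwidths from the text and simply rounds up; whether a practical design also restricts the interleave factor to a power of two is not stated and is not assumed here. The function name is illustrative.

```python
import math


def serial_interleave_ways(video_mhz, serial_port_mhz=33):
    """Minimum number of interleaved serial ports for a given video bandwidth."""
    return math.ceil(video_mhz / serial_port_mhz)


for resolution, video_mhz in (("1280x1024", 110), ("2048x1536", 260)):
    ways = serial_interleave_ways(video_mhz)
    print(f"{resolution}: {ways}-way, {ways * 33} MHz available")
```

For the two monitors discussed, this gives four-way (132 MHz) and eight-way (264 MHz) interleaving respectively.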
The prior art shows a variety of VRAM/image buffer schemes for performance improvement. In U.S. Pat. No. 4,410,965, issued Oct. 18, 1983, entitled "Data Decompression Apparatus and Method" to Moore, there is described a hardware decompression mechanism based on Huffman coding of a bit image. The compression method is accomplished by comparing a column or a line to an adjacent one and setting bits accordingly if the comparison matches or does not match. Run length coding is then performed on the resulting data.
In U.S. Pat. No. 4,492,983, issued Jan. 8, 1985 and entitled "System for Decoding Compressed Data" to Yoshida et al., there is described a method of image compression/decompression based on correlation between a pair of adjacent scan lines. The method is used in facsimile image transmission.
In U.S. Pat. No. 4,626,929, issued Dec. 2, 1986 and entitled "Color Video Signal Recording and Reproducing Apparatus" to Ichinoi et al., there is described a method of color video signal recording and reproducing using a technique in which both luminance and chrominance signals are time-base compressed by the use of random access memory and then are time-division multiplex recorded.
Other patents describing various VRAM and dynamic random access memory video systems include the following: U.S. Pat. No. 4,985,871 to Catlin; U.S. Pat. No. 4,951,258 to Uehara; U.S. Pat. No. 4,764,866 to Downey; U.S. Pat. No. 4,698,788 to Flannagan et al.; and U.S. Pat. No. 4,684,942 to Nishi et al.
In accordance with the above, it is an object of this invention to provide an improved image buffer.
It is another object of this invention to provide an improved image buffer that employs minimal input/output pins.
It is still another object of this invention to provide an improved VRAM image buffer that is particularly adapted to handling compressed image data and is able to decompress such image data directly on the semiconductor chip holding the VRAM structure.