A typical processing system with video/graphics display capability includes a central processing unit (CPU), a display controller coupled with the CPU by a system bus, a system memory also coupled to the system bus, a frame buffer coupled to the display controller by a local bus, peripheral circuitry (e.g., clock drivers and signal converters), display driver circuitry, and a display unit. The CPU generally provides overall system control and, in response to user commands and program instructions retrieved from the system memory, controls the contents of graphics images to be displayed on the display unit. The display controller, which may for example be a video graphics architecture (VGA) controller, generally interfaces the CPU and the display driver circuitry, exchanges graphics and/or video data with the frame buffer during data processing and display refresh operations, controls frame buffer memory operations, and performs additional processing on the subject graphics or video data, such as color expansion. The display driver circuitry converts digital data received from the display controller into the analog levels required by the display unit to generate graphics/video display images. The display unit may be any type of device which presents images to the user conveying the information represented by the graphics/video data being processed. The "display" may also be a printer or other document view/print appliance.
The frame buffer stores words of graphics or video data defining the color/gray-shade of each pixel of an entire display frame during processing operations such as filtering or drawing images. During display refresh, this "pixel data" is retrieved from the frame buffer by the display controller pixel by pixel as the corresponding pixels on the display screen are refreshed. Thus, the size of the frame buffer directly corresponds to the number of pixels in each display frame and the number of bits (Bytes) in each word used to define each pixel. In a standard VGA system, where each frame consists of 640 columns and 480 rows of pixels and each pixel is defined by 8 bits, the frame buffer must have a minimum capacity of 307,200 Bytes. For larger displays, such as a 1280 by 1024 display, approximately 1.5 MBytes or more of memory space is required. It should be recognized that the size and performance of frame buffer 104 are dictated by a number of factors, such as the number of monitor pixels, the monitor DOT clock rate, display refresh, data read/write frequency, and memory bandwidth, to name only a few.
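The capacity figures above follow from a simple product of frame dimensions and pixel depth. The sketch below is illustrative only: the function name `frame_buffer_bytes` is hypothetical, and the 8-bit pixel depth used for the 1280 by 1024 case is an assumption, since the text does not state a depth for that display.

```python
def frame_buffer_bytes(columns: int, rows: int, bits_per_pixel: int) -> int:
    """Minimum number of bytes needed to hold one full display frame."""
    total_bits = columns * rows * bits_per_pixel
    return (total_bits + 7) // 8  # round up to a whole number of bytes


# Standard VGA case from the text: 640 x 480 pixels, 8 bits per pixel.
vga = frame_buffer_bytes(640, 480, 8)
print(vga)  # 307200

# Larger display; the 8-bit depth here is an assumption for illustration.
large = frame_buffer_bytes(1280, 1024, 8)
print(large)  # 1310720
```

Deeper pixels scale the requirement proportionally, so the same call with a larger `bits_per_pixel` gives the minimum capacity for a higher color depth.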
Most frame buffers are constructed from random access memory devices (RAMs). Currently available RAM devices unfortunately have limitations on their use, mostly as a result of trade-offs that had to be made during their design and fabrication. Primarily due to expense and fabrication yields, RAM manufacturers are limited in the number of storage locations (cells) which can be provided on a single integrated circuit. Further, design tradeoffs must be made in the interests of minimizing the number of data and address pins, minimizing the number of devices required for a given memory system, and optimizing the width of the data and address ports. For example, a 4 Mbit (0.5 Mbyte) RAM can be arranged as a 4 M×1 (i.e., storing 4 million 1-bit words), 1 M×4, 512 K×8, 256 K×16, or 128 K×32 (storing 128 thousand 32-bit words) device. At the one extreme, the 4 M×1 architecture only allows access to a single bit per address, thereby necessitating the use of 32 devices to completely service a 32-bit data bus. This construction disadvantageously consumes valuable board space. At the other extreme, a single 128 K×32 device can service a 32-bit bus; however, the overall word storage capacity is relatively small and each chip/package requires 32 data pins alone along with 17 additional address pins (not to mention power, control, and feature pins). The need for a total of 49 data and address pins increases the size of the chip (as well as its package) due to minimum size requirements on each connection between the chip and its package and the need for level translator (driver) circuits to drive each such connection. As a consequence, RAM manufacturers have generally adopted the more practical architectures, such as the 256 K×16 architecture.
Even with the 256 K×16 architecture, however, two devices are still required to service a 32-bit bus (or four to service a 64-bit bus) and each device still requires 18 address and 16 data pins for a 256 K deep memory (which is very limited).
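The trade-off between word width, pin count, and device count can be tabulated for each 4 Mbit organization mentioned above. This is a minimal sketch under stated assumptions: the helper name `describe` is hypothetical, and address pins are counted without multiplexing, which matches the 17-pin and 18-pin figures quoted in the text.

```python
K = 1024
M = 1024 * K


def describe(depth_words: int, word_bits: int, bus_bits: int = 32) -> dict:
    """Pin and device counts for one RAM organization (non-multiplexed addressing assumed)."""
    address_pins = (depth_words - 1).bit_length()  # bits needed to select one word
    devices = -(-bus_bits // word_bits)            # ceiling division
    return {
        "address_pins": address_pins,
        "data_pins": word_bits,
        "total_pins": address_pins + word_bits,
        "devices_for_bus": devices,
    }


# The five 4 Mbit organizations listed in the text, as (depth, width) pairs.
ORGANIZATIONS = [(4 * M, 1), (1 * M, 4), (512 * K, 8), (256 * K, 16), (128 * K, 32)]

for depth, width in ORGANIZATIONS:
    assert depth * width == 4 * M  # every organization holds the same 4 Mbit
    print(f"{depth // K:>5} K x {width:<2}", describe(depth, width))
```

Calling `describe(256 * K, 16, bus_bits=64)` reports four devices, matching the 64-bit case noted above. Power, control, and feature pins are deliberately left out of the count.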
Proposals have been put forth not only to put the entire frame buffer on a single chip but also to add the controller to that chip. A single controller/memory device would reduce the required board space and would eliminate the need for interconnection pins between the controller and the memory entirely. The primary obstacle to implementing these proposals has been the inability to achieve good yield during the chip manufacturing process. A state-of-the-art controller is normally fabricated using random logic circuitry, which results in a typical die sort (fabrication) yield of 60-70%. Random logic circuitry is generally not "repairable." A memory, however, is usually fabricated as an array of rows and columns of memory cells. The repetitive nature of memory arrays allows columns and rows containing defective cells to be "repaired" by substitution with redundant rows and columns provided on the chip. With the ability to "repair," the yield for memory devices can be increased. Typically, however, no more than 2-3% of a given array is provided as "repair cells" due to cost limitations. Further, in those cases where the memory cells are divided into blocks, the repair cells are typically not transferable from block to block. Therefore, a substantial number of defects in a single block of memory cells normally cannot be repaired even if enough repair cells are available in the overall array. Currently, there are no means for accessing only the remaining operational blocks of the memory, and thus the entire chip must be discarded in many cases.
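The block-level repair limitation can be illustrated with a deliberately simplified model in which each block has its own pool of spare rows and each defect consumes one spare. The function name `repairable` and the sample counts are hypothetical and are not taken from any particular device.

```python
def repairable(defects_per_block: list[int], spares_per_block: list[int]) -> bool:
    """True only if every block can cover its own defects with its own spares."""
    if len(defects_per_block) != len(spares_per_block):
        raise ValueError("one spare count is required per block")
    return all(d <= s for d, s in zip(defects_per_block, spares_per_block))


# Hypothetical four-block array: defects are clustered in the second block.
defects = [0, 5, 1, 0]
spares = [2, 2, 2, 2]

print(sum(defects) <= sum(spares))  # True: enough spares exist array-wide
print(repairable(defects, spares))  # False: spares cannot move between blocks
```

The array as a whole has more spares than defects, yet the device fails because the second block exhausts its local spares.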
Conventional dynamic RAMs (DRAMs) also disadvantageously employ a multiplexed addressing system. During a memory access, row address bits are sent to each DRAM on the address bus and latched into each device's address decoder in response to a row address strobe (/RAS). The column address bits and column address strobe (/CAS) are then presented to each DRAM and latched into the corresponding address decoders, after which data can be written to or retrieved from the addressed locations in memory. Besides complicating the timing of the system memory addressing scheme, this process takes two master clock cycles instead of one.
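The two-phase nature of multiplexed addressing can be sketched by splitting a flat address into the row and column values that are presented in turn on the shared address pins. The function name `multiplexed_phases` and the even 9/9 split of an 18-bit address are assumptions made for illustration; actual devices may divide the address differently.

```python
def multiplexed_phases(address: int, row_bits: int, col_bits: int) -> list[tuple[str, int]]:
    """Return the (strobe, value) pairs presented on the address pins, in order."""
    if not 0 <= address < (1 << (row_bits + col_bits)):
        raise ValueError("address does not fit in row_bits + col_bits")
    row = address >> col_bits               # upper bits, latched on /RAS
    col = address & ((1 << col_bits) - 1)   # lower bits, latched on /CAS
    return [("/RAS", row), ("/CAS", col)]


# An 18-bit address (256 K deep memory), assumed split into 9 row and 9 column bits.
for strobe, value in multiplexed_phases(175001, row_bits=9, col_bits=9):
    print(strobe, value)
# /RAS 341
# /CAS 409
```

Each access yields two entries, one per strobe, which corresponds to the two address phases described above.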
Increasingly, controller-memory subsystems are being required to handle multiple types or formats of data simultaneously. For example, a user interface may include a visual display and an audio subsystem. The visual display output alone may be composed of various combinations of different types of display data such as text, JPEG, MPEG, and 3D graphics, to name a few possibilities. For each type of display data, the processing and hence the supporting memory storage operations will differ substantially. The same is true for audio data, where various types of compressed or formatted data may be processed and mixed into a composite output.
Thus, the need has arisen for an architecture which will allow the fabrication of a controller and associated memory as a single integrated circuit with high yields and thus reduced device cost. In particular, such an integrated device should allow for the processing and storage of different types of data concurrently. Further, in doing so, the device should provide for high bandwidth operation when included in a general or special purpose computing system. Such an architecture should be applicable to memories of differing sizes and output word arrangements.