Computers can generally be broken into three main components: input/output (I/O) for interfacing the computer with external devices (e.g., monitor, mouse, keyboard, modem, etc.), a central processing unit (CPU) for processing data, and memory for storing the data. The dominant type of memory used in most computer systems today is dynamic random access memory (DRAM). DRAMs are preferred because of their relatively low cost of production and high storage density. Traditionally, DRAMs were used to store text, computer programs, and numerical data. But as computer systems became faster, more powerful, and more versatile, there was a corresponding requirement for larger and larger memories to handle the increased volumes of data. Today, there is a huge demand for additional memory to satisfy the requirements imposed by video, audio, and graphics applications. This multimedia information consumes vast amounts of memory for storage. Fortunately, advances in semiconductor manufacturing processes have substantially increased the capacity of DRAM chips, while costs have dropped on a per-byte basis. In the past few years, DRAM chip storage capacity has exploded from storing 256 kbytes, 1 Mbyte, 4 Mbytes, 16 Mbytes, . . . to 256 Mbytes of data. Indeed, the production of 1 Gigabyte DRAM chips is imminent.
However, the speed (i.e., bandwidth) at which data stored in the DRAMs can be accessed has not kept pace with demands. Video and audio recording and playback, three-dimensional graphics generation, real-time teleconferencing, on-the-fly interactive simulations, etc., all require the transfer of huge amounts of data between the processor(s) and memory. Unfortunately, the amount of data which can be accessed from the DRAM is quite limited. This limitation is attributable to the fact that the basic DRAM controller scheme has generally remained the same over the past twenty years. The same scheme that was originally developed for controlling 8 kbyte DRAMs is now being applied to 256 Mbyte DRAMs. What was sufficient twenty years ago is totally inadequate to meet the demands of today's technology. A proper analogy is that of a parking lot where the number of parking spaces has increased a thousandfold, yet there is still only one toll gate through which all cars must pass.
FIG. 1 shows a typical architecture of a prior art DRAM layout. Cell array 101 is comprised of a 128×128 array of memory cells. An individual memory cell consists of a transistor which causes a tiny capacitor to be placed in either a charged (i.e., "1") or discharged (i.e., "0") state. Thereby, a single memory cell is capable of being programmed to store one bit of information. Hence, this particular 128×128 cell array is capable of storing 16 kbits of data. The memory cells are arranged in rows and columns. Seven address lines (2^7 = 128) are used to specify a particular memory cell for access. These seven address lines (e.g., A0-A6/A7-A13) are multiplexed to provide a 14-bit address by using a row address strobe (RAS) signal and a column address strobe (CAS) signal. The RAS signal is used to clock addresses A0-A6 to the row address register 102. The row address decoder 103 decodes the address and specifies one of the 128 rows for access. Similarly, the CAS signal is used to clock addresses A7-A13 to the column address register 104. The column address decoder 105 decodes the address and specifies one of the 128 columns for access. Once a particular cell is specified by decoding its row and column, a read/write (R/W) signal is used to specify whether a bit is to be written into that cell via DATA IN, or the bit retained by that cell is to be read out via DATA OUT.
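The multiplexed addressing described above can be illustrated with a minimal software sketch. It is not taken from FIG. 1; the names `split_address`, `Dram16K`, and `access` are hypothetical, and the model ignores timing and refresh entirely. It only shows how one set of seven lines, latched first on RAS and then on CAS, selects one of 16,384 cells.

```python
ROWS = COLS = 128          # cell array dimensions from the text
LINE_MASK = 0x7F           # seven address lines, 2**7 == 128


def split_address(addr):
    """Split a 14-bit cell address into its row and column halves."""
    if not 0 <= addr < ROWS * COLS:
        raise ValueError("address must fit in 14 bits")
    row = addr & LINE_MASK          # A0-A6, clocked in by RAS
    col = (addr >> 7) & LINE_MASK   # A7-A13, clocked in by CAS
    return row, col


class Dram16K:
    """Hypothetical model of the 128x128 array with row/column registers."""

    def __init__(self):
        self.cells = [[0] * COLS for _ in range(ROWS)]
        self.row_reg = 0   # stands in for row address register 102
        self.col_reg = 0   # stands in for column address register 104

    def ras(self, lines):
        # Latch the seven shared lines into the row register.
        self.row_reg = lines & LINE_MASK

    def cas(self, lines):
        # Latch the same seven lines into the column register.
        self.col_reg = lines & LINE_MASK

    def write_bit(self, bit):
        self.cells[self.row_reg][self.col_reg] = bit & 1

    def read_bit(self):
        return self.cells[self.row_reg][self.col_reg]


def access(dram, addr, write=None):
    """Perform one RAS/CAS sequence; write a bit if given, then read it back."""
    row, col = split_address(addr)
    dram.ras(row)          # row half goes out first
    dram.cas(col)          # column half follows on the same lines
    if write is not None:
        dram.write_bit(write)
    return dram.read_bit()
```

Because the row and column halves share the same lines, every access in this model takes two sequential latch steps, which is the serialization the later paragraphs identify as a bottleneck.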
In the past, designers have sought to increase the bandwidth of their DRAM architecture by implementing wider address and data buses. FIG. 2 shows a prior art memory architecture having wide buses. However, this work-around solution has a couple of drawbacks. First, it requires more board space to physically route the wider buses. Wider buses consume precious area on an already crammed motherboard. Second, wider buses require a corresponding increase in the number of pins for the memory chips and microprocessor. A higher pin count mandates larger chip packages. Again, larger chips consume valuable area on the motherboard. It may be physically impossible to fit these larger chips onto the printed circuit board. In practice, bus width is limited to approximately 64 or 128 bits. Beyond this width, the bus becomes too unwieldy.
Designers have also attempted to increase the DRAM bandwidth by implementing high-speed specialized DRAMs. Although these specialized DRAMs can achieve relatively high peak bandwidths, it is difficult to sustain these peak bandwidths over time due to the nature of their page misses. Generally, data is stored in a "page" format within the DRAM, whereby an entire page must be "opened" in order to access the piece of desired data residing within that page. If the requested data is not in the currently opened page, a page "miss" occurs. Page misses require a lot of time to service because an entire RAS/CAS cycle must be performed in order to close the current page and open the new page containing the desired data. Hence, page misses severely impact the specialized DRAMs' bandwidth. It is virtually impossible to avoid page misses because the specialized DRAMs typically implement the traditional RAS/CAS scheme. As such, there is minimal or no capability to perform a page open look-ahead, because the page open (RAS) and read/write (CAS, OE) operations have to be performed in sequence and over the same address bus.
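The effect of page misses on sustained bandwidth can be sketched with a simple model in which only one page is open at a time. The page size and the cycle costs `HIT_CYCLES` and `MISS_CYCLES` below are illustrative assumptions, not figures from any particular DRAM, and the function names are hypothetical.

```python
HIT_CYCLES = 1    # assumed cost of an access to the already-open page
MISS_CYCLES = 6   # assumed cost of closing a page and opening another


def count_page_misses(addresses, page_size):
    """Count accesses that fall outside the single currently open page."""
    open_page = None
    misses = 0
    for addr in addresses:
        page = addr // page_size
        if page != open_page:
            misses += 1          # full RAS/CAS cycle needed
            open_page = page
    return misses


def total_cycles(addresses, page_size):
    """Total cycles for the access stream under the assumed costs."""
    misses = count_page_misses(addresses, page_size)
    hits = len(addresses) - misses
    return hits * HIT_CYCLES + misses * MISS_CYCLES


# Same eight addresses, two orderings, with an assumed page size of 4.
sequential = [0, 1, 2, 3, 4, 5, 6, 7]
interleaved = [0, 4, 1, 5, 2, 6, 3, 7]   # alternates between two pages
```

Under these assumptions the sequential stream misses only when it first enters each page, while the interleaved stream misses on every access. This is the situation that arises when one controller alternates between operands held in different pages.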
Moreover, since specialized DRAMs have an inordinate number of pins (e.g., 80+ pins) to accommodate their complex interface, there is usually just a single on-chip DRAM controller. This same controller is used to access different types of information. The different types of information are typically stored in and accessed from the same DRAM. As a result, there is a relatively high page miss rate as the controller switches between the different types of data. For example, a two-dimensional drawing operation might require different page locations for operands that are required at the same time. Consequently, the DRAM controller normally includes a large FIFO buffer in order to balance the memory accesses with the drawing engine operations. Furthermore, a large percentage of PC Windows applications require rectangular types of operations. A read-modify-write operation is often necessary to determine whether selected pixels are to be changed. These kinds of operations require multiple accesses to the DRAM (i.e., read and write) and effectively cut the critical DRAM bandwidth in half.
Thus, there is a need in the prior art for a new high-capacity DRAM architecture that also has a sustainable high bandwidth. The present invention provides an elegant solution by implementing a DRAM architecture having multiple DRAMs with multiple arrays. In the present invention, each of the on-chip DRAMs has its own address, data, and control lines. Hence, the DRAMs can be accessed independently and simultaneously for executing different tasks. Furthermore, in the present invention, each DRAM is divided into multiple arrays, which, once opened, stay open. Each of the arrays has its own circuitry that performs page open and circuitry that performs read/write. Hence, page open and read/write operations can be performed simultaneously within the same DRAM. These improvements greatly reduce page misses, thus yielding a much greater DRAM bandwidth. In addition, each memory array is accompanied by byte write enable lines that control which portion of the write data is actually updated in the DRAM array. These byte write enable lines can change on every clock, which in real applications converts a read-modify-write cycle into a single write cycle. This reduction in memory accesses (from two to one) provides more memory bandwidth for the controller to access data.
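The saving from byte write enables can be illustrated with a small sketch that counts memory accesses. The 32-bit word width, the four enable bits, and the names `WordMemory`, `write_masked`, and `update_byte_rmw` are assumptions made for illustration only; the text does not specify them.

```python
WORD_MASK = 0xFFFFFFFF   # assumed 32-bit word, i.e., four byte lanes


class WordMemory:
    """Hypothetical word-wide memory that counts each bus access."""

    def __init__(self, n_words):
        self.words = [0] * n_words
        self.accesses = 0

    def read(self, index):
        self.accesses += 1
        return self.words[index]

    def write(self, index, value):
        self.accesses += 1
        self.words[index] = value & WORD_MASK

    def write_masked(self, index, value, byte_enables):
        """Single write; bit k of byte_enables selects byte lane k."""
        self.accesses += 1
        mask = 0
        for lane in range(4):
            if byte_enables & (1 << lane):
                mask |= 0xFF << (8 * lane)
        # In hardware the disabled lanes are simply not driven; the model
        # keeps their old contents without counting an extra bus read.
        self.words[index] = (self.words[index] & ~mask) | (value & mask)


def update_byte_rmw(mem, index, lane, byte):
    """Update one byte without enables: a read followed by a write."""
    shift = 8 * lane
    old = mem.read(index)
    new = (old & ~(0xFF << shift)) | ((byte & 0xFF) << shift)
    mem.write(index, new)
```

For example, changing only the lowest byte of a word costs two accesses through `update_byte_rmw` but one through `write_masked(index, value, 0b0001)`, which matches the two-to-one reduction described above.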