Computer systems and memory systems are well known, and it has been a continuing struggle to provide memory systems with sufficient capacity and sufficient bandwidth to service their associated CPUs and other system components. In a simple computer system, the CPU accesses both memory (semiconductor memory such as static RAMs or dynamic RAMs) and mass storage, such as disc drives. With the increasing speeds available in modern CPUs and peripheral devices such as graphic controllers and direct memory access (DMA) controllers, more and more data manipulation is performed in memory rather than in mass storage, due to the significantly greater access speed, and thus bandwidth, of semiconductor memories. Furthermore, with increasing system speed and a greater number of system components capable of reading from and writing to the same memory, memory bandwidth requirements have increased dramatically over time. This increased demand on memory is particularly acute for graphical applications, which have increased significantly in resolution and color depth over time.
With higher resolution and greater color depth, the bandwidth requirement for a graphic memory system is tremendous. In a typical prior art graphic memory system architecture, as shown in the block diagram of FIG. 1, two sources compete for the memory bus in order to access the portion of the memory which stores the information associated with screen video, referred to as the video frame buffer. These two sources are the memory reads required for screen refresh, and the memory read/write cycles initiated by the CPU (or other graphical device) to update the contents of the video frame buffer.
To alleviate the competition for the memory bus, the Video RAM (VRAM) was created. A VRAM includes a parallel-in/serial-out data register connected to a second data port having its own data clock. This second port allows data to be transferred out of the chip at high speed, for example to control the video display, and this transfer occurs in parallel with normal reads from and writes to the VRAM, thereby allowing the CPU or other graphical devices to manipulate the data within the VRAM simultaneously without contention problems. Video RAMs are described, for example, in Computer Graphics Principles and Practice, 2nd ed., Foley, van Dam, Feiner and Hughes, Addison-Wesley Publishing Co., 1992, pp. 856-862, Section 18.1.4.
DRAM is also usable as a graphic memory when there is an adequate buffer (FIFO) for screen refresh, as well as read-ahead and write buffers for CPU reads and writes. When DRAM is used as a graphic memory, buffering allows the DRAM to operate in the page mode as much as possible. When operating a DRAM, a row address is first strobed into the device, followed by a column address, as described in Computer Graphics Principles and Practice, 2nd ed., Foley, van Dam, Feiner and Hughes, Addison-Wesley Publishing Co., 1992, pp. 856-862, Section 18.1.2. A row address defines a plurality of data words contained on that row, which can be read out sequentially using a plurality of column address strobes without the need for an additional row address strobe. Thus, the plurality of words contained within a row can be read out of the DRAM quickly in the page mode (i.e. the words contained in a single row of the DRAM are all contained on the same "page"). When a DRAM is operated in the page mode, the number of consecutive page cycles possible depends on the depth of the FIFO (i.e. the number of words within a row which can be stored in the FIFO in response to a single RAS strobe). Furthermore, when a DRAM is used in the page mode, a random cycle (RAS strobe) will most likely be required each time a new source takes over the memory bus or there is a switch from a CPU read operation to a CPU write operation.
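The distinction between random cycles and page cycles described above can be illustrated with a small counting model. The following Python is a minimal sketch, assuming a single open row held in one sense amplifier register and ignoring timing, refresh and precharge; `PageModeCounts` and `count_cycles` are hypothetical names, not part of any DRAM controller interface.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class PageModeCounts:
    random_cycles: int = 0  # RAS + CAS: a new row had to be strobed in
    page_cycles: int = 0    # CAS only: the row was already open


def count_cycles(accesses: Iterable[Tuple[int, int]]) -> PageModeCounts:
    """Classify each (row, column) access against a single open row.

    Hypothetical model: one open row, no timing, no refresh, no precharge.
    """
    counts = PageModeCounts()
    open_row: Optional[int] = None
    for row, _column in accesses:
        if row == open_row:
            counts.page_cycles += 1    # column address strobe only
        else:
            counts.random_cycles += 1  # new row address strobe required
            open_row = row
    return counts


# Eight words from row 5, then two words from row 9.
trace = [(5, col) for col in range(8)] + [(9, 0), (9, 1)]
print(count_cycles(trace))  # PageModeCounts(random_cycles=2, page_cycles=8)
```

Each change of row costs one random cycle; every further word on the same row is a page cycle.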
More recently, in addition to the CPU and the screen refresh circuitry, both of which access the video frame buffer, certain graphic memory systems include a dedicated graphic engine which draws directly into the memory, introducing a third source to compete for the memory bus. As depicted in FIG. 2, such a DRAM system 200 includes DRAM 201, memory controller 210, and a plurality of devices interfacing with memory controller 210. The devices which compete for DRAM 201 bandwidth include CRT refresh circuitry 221, CPU 222, graphic engine 223, and video drivers 224. Graphic engine 223 can typically operate on two areas of the memory at the same time (a source and a destination), so such an engine effectively adds two sources which compete for the memory bus: a source read operation and a destination read/write operation. Once again, these graphic memory systems operate in the page mode in order to improve memory bus bandwidth, using a FIFO for each read source or write source. However, switching between sources will still most likely require a random cycle (RAS strobe), with its attendant decrease in memory bus bandwidth.
Today's graphic systems mix graphics with video, introducing yet another source, the video, to compete for graphic memory bus bandwidth. Video itself can also be considered as two sources: one updates the video portion of the frame buffer, while the second, depending on the implementation, fetches the video data for display or fetches some sort of mask for updating and/or displaying the video data. In addition, today's graphic engines can operate on more than just two maps (source and destination), sometimes up to four maps (source, pattern, mask and destination). As shown in FIG. 2, a dedicated FIFO is required for each source, and more random cycles are encountered with each switch between the sources requiring access to the memory bus, effectively reducing the total bandwidth of the graphic memory system.
Due to high-bandwidth and high-color-depth requirements, today's graphic systems usually employ a 64-bit bus architecture and require from 2 to 4 megabytes of memory. A cost-effective memory system should have a low chip count and be flexible and upgradable. Thus, wide and shallow memories are preferred over narrow and deep memories. To increase bandwidth, a faster page cycle is always the most important criterion; maintaining page-mode operation is the next most important criterion. Graphic controllers can improve the likelihood that page-mode operation is maintained by using multiple (or wider) FIFOs to store more words each time a memory row is accessed, but this can be extremely costly, especially when each FIFO is 64 bits wide and a FIFO is used for each source. Therefore, a major improvement in graphic memory systems is highly desirable.
A DRAM can be accessed in page mode for only one row at a time. Each time a new source claims the memory bus, a random access cycle is likely to be needed, since a different page of memory is almost certain to be accessed. This random access cycle is followed by one or more page cycles. The number of page cycles possible once a given page is selected by the random access cycle is defined by the depth of the graphic controller's internal FIFO for that source. When only one source is accessing the memory, the maximum number of page cycles per row can be achieved; this number is limited by the number of columns in the memory array.
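The interaction between source switching and FIFO depth can be sketched in the same way. This is a minimal model, assuming round-robin arbitration in which each grant moves at most `fifo_depth` words for one source, and a DRAM with a single open row; the function `arbitrate`, the round-robin policy and the source names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Access = Tuple[int, int]  # (row, column)


def arbitrate(streams: Dict[str, List[Access]], fifo_depth: int) -> Tuple[int, int]:
    """Round-robin grants of up to fifo_depth words per source.

    Returns (random_cycles, page_cycles) for a DRAM with one open row.
    The round-robin policy is an illustrative assumption.
    """
    if fifo_depth < 1:
        raise ValueError("fifo_depth must be at least 1")
    cursors = {name: 0 for name in streams}
    open_row: Optional[int] = None
    random_cycles = page_cycles = 0
    pending = any(streams.values())
    while pending:
        pending = False
        for name, stream in streams.items():
            start = cursors[name]
            burst = stream[start:start + fifo_depth]  # fills this source's FIFO
            cursors[name] = start + len(burst)
            pending = pending or cursors[name] < len(stream)
            for row, _column in burst:
                if row == open_row:
                    page_cycles += 1
                else:
                    random_cycles += 1  # row change: a new RAS cycle
                    open_row = row
    return random_cycles, page_cycles


# Hypothetical sources: screen refresh on row 10, CPU on row 200.
streams = {
    "refresh": [(10, c) for c in range(16)],
    "cpu": [(200, c) for c in range(16)],
}
print(arbitrate(streams, fifo_depth=4))   # (8, 24)
print(arbitrate(streams, fifo_depth=16))  # (2, 30)
```

Deepening the FIFO from 4 to 16 words cuts the random cycles from 8 to 2 in this trace. That is the trade-off described above: more page cycles per random cycle, bought with deeper or wider per-source FIFOs. With a single source on one row, only one random cycle occurs regardless of FIFO depth.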
Traditional DRAM uses only one sense amplifier per memory core to store the selected row data, as shown in FIG. 2. Sense amplifier register 202, which acts as a cache line for memory array 203, can typically be accessed three times faster than a random access cycle of memory array 203. For a 256K×16 DRAM, the cache line size is typically 512×16. Since only one cache line is available, the hit rate for this architecture is very low, especially when multiple row addresses are accessed.
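The 256K×16 figures above can be checked with a short calculation, assuming one 512×16 cache line corresponds to one full row. The last line is an illustrative hit-rate-weighted average added here, not a formula from the source. The symbols h (the fraction of accesses that hit the open row), t_page and t_rand are assumptions, with t_page ≈ t_rand/3 taken from the "three times faster" figure above.

```latex
\begin{aligned}
256\text{K} \times 16 &= 2^{18}\ \text{words} \times 16\ \text{bits} = 2^{22}\ \text{bits} = 4\ \text{Mbit} \\
\text{rows} &= \frac{2^{18}\ \text{words}}{512\ \text{words per row}} = \frac{2^{18}}{2^{9}} = 2^{9} = 512 \\
t_{\text{eff}} &= h\, t_{\text{page}} + (1 - h)\, t_{\text{rand}}
  \approx h\,\frac{t_{\text{rand}}}{3} + (1 - h)\, t_{\text{rand}}
  = \left(1 - \tfrac{2h}{3}\right) t_{\text{rand}}
\end{aligned}
```

With only one cache line, h stays near zero whenever accesses move between rows, so t_eff approaches t_rand.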
Memory interleaving is one prior art technique that can improve page cycle utilization with conventional DRAM by increasing the number of cache lines (FIG. 3). In this example, the memory array is divided into two integrated circuits serving as 4-megabit memory arrays 303-even and 303-odd, respectively, each having a 512×16 sense amplifier (row) register serving as a cache line. The two cache lines increase the hit rate significantly when alternate row addresses from the two memory arrays are accessed. However, when two rows of data are accessed from the same memory array, the hit ratio is the same as that of the traditional DRAM architecture.
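The even/odd interleaving above can be modeled by giving each bank its own open row. This is a minimal sketch, assuming each display line occupies one DRAM row and that row parity selects the bank; `interleaved_hits` is a hypothetical name, and setting `banks=1` stands in for the traditional single-register DRAM.

```python
from typing import Iterable, List, Optional


def interleaved_hits(rows: Iterable[int], banks: int = 2) -> int:
    """Count page hits when row r is stored in bank r % banks.

    Each bank keeps one open row in its own sense amplifier register,
    so each bank acts as one cache line. banks=1 models a traditional DRAM.
    """
    if banks < 1:
        raise ValueError("banks must be at least 1")
    open_rows: List[Optional[int]] = [None] * banks
    hits = 0
    for row in rows:
        bank = row % banks  # with banks=2: even rows -> bank 0, odd rows -> bank 1
        if open_rows[bank] == row:
            hits += 1
        else:
            open_rows[bank] = row  # page miss: load row into this bank's register
    return hits


alternating = [10, 11, 10, 11, 10, 11]  # consecutive display lines
same_bank = [10, 12, 10, 12, 10, 12]    # two even display lines
print(interleaved_hits(alternating, banks=1))  # 0
print(interleaved_hits(alternating, banks=2))  # 4
print(interleaved_hits(same_bank, banks=2))    # 0
```

Alternating between consecutive lines hits on four of six accesses with two banks, while alternating between two even lines misses every time, matching the limitation noted above.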
Memory interleaving can double the maximum number of page cycles per row, since even and odd display lines are assigned to opposite memory banks, and accesses alternating between consecutive lines can remain in page cycle. When there is a page miss, the RAS precharge time can be hidden if accesses alternate between banks.
Memory interleaving, however, still has many disadvantages: it requires more memory, multiple accesses to the even or odd display lines still result in page misses, and interleaving still does not address the likelihood of a page miss when the source accessing memory changes (e.g., a graphic controller requesting data from two different locations in the even-line or odd-line bank would likely cause a page miss).
FIG. 4 is a block diagram of a prior art memory circuit 400 including memory controller 210, memory bus 211, and dual page memory 401. Dual page memory 401 includes two memory array banks 401-1 and 401-2, each having an associated sense amplifier register 402-1 and 402-2, respectively. In this prior art circuit 400, dual page memory 401 is formed as a single integrated circuit organized into two discrete memory array banks 401-1 and 401-2, with sense amplifier register 402-1 dedicated to reading and storing data associated with first memory array bank 401-1. Similarly, sense amplifier register 402-2 is dedicated to reading and storing information associated with second memory array bank 401-2. This prior art memory circuit improves bandwidth somewhat, as a page of memory from first memory array bank 401-1 can be stored for subsequent page reads from sense amplifier register 402-1 without interfering with similar page reads from sense amplifier register 402-2 of data obtained from memory array bank 401-2. However, whenever a new page stored within either of memory array banks 401-1 and 401-2 is to be accessed, dual page memory 401 will have a page miss, requiring a RAS operation to load that new page into the appropriate one of sense amplifier registers 402-1 and 402-2. An extension of this architecture is to further divide the main core into smaller cores, which reduces the chance of a cache miss. This is very similar to increasing the cache size of a direct-mapped cache.
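The direct-mapped cache analogy can be made concrete by splitting one core into equal banks, each with a single sense amplifier register. This sketch assumes that the high-order row address bits select the bank, as the index does in a direct-mapped cache. `bank_misses` and the three-source access pattern are hypothetical. The 512-row core size matches the 256K×16 example above.

```python
from typing import Iterable, List, Optional


def bank_misses(rows: Iterable[int], total_rows: int, banks: int) -> int:
    """Count page misses when the core is split into `banks` equal banks.

    Like a direct-mapped cache: the bank index selects the single line
    (sense amplifier register) that can hold a row, and the row's offset
    within its bank acts as the tag. Rows must lie in range(total_rows).
    """
    if banks < 1 or total_rows < banks or total_rows % banks:
        raise ValueError("total_rows must be a positive multiple of banks")
    rows_per_bank = total_rows // banks
    open_tags: List[Optional[int]] = [None] * banks
    misses = 0
    for row in rows:
        bank = row // rows_per_bank  # high-order row bits: the "index"
        tag = row % rows_per_bank    # low-order row bits: the "tag"
        if open_tags[bank] != tag:
            misses += 1              # RAS cycle to reload this bank's register
            open_tags[bank] = tag
    return misses


# Three hypothetical sources cycling through rows 3, 300 and 100 of a 512-row core.
trace = [3, 300, 100] * 4
for n in (1, 2, 4, 8):
    print(n, bank_misses(trace, total_rows=512, banks=n))
# prints one pair per line: 1 12, 2 9, 4 9, 8 3
```

Going from two to four banks does not help this particular trace, because rows 3 and 100 still map to the same bank. As with a direct-mapped cache, dividing the core reduces misses only when the conflicting rows land in different banks.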