The present invention relates to computer systems, and, more particularly, to memory control as used in personal computer systems.
FIG. 1 is a simplified schematic functional/structural block circuit diagram of a personal computer system labelled with reference numeral 100. System 100 includes 32-bit processor 102; 16 megabyte (MB) DRAM memory 104, made up of more than one hundred 1 megabit DRAM chips with 80 nanosecond (nsec) access times and fast page mode operation; 64 KB cache memory and controller 106; DRAM memory controller 108; 32-bit Host bus 110; memory address decoder 112; address multiplexer 114; page hit detector 116; 32-bit EISA bus 120; EISA bus controller 122; integrated system peripheral 124; EISA bus buffer 126 for data transfer between the buses; EISA bus buffer 128 for address transfer between the buses; and EISA bus master 130, which typically has some supporting peripheral. DRAM controller 108 contains state machines to track the status of both Host bus 110 and EISA bus 120 and to generate control signals.
Processor 102 generates 32-bit addresses, but addresses in DRAM memory 104 use only the lowest 24 bits; that is, the 8 most significant bits (MSBs) are all 0. The higher addresses may be for peripheral memory on EISA bus 120. The clock period of processor 102 is 30 or 40 nsec, and thus when using DRAM memory 104, processor 102 must effectively slow down by inserting wait states to allow the DRAM information to be read or written.
The addition of cache memory 106 to duplicate selected portions of DRAM memory 104 provides an overall memory performance level approaching that of a large fast memory, but at a cost not much more than that of an all-DRAM memory. In particular, cache memory 106 stores recently used information (data and instructions) together with the information's address in DRAM memory 104. The locality of reference typically found in computer programs implies that such recently used information will likely be the information used next.
Thus when processor 102 puts an address on Host bus 110 for a read, cache memory 106 checks the address to see if the information has been cached (already stored in cache memory); and if it has, then cache memory 106 supplies the information on Host bus 110 within one processor cycle and aborts the read of slower DRAM memory 104.
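The cached read described above can be sketched as follows. This is only an illustrative software model, not the circuitry of cache memory 106: the dictionary-based cache, the class name `SimpleCache`, and the DRAM cycle count are assumptions introduced for the example; only the one-cycle hit comes from the text.

```python
# Hypothetical model of a read with a cache in front of slower DRAM.
# The dictionary-based cache and DRAM_READ_CYCLES are illustrative assumptions.

CACHE_HIT_CYCLES = 1   # from the text: a hit is served within one processor cycle
DRAM_READ_CYCLES = 4   # assumed figure for a DRAM read including wait states


class SimpleCache:
    def __init__(self):
        # Maps a DRAM address to the information stored at that address.
        self.lines = {}

    def read(self, address, dram):
        """Return (value, cycles) for a read of `address`."""
        if address in self.lines:
            # Hit: the cache supplies the information and the DRAM read is aborted.
            return self.lines[address], CACHE_HIT_CYCLES
        # Miss: read DRAM and keep the information together with its address.
        value = dram[address]
        self.lines[address] = value
        return value, DRAM_READ_CYCLES
```

In this model the first read of an address pays the assumed DRAM cost, and a repeated read of the same address completes in one cycle, which is the benefit that locality of reference is expected to provide.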
An alternative and complementary approach of the preferred embodiments to faster memory reads also relies on the locality of reference, but to an even greater extent. This alternative simply predicts the address of an access based upon the previous access address. Thus a predicted address may be driven to DRAM memory before processor 102 puts its actual access address on Host bus 110. A fast comparison of the actual and predicted addresses then determines whether to use the already-latched predicted address or to perform a normal access cycle with the actual address. If the predicted address is used, the cycle can proceed at an accelerated pace.
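The compare-then-choose step can be illustrated with a short sketch. The function name `run_trace`, the pluggable `predict` callback, and both cycle counts are hypothetical; the text states only that a matching prediction lets the cycle proceed faster than a normal access.

```python
# Hypothetical model of predicted-address access; cycle counts are assumptions.

FAST_CYCLES = 2    # assumed cost when the already-driven predicted address is used
NORMAL_CYCLES = 4  # assumed cost of a normal access cycle with the actual address


def run_trace(addresses, predict):
    """Return (hits, total_cycles) for a sequence of access addresses.

    `predict` maps the previous access address to the predicted next address.
    """
    hits = 0
    total = 0
    predicted = None  # nothing has been driven before the first access
    for actual in addresses:
        if predicted is not None and actual == predicted:
            # The predicted address is already at the memory: accelerated cycle.
            hits += 1
            total += FAST_CYCLES
        else:
            # Mismatch: drive the actual address and run a normal cycle.
            total += NORMAL_CYCLES
        predicted = predict(actual)
    return hits, total
```

For example, `run_trace([100, 101, 102, 200, 201], lambda a: a + 1)` counts three matching predictions out of five accesses under these assumed costs.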
A related approach in U.S. Pat. No. 5,007,011 determines the difference between two successive access addresses and then predicts that the next address will differ from the current address by the same difference. Similarly, U.S. Pat. No. 4,583,162 partitions memory into even and odd instruction address portions and predicts a next address by incrementing the current address by one; thus an odd address engenders an even predicted address, which sets up a read in the even memory portion. When the actual next address appears on the address bus, a comparison with the predicted address determines whether the predicted read proceeds or whether the CPU pauses to permit the actual address to generate a read. However, this approach has the problems of requiring two addressable memory modules and imposing a penalty for instruction accesses that are not sequential in nature.
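The two prior prediction rules, as summarized here, can be sketched in a few lines. The function names are hypothetical and the code models only the arithmetic described in this paragraph, not the circuits of the cited patents.

```python
# Hypothetical sketches of the two prediction rules summarized above.


def stride_predictions(addresses):
    """Apply the same-difference rule to a list of access addresses.

    Returns a list of (actual, predicted, hit) tuples, one for each access
    that has two predecessors.
    """
    results = []
    for i in range(2, len(addresses)):
        previous, current = addresses[i - 2], addresses[i - 1]
        # Predict that the next address repeats the last observed difference.
        predicted = current + (current - previous)
        results.append((addresses[i], predicted, addresses[i] == predicted))
    return results


def odd_even_prediction(current):
    """Increment by one and report which memory portion the prediction targets."""
    predicted = current + 1
    portion = "even" if predicted % 2 == 0 else "odd"
    return predicted, portion
```

With the trace `[0, 4, 8, 12, 20]`, the same-difference rule predicts 8 and 12 correctly and then predicts 16 where the actual address is 20, which illustrates the penalty for accesses that break the pattern.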
The present invention provides an improved system for address prediction for read or write access by incrementing the column address of the preceding address. Memory access is accelerated by automatically incrementing the address at the memory chip inputs as soon as the minimum hold time has elapsed. If the next address actually requested by the CPU does not match this predicted address, then the actual address is driven onto the chip inputs as usual, so essentially no time is lost. However, if the automatically incremented address does match the next actually requested address, then a significant fraction of the chip's required access time has been saved.
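A minimal sketch of the column-increment prediction follows. Several details are assumptions made for illustration and are not stated in the text: the address is treated as a DRAM word address, the column field is taken to be the low 10 bits, and the increment is assumed to wrap within the column field so that the row bits never change. The names `COLUMN_BITS`, `predict_next`, and `prediction_matches` are hypothetical.

```python
# Hypothetical model of column-address increment prediction.
# COLUMN_BITS and the wrap-within-row behaviour are assumptions.

COLUMN_BITS = 10                       # assumed width of the column address field
COLUMN_MASK = (1 << COLUMN_BITS) - 1   # selects the column bits of an address


def predict_next(address):
    """Increment only the column field; the row field is left unchanged."""
    row = address & ~COLUMN_MASK
    column = (address + 1) & COLUMN_MASK  # wraps to 0 at the end of the row
    return row | column


def prediction_matches(previous_address, actual_address):
    """Compare the actual address with the prediction made from the previous one."""
    return predict_next(previous_address) == actual_address
```

Under these assumptions a sequential access within a row matches the prediction, while an access that crosses into the next row does not, so the actual address would be driven onto the chip inputs as in a normal cycle.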