1. Field of the Invention:
The present invention relates to the field of memory apparatus in digital systems. Specifically, the present invention relates to the time-saving technique of detecting both cache mode and fast page mode in the same clock cycle.
2. Art Background:
In traditional systems design, memory latency, the delay incurred when retrieving data from main memory, is a limiting factor in increasing performance. Cache memories are used to reduce the time penalties due to accesses to main memory. Cache memories are relatively small, high speed buffer memories used in computer systems to temporarily hold those portions of the contents of memory which are currently in use. Information located in the cache memory may be accessed in less time than information located in main memory. The advantages of cache memories are derived from the property of locality with respect to memory accesses in a computer system. Two types of locality exist, temporal and spatial. Temporal locality reflects the observation that information which will be in use in the near future is likely to be in use already. Spatial locality reflects the observation that the address space of information to be accessed is likely to be near the address space of currently accessed information. For further information on cache memories, see Smith, "Cache Memories", Computing Surveys, Vol. 14, No. 3, pp. 474-530 (September 1982) and Hennessy & Patterson, Computer Architecture A Quantitative Approach, pp. 403-425 (Morgan Kaufmann 1990).
System designers have further reduced memory latency by taking advantage of DRAM (Dynamic Random Access Memory) technology, specifically fast page mode DRAMs. These DRAMs are used as the main memory and allow for faster access to a memory location provided that the access is to the same row address as the previous memory access. DRAM access times are divided into random access times and column (or fast page mode) access times. Fast page mode DRAMs allow repeated access to the same row, with the benefit of not incurring the RAS precharge and RAS setup delays. Fast page mode DRAMs take advantage of the program behavior known as spatial locality, which describes the tendency of a program to access a narrow region of memory over a given amount of time. (See Hennessy & Patterson, Computer Architecture A Quantitative Approach, pp. 431-432 (Morgan Kaufmann 1990)). To support fast page mode DRAM accesses, designers have to ensure that subsequent memory accesses are to the same row address as the initial memory access. If a subsequent memory access requires a different row to be accessed, an additional delay is incurred while a random memory access is initiated to service the different row address (the additional time being used to precharge the Row Address Strobe (RAS) and for the address setup time between the RAS and Column Address Strobe (CAS)). However, system designers of general purpose processors cannot rely on any predictable order of memory access, and therefore must implement a row address comparator, in which the row address of each memory access is compared to the row address of the previous memory access. The comparator is located in the memory control unit (MCU). The MCU drives the DRAM control signal lines and determines whether the current memory access may take advantage of a fast page mode access or must incur the additional delay of a random access.
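The row address comparison described above may be illustrated by the following minimal sketch. It is an illustration only and not the circuit of any particular MCU. The class name `RowComparator`, the method `access`, and the 10-bit column field are assumptions introduced here; actual DRAM devices divide the address differently.

```python
class RowComparator:
    """Hypothetical software model of an MCU row address comparator."""

    def __init__(self, column_bits=10):
        # column_bits is an assumed address split, not taken from the text.
        self.column_bits = column_bits
        self.open_row = None  # no row is open before the first access

    def access(self, address):
        """Return 'fast_page' when the row matches the previous access."""
        row = address >> self.column_bits
        if row == self.open_row:
            return "fast_page"   # only the column address and CAS are needed
        self.open_row = row      # a random access opens the new row
        return "random"


mcu = RowComparator()
kinds = [mcu.access(a) for a in (0x0000, 0x0004, 0x03FC, 0x0400, 0x0404)]
print(kinds)
# ['random', 'fast_page', 'fast_page', 'random', 'fast_page']
```

In this sketch the first access to each row is a random access, and each later access to the same row is a fast page mode access, which is the behavior the comparator exists to detect.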
The fast page mode access capability improves performance by taking advantage of spatial locality; however, a price is paid in terms of the delay incurred by the row address comparator. In a synchronous system, this may add a cycle to all memory accesses. Early memory designs tended to accelerate only overlapping memory accesses, and used a random access mode (in which all DRAM control signals return to inactive) as the default, or idle, state.
Recent high performance memory control designs have improved upon previous designs by implementing the fast page mode access as the default access type. This requires that the row address for each memory access be checked before the access begins, to determine if the correct row is being accessed. The memory controller determines which type of memory access is appropriate before initiating the memory access. In a synchronous design, the comparison requires an additional clock cycle for all memory accesses. However, because fast page mode access is normally two or three times faster than random access mode, a single state or cycle penalty on all memory accesses still increases overall performance over a system that does not implement fast page mode access.
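The trade-off described above can be checked with simple arithmetic. The sketch below is illustrative only: the cycle counts (6 cycles for a random access, 2 cycles for a fast page mode access) and the row hit rate are assumed values chosen to match the "two or three times faster" ratio, not figures from any actual design. Under these assumptions the one-cycle comparison penalty pays for itself once the fraction of accesses that fall in the open row exceeds 1/(6 - 2), or one quarter.

```python
from fractions import Fraction

RANDOM_CYCLES = 6      # assumed cost of a random access
FAST_PAGE_CYCLES = 2   # assumed cost of a fast page mode access (3x faster)
COMPARE_CYCLES = 1     # comparison penalty paid on every memory access


def average_cycles(row_hit_rate):
    """Average cycles per access when fast page mode is the default."""
    return (COMPARE_CYCLES
            + row_hit_rate * FAST_PAGE_CYCLES
            + (1 - row_hit_rate) * RANDOM_CYCLES)


def break_even_hit_rate():
    """Row hit rate at which the average equals a plain random access."""
    return Fraction(COMPARE_CYCLES, RANDOM_CYCLES - FAST_PAGE_CYCLES)


print(average_cycles(Fraction(7, 10)))  # 21/5, i.e. 4.2 cycles versus 6
print(break_even_hit_rate())            # 1/4
```

Exact fractions are used instead of floating point so that the printed values are not affected by rounding.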
In a memory system containing a cache memory, the memory management unit (MMU) first determines, for each memory access, whether the data being accessed is resident in the cache. If the data is found in the cache, the memory access is satisfied without accessing main memory. If the data is not resident in the cache, the MMU notifies the MCU that access to main memory is required. In a synchronous system, the cache lookup requires one or more states or clock cycles to determine if a main memory access is required. Additionally, if more than one processor is present, or if an I/O subsystem that supports direct memory access (DMA) is present, arbitration for memory access must also take place.
An illustrative computer system is shown in FIG. 1. Shown there is a computer 101 which comprises three major components. The first of these is the input/output (I/O) circuit 102 which is used to communicate information in appropriately structured form to and from the other parts of the computer 101. Also shown as parts of computer 101 are the central processing unit (CPU) 103 and memory subsystem 104. Also shown in FIG. 1 is an input device 105, shown in a typical embodiment as a keyboard. It should be understood, however, that the input device may actually be a card reader, magnetic or paper tape reader, or other well-known input device (including, of course, another computer). In addition, a display monitor 107 is illustrated which is used to display messages or other communications to the user. A cursor control 109 is used to select command modes and edit the input data, and in general provides a more convenient means to input information into the system.
The memory subsystem 104 comprises a memory management unit (MMU) 112, a memory control unit (MCU) 114, a cache 116, main memory 118, and an input/output interface 110 which connects to the mass memory 106. Mass memory 106 is connected to the computer 101 as a peripheral device and may be a disk drive, tape drive or the like. In the present illustration, the main memory 118 is a DRAM which provides for fast page mode access.
MMU 112 receives a data request from the CPU 103, performs any address translation from virtual to physical that is needed, and determines whether the data is located in mass memory 106, in main memory 118 or in the cache 116. If the data is located in the cache 116, a signal is sent to retrieve the data from the cache 116 and return the data to the MMU 112 for transmission to the CPU 103. If the data is not located in the cache 116, a signal is sent to the MCU 114 to retrieve the requested data from main memory 118. The MCU 114 drives the signal lines (i.e., the row and column lines) to access the memory location containing the requested data. If the main memory 118 consists of fast page mode DRAMs, the MCU 114, prior to driving the signal lines, will compare the row address of the data to be accessed with the row address previously accessed. If the row addresses are the same, a quick access of the data can be achieved by executing a fast page mode cycle in which only the column address and CAS are required to access the correct location. If the row addresses are not the same, the MCU 114 must execute a random access cycle and incur the additional delay.
The process flow for accessing data in a cached memory system is illustrated by the flow diagram of FIG. 2 and the signal timing is illustrated by the timing diagram of FIG. 3. A processor memory request 210 is initiated by the processor (i.e., the CPU). This request is directed to the Memory Management Unit (MMU) which performs a cache lookup 220 to determine if the data requested is currently located in the cache. If the data is located in the cache, a "hit" occurs and the data is quickly transferred to the processor. If the data is not located in the cache, a "miss" occurs and the process continues by initiating a main memory access request 230 and performing any necessary arbitration (which is needed if the input/output subsystem has the ability to do direct memory access, if the system is a multiple processor system or if the CPU design incorporates separate instruction and data caches, where each cache can independently request a memory access). The main memory access request is directed to the memory control unit (MCU) which performs a row address comparison 240 to determine whether the data is located at the same row address as the previous data accessed. If the data is located at the same row address, a hit occurs and the fast page mode access 250 is employed. If a miss occurs, a slower random access of memory 260 is performed to access the data requested.
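The sequential flow of FIG. 2 may be summarized by the following sketch, in which the cache lookup completes before the row address comparison begins. The function `access_memory`, the use of a set of addresses as the cache, the dictionary holding the open row, and the 10-bit column field are all assumptions made for illustration. Arbitration, cache line fills and evictions, and signal timing are omitted.

```python
def access_memory(address, cache, mcu_state, column_bits=10):
    """Follow the FIG. 2 flow for one request and return the steps taken."""
    steps = ["processor memory request 210", "cache lookup 220"]
    if address in cache:
        steps.append("cache hit")
        return steps

    # Cache miss: only now is the request passed on to the MCU.
    steps.append("main memory access request 230")
    steps.append("row address comparison 240")
    row = address >> column_bits  # assumed row/column split
    if row == mcu_state.get("open_row"):
        steps.append("fast page mode access 250")
    else:
        mcu_state["open_row"] = row
        steps.append("random access 260")
    cache.add(address)  # simplification: cache the single address
    return steps


cache, mcu_state = set(), {}
first = access_memory(0x0400, cache, mcu_state)
second = access_memory(0x0404, cache, mcu_state)
third = access_memory(0x0400, cache, mcu_state)
print(first[-1])   # random access 260
print(second[-1])  # fast page mode access 250
print(third[-1])   # cache hit
```

Each returned list records the steps in order, so that a cache miss is seen to pass through both the cache lookup and the row address comparison before any DRAM access begins.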