FIG. 1 is an illustration of a general purpose computer 20. The computer 20 includes a central processing unit (CPU) 22. The CPU 22 executes instructions of a computer program. Each instruction is located at a memory address. Similarly, the data associated with an instruction is located at a memory address. The CPU 22 accesses a specified memory address to fetch the instruction or data stored there.
Most CPUs include an on-board memory called an internal cache. If a specified address is not in the internal, or L1, cache, then the CPU 22 looks for the specified address in an external cache, also called an L2 cache 24. The external cache 24 has an associated external cache controller 26.
If the address is not in the external cache 24 (a cache miss), then the external cache 24 requests access to a system bus 28. When the system bus 28 becomes available, the external cache 24 is allowed to route its address request to the primary memory 30. The primary memory 30 has an associated memory controller 32.
FIG. 2 illustrates a primary memory controller 32 and its associated primary memory 30. The memory controller 32 includes an address queue 50 to store address requests received from the system bus 28. An address from the queue 50 is applied to the bus 52, which routes the address to a row decoder 54 and a multiplexer 56. A strobe control circuit 58 is used to enable either the row decoder 54 or the multiplexer 56. In particular, the strobe control circuit 58 generates a Row Access Strobe (RAS) signal on line 60 or a Column Access Strobe (CAS) signal on line 62.
When an address and the RAS signal are applied to the row decoder 54, the row decoder 54 specifies a row of values in a memory array 64. The row of values, also called a memory page, is then passed into a set of latches 66. Selected columns from the row (or page) of data are then specified with the address signal. That is, a subsequent address signal is used to specify selected columns in the row. The subsequent address signal is used as a multiplexer select signal, enabled by the CAS signal. Thereafter, the multiplexer 56 generates a data output signal on an output bus 68.
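The two-phase access described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class name `DramArray`, the function `read_word`, and the example data are hypothetical, and timing and strobe polarity are not modeled.

```python
class DramArray:
    """Toy model of the memory array 64, latches 66, and multiplexer 56."""

    def __init__(self, rows):
        self.rows = rows      # list of rows; each row is a list of column values
        self.latches = None   # holds the most recently accessed row (the page)

    def row_access(self, row_addr):
        # RAS phase: the row decoder selects a row and drives it into the latches.
        self.latches = list(self.rows[row_addr])

    def column_access(self, col_addr):
        # CAS phase: the column address acts as the multiplexer select signal.
        if self.latches is None:
            raise RuntimeError("no row latched; perform a row access first")
        return self.latches[col_addr]


def read_word(dram, row_addr, col_addrs):
    """One row access followed by a set of column accesses."""
    dram.row_access(row_addr)
    return [dram.column_access(c) for c in col_addrs]


# Hypothetical contents: two rows of four columns each.
dram = DramArray([[10, 11, 12, 13], [20, 21, 22, 23]])
print(read_word(dram, 1, [0, 1, 2, 3]))  # [20, 21, 22, 23]
```

The row address and the column addresses are passed separately here, mirroring the way the same bus 52 carries first one and then the other.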
The foregoing operation is more fully appreciated with reference to FIG. 3. Waveform 70 illustrates a row address being asserted, followed by a set of column addresses. These signals are applied to the bus 52 of FIG. 2. The same bus 52 is used for both the row and column address in order to save package pins and thereby reduce package costs. The row address (Row Addr) is routed to the row decoder 54 as the RAS signal is deasserted, as shown with the waveform 72 going from high to low. The combination of the row address and the deasserted RAS signal allows the row decoder 54 to access a row in the memory array 64, resulting in the row being driven into the latches 66.
Note that the RAS signal is deasserted after the row address is launched. This time delay is for the purpose of allowing the row address to reach the row decoder. Thus, it can be appreciated with reference to FIG. 3 that there is a delay involved with launching a row address and with deasserting the RAS signal. Thus, it is desirable to reduce the frequency of this operation.
After a row of values is driven into the latches 66, a first set of data is read from the latches 66 with the multiplexer 56, as a first column address and a deasserted CAS signal are received at the multiplexer 56. The first deassertion of the CAS signal is shown with the waveform 74. The first deassertion of the CAS signal coincides with the timing of the first column address, as shown in FIG. 3. This operation results in a first set of output data being driven onto the bus 68. The first set of output data is shown with the waveform 76. The RAS signal continues to be deasserted, shown with the waveform 72, as the subsequent column address signals, shown with the waveform 70, are applied to the multiplexer 56. The subsequent column address signals are timed to coincide with the deassertion of the CAS signal, as shown with the waveform 74. This operation produces three subsequent sets of data. Depending upon the system, the data of the waveform 76 may not be returned until after the second, third, or fourth column address is asserted. A relatively quick return of data is illustrated for convenience.
Note that after the last column address is sent, the RAS signal is asserted (goes high) once again. If a new address is to be fetched from primary memory, the RAS signal must be deasserted again, and the foregoing processing must be repeated.
In FIG. 3, the four blocks of data that are retrieved can be thought of as a single memory word. Sometimes a number of memory words can be retrieved using the same row address. That is, when a subsequent address request refers to the same memory page (or row address), a new set of column addresses can be sent while keeping the same row address asserted. This operation, sometimes referred to as page-mode addressing, results in substantial memory latency improvements because delays associated with the row address signal and the RAS signal are avoided.
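The latency benefit of page-mode addressing can be illustrated with a rough cost count. The cycle costs `ROW_COST` and `COL_COST`, the function `access_cost`, and the request list below are hypothetical values chosen for illustration; they are not taken from FIG. 3.

```python
ROW_COST = 3  # assumed cost of launching a row address and strobing RAS
COL_COST = 1  # assumed cost of one column address and CAS strobe


def access_cost(requests, page_mode):
    """Total cost of a sequence of (row, column_list) memory word requests.

    With page_mode enabled, a request to the row that is already open
    skips the row address and RAS overhead.
    """
    open_row = None
    total = 0
    for row, cols in requests:
        if not page_mode or row != open_row:
            total += ROW_COST
            open_row = row
        total += COL_COST * len(cols)
    return total


# Three memory words of four columns each; the first two share row 5.
requests = [(5, [0, 1, 2, 3]), (5, [4, 5, 6, 7]), (9, [0, 1, 2, 3])]
print(access_cost(requests, page_mode=False))  # 21
print(access_cost(requests, page_mode=True))   # 18
```

Under these assumed costs, the saving is exactly one row overhead for each request that falls on the already open page.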
Page-mode addressing is frequently available when reading data from a primary memory. In other words, it is commonly necessary to read memory addresses that are located at adjacent positions in memory. This allows for a single row address to be asserted while multiple memory words are retrieved. On the other hand, primary memory write requests tend to be random in nature. In other words, a write back (or copy back) to memory is rarely to the same page as a previously accessed address. As a consequence, primary memory write accesses typically necessitate the assertion of a new row address.
Thus, primary memory latencies can be improved if primary memory write requests minimally interpose primary memory read requests. In such a case, primary memory read requests can enjoy more page-mode access optimizations.
Returning now to FIG. 1, the data output from the primary memory 30 is applied to the system bus 28. It is then stored in the external cache 24 and is passed to the CPU 22 for processing. The processing described in reference to FIGS. 1-3 must be performed for every address request. Indeed, if the address request is not found in the primary memory 30, similar processing is performed by an input/output controller 34 associated with a secondary memory 36.
FIG. 4 illustrates an example of the processing of a set of program addresses by the apparatus described in FIGS. 1 and 2. In FIG. 4, the CPU 22 processes a computer program 80, which includes address requests X1, X2, X3, X4, X12, etc. FIG. 4 also illustrates an external cache 24 with a set of stored addresses X1, X8, X10, X7, and X11. Each stored address has an associated "dirty bit", collectively shown as a column 82. If the dirty bit is set (a digital one value), it means that the data for the associated address has changed while it was in the external cache 24. As a consequence, when the data is removed from the external cache, the updated value must be written back (or copied back) to primary memory 30. On the other hand, if the dirty bit is not set, then a value can simply be deleted from external cache, because the primary memory already stores the same value. The significance of this factor is discussed below.
FIG. 4 also illustrates that the system bus 28 is connected between the external cache 24 and the primary memory 30. The external cache controller 26 and the primary memory controller 32 are not illustrated in FIG. 4 for the purpose of simplicity, but it is understood that they exist. The primary memory 30 includes a set of primary memory pages (or rows) 84A, 84B, 84C, and 84D. Each row includes three memory words. For example, memory page (or row) 84A includes memory words X1, X2, and X3.
FIG. 5 illustrates the traffic that is processed by a primary memory controller 32 of the primary memory 30 when processing the computer program 80 of FIG. 4. The address X1 is the first address request of the computer program 80. As shown in FIG. 4, the address X1 exists in the external cache 24. Thus, the CPU 22 retrieves the data associated with the address and processes it. The next address request in the computer program 80 is the value X2. As shown in FIG. 4, the value X2 does not exist in the external cache 24. Thus, it must be retrieved from primary memory 30. The external cache controller decides that the value X2 should be placed in the external cache 24 at the location currently held by the value X8. Since the value X8 has been changed, as indicated by the dirty bit value set to a digital one, the value X8 must be written back to primary memory. Thus, the external cache controller 26 generates a primary memory write request to the address X8 and a primary memory read request for the address X2. FIG. 5 illustrates that the primary memory controller 32 receives the primary memory write request (WX8) followed by the primary memory read request (RX2). In accordance with the prior art, these values are processed sequentially.
The next address request in the program 80 is the value X3. The value X3 is not in the external cache 24. Since the dirty bit associated with the value X10 has been set, the value X10 has to be written back to primary memory 30. Thus, a write request (WX10) and a read request (RX3) are passed to primary memory, as shown in FIG. 5.
The next address request in the program 80 is the value X4. Since the value X4 does not exist in the external cache 24, the external cache controller generates a read request (RX4) to primary memory. The read request is for the purpose of loading the data for the address X4 into the external cache 24. The data can be loaded in the external cache 24 at the location X7, since the X7 dirty bit has a digital low value.
The final address to be processed is X12. Once again, the value does not exist in the external cache 24; therefore, it must be read (RX12) from primary memory 30. Since the value X11 in the external cache 24 has not changed (its dirty bit is set to a digital zero), a write request does not have to accompany the read request RX12.
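The walk-through of FIGS. 4 and 5 can be reproduced with a small write-back cache simulation. Several details are assumptions made only so that the sketch runs: the text does not name a replacement policy, so a least-recently-used order is assumed that evicts X8, X10, X7, and X11 in turn; the dirty bit of X1 is not stated and is assumed clear; and the function name `memory_traffic` is hypothetical.

```python
from collections import OrderedDict


def memory_traffic(program, cache):
    """Return the primary memory requests generated by a program.

    cache maps address -> dirty bit, ordered least recently used first.
    The LRU replacement policy is an assumption, not part of the source.
    """
    traffic = []
    for addr in program:
        if addr in cache:
            cache.move_to_end(addr)  # cache hit: no primary memory traffic
            continue
        victim, dirty = cache.popitem(last=False)  # evict the LRU entry
        if dirty:
            traffic.append("W" + victim)  # changed data must be written back
        traffic.append("R" + addr)        # fetch the missing address
        cache[addr] = False               # a newly loaded entry starts clean
    return traffic


# External cache contents of FIG. 4; True marks a set dirty bit.
cache = OrderedDict(
    [("X8", True), ("X10", True), ("X7", False), ("X11", False), ("X1", False)]
)
program = ["X1", "X2", "X3", "X4", "X12"]
print(memory_traffic(program, cache))
# ['WX8', 'RX2', 'WX10', 'RX3', 'RX4', 'RX12']
```

The output matches the request order described for FIG. 5: a write accompanies a read only when the evicted entry has its dirty bit set.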
Returning to FIG. 5, it can be seen that the primary memory controller processes read and write requests in the order that it receives them. It can also be observed that the write request WX10 interposes read requests. Read requests are more critical to primary memory latencies than write requests. Thus, it would be desirable to execute primary memory read requests before primary memory write requests. It would also be desirable to group primary memory read requests for the purpose of achieving primary memory page-mode optimizations.
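The cost of interposed writes can be illustrated by counting how many times a new row address must be asserted for the FIG. 5 traffic. This sketch assumes that words are numbered consecutively across pages (X1 to X3 in page 84A, X4 to X6 in page 84B, and so on); only the contents of page 84A are stated in the text. The reads-first ordering shown is one hypothetical reordering used for comparison; it is not a description of any particular controller, and it ignores ordering hazards between reads and writes to the same address.

```python
WORDS_PER_PAGE = 3  # each row holds three memory words, per FIG. 4


def page_of(request):
    # "RX2" -> word number 2 -> page index 0 (assumes consecutive numbering)
    word = int(request[2:])
    return (word - 1) // WORDS_PER_PAGE


def row_activations(traffic):
    """Count how often a request targets a page other than the open one."""
    open_page = None
    count = 0
    for request in traffic:
        page = page_of(request)
        if page != open_page:
            count += 1
            open_page = page
    return count


in_order = ["WX8", "RX2", "WX10", "RX3", "RX4", "RX12"]
reads_first = [r for r in in_order if r[0] == "R"] + [
    r for r in in_order if r[0] == "W"
]

print(row_activations(in_order))     # 6
print(row_activations(reads_first))  # 5
```

In the received order every request opens a new row, because WX10 separates RX2 and RX3. When the reads are grouped, RX2 and RX3 share page 84A and one row access is saved.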
Returning now to FIG. 1, depicted therein are additional devices connected to the system bus 28. For example, FIG. 1 illustrates an input/output controller 38 operating as an interface between a graphics device 40 and the system bus 28. In addition, the figure illustrates an input/output controller 42 operating as an interface between a network connection circuit 44 and the system bus 28.
The multiple connections to the system bus 28 result in a relatively large amount of traffic. Consequently, there are delays associated with passing information on the system bus 28. System bus 28 delays discourage optimizations of the memory controller 32 that require the passing of information to the CPU 22. Optimizations of the memory controller 32 that require the passing of information to the CPU 22 are also discouraged since they typically involve additional signal lines. It is important to reduce the number of pins associated with a CPU package; thus, using additional signal lines for memory controller optimizations is not a practical option.
Despite the obstacles preventing improvements to primary memory controllers, it is important to realize improvements in primary memory access times. Primary memory latencies are not improving as much as CPU speeds. Thus, primary memory latencies are increasingly reducing the execution performance of CPUs.
In view of the foregoing, it would be highly desirable to improve the performance of a primary memory controller. In particular, it would be highly desirable to improve the latency associated with primary memory read requests. It would be desirable to achieve this improvement through prioritized primary memory read requests and optimized primary memory page-mode accesses. The performance improvements should not involve additional traffic on the system bus 28, nor should they require additional signal lines into the CPU 22.