Some memories have inherent limitations in the rate at which data is accessed. For example, if a memory is implemented as a system memory that interfaces with one or more processors as well as one or more cache memories, the inherent limitations introduce latency that hinders processor performance. One such limitation is the requisite minimum time that must elapse before a processor can sequentially access different rows of the same bank. FIG. 1 is a timing diagram 100 showing that the requisite minimum time introduces latency when two transactions sequentially access a single bank, as is common in conventional memories. A single bank, “Bank[0],” is shown to be subject to two separate transactions, each causing access to different rows in memory (e.g., system memory). Typically, a memory controller (not shown) manages the timing of transactions such as these. A transaction (“T1”) 104 requires that an activation signal (“A”) 102 cause Row[i] to open so that the memory controller can access that particular row. But before the controller opens another row, Row[j], to accommodate a subsequent transaction, (“T2”) 110, the memory controller issues a precharge signal (“P”) 106 to close the preceding row before issuing another activation signal (“A”) 108. The requisite minimum time between the openings of different rows of the same bank is depicted as “L1.” So for every such pair of transactions (or “transaction pair”), a duration of L1 introduces latency in servicing processor requests.
FIG. 2 illustrates the translation of a linear address into a row-column-bank (“RCB”) address, during which a number of transaction pairs can be generated. A typical processor contains cache memories that are accessed using a linear address 202, which is usually composed of bit groupings arranged as a tag 202a, a set 202b, and an index 202c. An example of such an address format for a specific 64-bit processor shows tag 202a spanning bits b18 through b63 and both set 202b and index 202c spanning bits b6 through b17. Address translation converts linear address 202 into an RCB address 204 having bit groupings arranged as a row spanning bits b12 through b34 (as both row 204a and row′ 204b), a bank 204c spanning bits b10 and b11 (shown as “b[11:10]”), and a column 204d spanning bits b0 through b9.
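The bit groupings of RCB address 204 can be illustrated with a minimal sketch. The field positions below follow the example layout described above (column b0 through b9, bank b10 and b11, row b12 through b34); the names `RcbAddress`, `_field`, and `translate` are hypothetical and chosen only for illustration, and the sketch is not a description of any particular address translator.

```python
from typing import NamedTuple

# Field boundaries taken from the example RCB layout in the text:
# column = b0..b9, bank = b10..b11, row = b12..b34 (23 bits).
COLUMN_SHIFT, COLUMN_BITS = 0, 10
BANK_SHIFT, BANK_BITS = 10, 2
ROW_SHIFT, ROW_BITS = 12, 23


class RcbAddress(NamedTuple):
    """Hypothetical container for the three RCB bit groupings."""
    row: int
    bank: int
    column: int


def _field(value: int, shift: int, width: int) -> int:
    """Extract `width` bits of `value` starting at bit position `shift`."""
    return (value >> shift) & ((1 << width) - 1)


def translate(linear_address: int) -> RcbAddress:
    """Split a linear address into row, bank, and column bit groupings."""
    return RcbAddress(
        row=_field(linear_address, ROW_SHIFT, ROW_BITS),
        bank=_field(linear_address, BANK_SHIFT, BANK_BITS),
        column=_field(linear_address, COLUMN_SHIFT, COLUMN_BITS),
    )
```

For instance, under these assumptions `translate(0x12345678)` yields row `0x12345`, bank `1`, and column `0x278`.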
During execution of program instructions, a processor frequently employs linear addresses 202 in its requests in a manner that causes lower order bits 214 (i.e., row′ 204b, bank 204c and column 204d) to remain substantially the same over sequentially accessed addresses. As processors typically use lower order bits 214 as an index into cache memory to access similarly-located elements, an access request by a processor to place an element in the cache can conflict with another element that is already present at that location. This phenomenon, which can be referred to as “aliasing,” causes like elements with substantially similar lower order bits to be read into and written from a cache. Aliasing exacerbates latency by generating an increased number of “page conflicts,” each of which occurs when a pair of transactions causes a memory controller to access different rows of the same bank of memory, that is, when the bits of bank 204c are the same for both transactions (i.e., indicative of the same bank). For example, consider that a processor first requests a read transaction, which is immediately followed by a processor request for a write transaction, both of which affect the same entry in cache memory. In cases where a memory operates in conjunction with a write-back, read/write allocate cache, the write transaction will cause data in that entry to be evicted (i.e., forced out) to permit the data returned from the read transaction to be stored at that same entry. And since modern processors often generate transactions in read/write cycle pairs to different rows of the same bank, each read/write cycle pair can generate a page conflict, thereby giving rise to a corresponding amount of latency, L1, such as exemplified in FIG. 1. For instance, transactions T1 and T2 can respectively be consecutive read and write transactions that traditionally lead to page conflicts.
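The page-conflict condition described above (same bank, different rows, on consecutive transactions) can be expressed as a short sketch. It assumes the example field positions given for RCB address 204 (bank at b10 and b11, row at b12 through b34); the function names and the sample addresses are hypothetical.

```python
from typing import Sequence

BANK_SHIFT, BANK_MASK = 10, 0x3        # bank = b10..b11
ROW_SHIFT, ROW_MASK = 12, 0x7FFFFF     # row  = b12..b34 (23 bits)


def bank_of(address: int) -> int:
    """Return the bank bits of an address."""
    return (address >> BANK_SHIFT) & BANK_MASK


def row_of(address: int) -> int:
    """Return the row bits of an address."""
    return (address >> ROW_SHIFT) & ROW_MASK


def is_page_conflict(previous: int, current: int) -> bool:
    """True when two consecutive accesses hit different rows of one bank."""
    return (bank_of(previous) == bank_of(current)
            and row_of(previous) != row_of(current))


def count_page_conflicts(addresses: Sequence[int]) -> int:
    """Count page conflicts over each adjacent pair in an access sequence."""
    return sum(is_page_conflict(a, b)
               for a, b in zip(addresses, addresses[1:]))
```

As an illustration, the hypothetical addresses `0x40440` and `0x80440` share their lower 18 bits and differ only above them, so `is_page_conflict(0x40440, 0x80440)` is true: both map to bank 1 but to rows `0x40` and `0x80`.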
FIG. 3 is a block diagram illustrating an approach to reducing latency due to sequential accesses to memory that result in page conflicts. In this approach, a mechanism for minimizing page conflicts is implemented during address translation (i.e., after a linear address is translated, but before an RCB address is applied to the memory). As shown, a conventional system 300 using RCB address translation includes a processor 302, a memory controller 308, and a memory 322 as system memory. Memory controller 308 contains an address translator 310 to convert linear address 306 into an RCB-formatted address, such as RCB address 320. To reduce page conflicts, memory controller 308 includes a row characterizer 312 and a bank separator 314. Row characterizer 312 operates on upper order bits 212 to characterize these bits for each RCB address 320. Namely, row characterizer 312 characterizes each incoming access request to avoid situations giving rise to page conflicts, such as when a subsequent address has lower order bits 214 identical to those of an adjacent, preceding address even though upper order bits 212 for both addresses are different. The characterizations of upper order bits 212 are such that memory controller 308 operates to modify any RCB address 320 likely to have identical lower order bits 214 so that sequential accesses will generally be to different banks rather than to a common bank of memory. To differentiate the banks, row characterizer 312 first sends the characterizations of upper order bits 212 for the translated addresses to bank separator 314.
Based upon the characterization of the upper order bits 212, as well as translated bank bits 204c, bank separator 314 generates new bank bits for substitution into RCB address 320. The new bank bits for each RCB address 320 are such that the sequential accesses generally relate to different banks, rather than the same bank. By accessing different banks, memory controller 308 avoids latency due to requisite minimum time, L1. Note that path 316 provides new bank bits, whereas path 318 provides translated row and column bits to form RCB address 320. Memory controller 308 has been implemented in the NVIDIA nForce™2 chipset, which is manufactured by NVIDIA Corporation of Santa Clara, Calif.
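The text does not specify how row characterizer 312 or bank separator 314 compute their outputs. Purely as an illustration of the general idea, the sketch below assumes that upper order bits 212 occupy b18 through b34, that the characterization is an XOR-fold of those bits down to two bits, and that the new bank bits are the translated bank bits XORed with that characterization. All of these choices, and all function names, are assumptions and should not be read as the method used by memory controller 308.

```python
BANK_SHIFT, BANK_MASK = 10, 0x3   # bank = b10..b11 (from the text)
UPPER_SHIFT = 18                  # ASSUMPTION: upper order bits start at b18
UPPER_MASK = (1 << 17) - 1        # ASSUMPTION: upper order bits = b18..b34


def characterize_row(address: int) -> int:
    """Hypothetical characterization: XOR-fold the upper bits to 2 bits."""
    upper = (address >> UPPER_SHIFT) & UPPER_MASK
    folded = 0
    while upper:
        folded ^= upper & BANK_MASK
        upper >>= 2
    return folded


def separate_bank(address: int) -> int:
    """Hypothetical new bank bits: translated bank XOR characterization."""
    translated_bank = (address >> BANK_SHIFT) & BANK_MASK
    return translated_bank ^ characterize_row(address)


def remap(address: int) -> int:
    """Substitute the new bank bits into the address; other bits unchanged."""
    cleared = address & ~(BANK_MASK << BANK_SHIFT)
    return cleared | (separate_bank(address) << BANK_SHIFT)
```

With this assumed scheme, the hypothetical addresses `0x40440` and `0x80440`, which both translate to bank 1, are remapped to banks 0 and 3 respectively. Because the new bank bits are the old ones XORed with a value derived only from bits that are left unchanged, the remapping stays one-to-one, so no two distinct addresses collide.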
FIG. 4 is a timing diagram 400 showing sequential accesses to different banks as translated by memory controller 308. A first bank, “Bank[0],” is subject to a preceding transaction (“T1”) 404. Transaction 404 requires that an activation signal (“A”) 402 cause a row in memory 322 to open so that memory controller 308 can access that particular row. To accommodate a subsequent transaction, (“T2”) 408, memory controller 308 issues another activation signal (“A”) 406 to cause another row to open in a second bank, “Bank[1],” rather than Bank[0]. Accordingly, subsequent transaction (“T2”) 408 can proceed in less than the requisite minimum time, L1. As shown, transaction (“T2”) 408 can occur after time, L2, which is less than L1. Further, as different banks are being used, other transactions Tx 412 and Ty 414 can access the memory without requiring a precharge signal, such as precharge signal (“P”) 106 shown in FIG. 1. In some cases, Bank[0] and Bank[1] can alternately provide for subsequent read transactions (e.g., such as Tx 412) and subsequent write transactions (e.g., such as Ty 414), respectively, for read-write cycle pairs that typically are issued by processor 302.
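The difference between the command sequences of FIG. 1 and FIG. 4 can be sketched with a small model. It assumes a controller that leaves a row open in each bank until a different row of that bank is requested; the function name `schedule` and the command strings are hypothetical, and no actual timing values are modeled.

```python
from typing import Dict, Iterable, List, Tuple


def schedule(transactions: Iterable[Tuple[int, int]]) -> List[str]:
    """Emit activate (A), precharge (P), and transaction (T) commands.

    Each transaction is a (bank, row) pair. ASSUMPTION: a row stays open
    in its bank until another row of the same bank is requested.
    """
    open_rows: Dict[int, int] = {}   # bank -> currently open row
    commands: List[str] = []
    for bank, row in transactions:
        current = open_rows.get(bank)
        if current is None:
            # Bank is idle: only an activation is needed.
            commands.append(f"A bank{bank} row{row}")
        elif current != row:
            # Page conflict: close the open row, then open the new one.
            commands.append(f"P bank{bank}")
            commands.append(f"A bank{bank} row{row}")
        open_rows[bank] = row
        commands.append(f"T bank{bank} row{row}")
    return commands
```

Under these assumptions, `schedule([(0, 1), (0, 2)])` contains a precharge between the two transactions, as in FIG. 1, whereas `schedule([(0, 1), (1, 2), (0, 1), (1, 2)])` contains none, as in FIG. 4.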
Although memory controller 308 does reduce latency, there are several drawbacks in the implementation of memory controller 308. First, row characterizer 312 is in series with address translator 310. As such, row characterizer 312 depends on receiving translated row bits to perform its functionality. Second, with row characterizer 312 in series with address translator 310, three stages are required to generate RCB addresses 320. A critical path is shown as a heavy line passing through three stages, all of which are depicted as encircled numbers. The critical path (i.e., path 317) is a path that includes a series of processes that must be completed so that memory controller 308 can provide RCB addresses 320 for avoiding page conflicts. As such, the last process (e.g., bank separation by bank separator 314) of the critical path dictates the earliest point in time to form such addresses after address translation begins. As shown in FIG. 3, each of address translator 310, row characterizer 312, and bank separator 314 lies in a respective process stage on the critical path and therefore each is critical for timely generation of RCB address 320. So although memory controller 308 removes requisite minimum time, L1, its three-stage critical path nevertheless introduces latency into memory operations.
In view of the foregoing, it would be desirable to provide a system, an apparatus and a method for minimizing the drawbacks associated with minimizing sequential accesses to the same bank of memory, especially by reducing the time required to translate a linear address into an RCB address.