1. Field of the Invention
The present invention relates to a memory control apparatus capable of executing a prefetch instruction.
2. Related Background Art
In recent years, CPU speed has increased seemingly without bound, at an annual rate of more than 1.5 times. Accordingly, the amount of data transferred per unit time between the CPU and main storage has increased correspondingly. To mitigate this tendency, the locality of memory access has been exploited: the capacity of the cache provided in the CPU is increased, and the cache is configured in a hierarchical structure so that high-speed memory access can be attained. However, it has become more and more difficult to bridge the growing gap between the operating speed of the CPU and the speed of access to the main storage.
To efficiently solve this problem, it is necessary to drastically speed up access to the main storage itself, that is, the memory bandwidth. Currently, the main storage of a personal computer (PC) is normally a dynamic RAM (DRAM), a semiconductor memory. Since the speed of the CPU has exceeded the speed attained by the progress of the semiconductor device itself, it is also necessary to attain high-speed operation of the DRAM through an effective circuit configuration or an efficient system.
In this situation, various systems have been suggested and put into practice to improve the memory bandwidth. One of the new systems recently receiving much attention is the direct Rambus DRAM. In the direct Rambus DRAM, the concept of a channel is adopted to realize a high memory bandwidth of up to 1.6 GB/sec per channel.
FIG. 1 shows an example of a transfer protocol for an RSL channel of the direct Rambus DRAM. In the direct Rambus system, a packet occupies four clocks, with data transferred at both the leading and trailing edges of each clock cycle.
In FIG. 1, first in cycles 0 to 3, a row packet for activating a page specified by x is issued. The x indicates a set of a device ID, a bank address, and a row address each of which is represented by the specified number of bits. Then, in cycles 7 to 10, a column packet indicating a read of data at the address specified by x0 is issued. The x0 indicates a set of a device ID, a bank address, and a column address each of which is represented by the specified number of bits. The device ID and the bank address are the same as those of x above. The column address specifies an address on a page.
Furthermore, in cycles 11 to 14, a column packet indicating a read of data at the address specified by x1 is issued. The x1 indicates a set of a device ID, a bank address, and a column address, each of which is represented by the specified number of bits. The device ID and the bank address are also the same as those of x above. That is, data can be read at two addresses on the same device, bank, and page by the set of a row packet and a column packet. In cycles 19 to 22, data corresponding to the first (x0) read command is read from the DRAM. In cycles 23 to 26, data corresponding to the second (x1) read command is read from the DRAM.
In the example shown in FIG. 1, data is read from another bank concurrently with the series of reading operations. That is, in cycles 8 to 11, a row packet for activating the page specified by y is issued. The y indicates a page located in a bank (in another device, or in a non-interfering bank in the same device) other than the bank indicated by the preceding x. Then, in cycles 15 to 18, a column packet indicating a read of data at the address specified by y0 is issued. The device ID and the bank address indicated by the y0 are the same as those indicated by the y. Furthermore, in cycles 19 to 22, a column packet indicating a read of data at the address specified by y1 is issued. The device ID and the bank address indicated by the y1 are also the same as those indicated by the y.
In cycles 27 to 30, data corresponding to a third (y0) read command is read from the DRAM. In cycles 31 to 34, data corresponding to a fourth (y1) read command is read from the DRAM. The same operations are performed on z, z0, z1, q, q0, and q1. The activation of a bank z not interfering with the x and y, and the activation of a bank q not interfering with the x, y, and z are performed. Data is read by issuing fifth (z0), sixth (z1), seventh (q0), and eighth (q1) read commands.
Successive commands are pipelined across the phases of the row packet, the column packet, and the data packet. Thus, in the direct Rambus DRAM, the maximum bandwidth is obtained when access proceeds as a four-stage pipeline in 32-byte units.
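The 1.6 GB/sec and 32-byte figures can be checked with a little arithmetic. The sketch below assumes the commonly cited direct Rambus channel parameters (a 400 MHz channel clock and a 2-byte-wide data path); these parameter values are assumptions for illustration, not taken from the present description.

```python
# Back-of-the-envelope check of the channel figures (assumed parameters).
CLOCK_HZ = 400_000_000   # assumed 400 MHz channel clock
EDGES_PER_CLOCK = 2      # data on both leading and trailing edges
BYTES_PER_EDGE = 2       # assumed 16-bit data path (DQA + DQB)

# One data packet spans 4 clocks (FIG. 1), i.e. 16 bytes per packet;
# a 32-byte access is a pair of data packets (x0/x1) over 8 clocks.
bytes_per_packet = 4 * EDGES_PER_CLOCK * BYTES_PER_EDGE

# With back-to-back pipelined packets, the channel transfers continuously.
bandwidth = CLOCK_HZ * EDGES_PER_CLOCK * BYTES_PER_EDGE

print(bytes_per_packet)  # 16
print(bandwidth)         # 1600000000, i.e. 1.6 GB/sec
```

Under these assumptions, the two consecutive data packets in FIG. 1 (cycles 19 to 26) carry exactly 32 bytes, which is why pipelined 32-byte accesses saturate the channel.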
Therefore, accesses smaller than 32 bytes reduce the effective bandwidth. For example, 32 bytes of continuous data can be read at once faster and more efficiently than the same data read twice in 16-byte units in a divisional manner.
If accesses frequently occur in units smaller than 32 bytes, access efficiency can be improved by providing a prefetch buffer in the memory controller.
FIG. 2 shows an example of a configuration of the conventional memory controller having a prefetch buffer.
In FIG. 2, a memory controller 200 is connected to bus masters 220 to 223, such as a CPU, a DMA controller, a bus bridge, etc., through a system bus 210, and is also connected to DRAMs (in this example, direct RDRAMs) 230 to 233. The bus masters 220 to 223 are arbitrated by an arbiter (not shown in the attached drawings) to avoid access conflicts on the system bus and to allow only one master at a time to access the memory controller 200.
A data signal (DQA [8:0], DQB [8:0]) line 241 is bidirectional, and transmits data from the memory controller 200 to the DRAMs 230 to 233 during data write, and from the DRAMs 230 to 233 to the memory controller 200 during data read. A row signal (ROW [2:0]) line 242 and a column (COL [4:0]) signal line 243 respectively transmit a row packet and a column packet from the memory controller 200 to the DRAMs 230 to 233. Signals 244 and 245 are clock signals (CTM, CFM) on a channel.
In the memory controller 200, a control device 201 controls the operation timing of each block of the memory controller. A memory channel interface 202 adapts a read/write command to the protocol of the memory channel, transmits it on the channel, and receives data from the channel. A system bus interface 203 is used for connection to the system bus. A buffer 204 temporarily stores a read/write command. A prefetch buffer 205 stores a part of the read data and its address as necessary, and transfers the stored data to the system bus interface 203. Furthermore, the prefetch buffer 205 is provided with a valid flag (not shown in the attached drawings) indicating whether or not the stored address and data are valid.
Normally, memory access exhibits locality. When data is read at an address in memory, there is a high probability that data at consecutive addresses will be read within a short time.
When a 16-byte data read access occurs from the bus master 220, the memory controller 200 reads 32-byte data, comprising the 16-byte data at the specified address and the 16-byte data at the subsequent address, from one of the DRAMs 230 to 233. It transfers the specified 16-byte data through the system bus 210 to the bus master 220 which requested the data, and stores and holds the remaining 16-byte data in the prefetch buffer 205. When a request to access the held subsequent address is issued from any of the bus masters, the memory controller 200 does not perform a reading operation on the DRAM, but transfers, through the system bus interface 203, the data held in the prefetch buffer 205 to the bus master which issued the access request.
For example, assume that the bus master 220 issues a read access request for the 16-byte data stored at address h00120 to address h0012f (h indicates a hexadecimal number). If these addresses are assigned to the DRAM 230, the memory controller 200 receives the request, temporarily stores it in the buffer 204, and transmits a read command packet to the corresponding DRAM 230 through the memory channel interface 202. At this time, a column packet for a read of the subsequent 16-byte data at the following address is also transmitted. In response to the read commands, after a predetermined delay time, the DRAM 230 sequentially transmits onto the channel a total of 32 bytes of data stored at the corresponding addresses.
When the memory controller 200 receives, through the memory channel interface 202, the 32-byte data transmitted onto the channel, it transmits the first-half 16-byte data from the system bus interface 203 to the bus master 220 through the system bus 210.
On the other hand, it stores the second-half 16-byte data in the prefetch buffer 205 together with its leading address h00130, and sets the valid flag indicating that the contents of the prefetch buffer 205 are valid. Then, upon receipt of a read access request for the 16-byte data stored at address h00130 to address h0013f, it immediately transmits the 16-byte data in the prefetch buffer 205 from the system bus interface 203 to the bus master 220 through the system bus 210. Thus, the efficiency of memory access can be enhanced, and the read latency can be drastically reduced.
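The read path just described can be sketched as a small behavioral model. This is an illustration only, not the actual controller logic; the names (`read16`, the `dram` dictionary, the `prefetch` record) are assumptions introduced for the sketch.

```python
# Behavioral sketch of a one-entry prefetch buffer (hypothetical names).
# DRAM is modeled as a dict from byte address to byte value.

prefetch = {"valid": False, "addr": None, "data": None}

def read16(dram, addr):
    """Return the 16 bytes at addr, prefetching the following 16 bytes."""
    if prefetch["valid"] and prefetch["addr"] == addr:
        return prefetch["data"]          # hit: no DRAM access, low latency
    # Miss: one 32-byte DRAM read covering addr and the subsequent address.
    block = [dram.get(addr + i, 0) for i in range(32)]
    prefetch.update(valid=True, addr=addr + 16, data=block[16:])
    return block[:16]                    # requested half goes to the bus master

dram = {0x00120 + i: i for i in range(32)}
first = read16(dram, 0x00120)    # DRAM read; h00130..h0013f held in the buffer
second = read16(dram, 0x00130)   # served directly from the prefetch buffer
```

In this model, the second request never reaches the DRAM, which corresponds to the latency reduction described above.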
If data at a given address is stored in the prefetch buffer 205 as a result of a read access, and a write access then occurs at the same address, the memory controller 200 nullifies the contents stored in the prefetch buffer 205. In the above-mentioned example, if the 16 bytes of consecutive data at address h00130 to address h0013f are stored in the prefetch buffer 205, and the bus master 222 issues a write request for an address range the same as or overlapping the above-mentioned address range, then the memory controller 200 immediately resets the valid flag of the prefetch buffer 205.
Thus, stale data is prevented from being returned to a bus master, thereby maintaining the consistency of data.
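The invalidation rule above amounts to a range-overlap test followed by clearing the valid flag. A minimal sketch, with hypothetical names, assuming half-open address ranges:

```python
# Illustrative check for nullifying the prefetch buffer on a write access.
# Ranges are half-open: [start, start + length).

def overlaps(a_start, a_len, b_start, b_len):
    return a_start < b_start + b_len and b_start < a_start + a_len

buffer_valid = True
BUFFER_ADDR, BUFFER_LEN = 0x00130, 16    # 16 bytes held at h00130..h0013f

def on_write(addr, length):
    """Reset the valid flag if a write touches the buffered range."""
    global buffer_valid
    if buffer_valid and overlaps(BUFFER_ADDR, BUFFER_LEN, addr, length):
        buffer_valid = False             # stale data must never be returned

on_write(0x00138, 8)                     # overlapping write, e.g. from master 222
print(buffer_valid)                      # False
```

A non-overlapping write (for example, at h00140 and beyond) would leave the flag set, so the buffered data could still be returned.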
As described above, when a prefetch buffer is provided in a memory controller, the circuit scale normally grows with the number of entries (an entry being a pair of an address and the data stored in the buffer). Therefore, in the above-mentioned example, only one entry is provided. In this case, if there is only one bus master and the master accesses data locally, the prefetch buffer functions effectively. However, if there are a plurality of bus masters and they alternately access data in different address ranges, the contents of the prefetch buffer are frequently replaced before they are ever referenced, and the prefetch buffer loses its usefulness.
For example, as shown in FIG. 3, the bus master 220 issues a read access request in the first cycle for the 16-byte data at address h1020 to address h102f. As a result, in the fifth cycle the prefetch buffer 205 stores the 16-byte data at address h1030 to address h103f.
Then, in the sixth cycle, the bus master 223 issues a read access request for the 16-byte data at address h8a40 to address h8a4f. Thus, in the tenth cycle, the 16-byte data at address h1030 to address h103f is replaced in the prefetch buffer 205, without ever being referenced, by the 16-byte data at address h8a50 to address h8a5f.
Then, in the eleventh cycle, when the bus master 220 issues a read access request for the 16-byte data at address h1030 to address h103f, following the data it read previously, the data must be read from the memory because the data at these addresses is no longer stored in the prefetch buffer 205. Thus, in the fifteenth cycle, the contents of the prefetch buffer 205 are replaced with the 16-byte data at address h1040 to address h104f. Then, when the bus master 223 issues a read access request in the sixteenth cycle for the 16-byte data at address h8a50 to address h8a5f, following the data it read previously, that data must again be read from the memory because it is no longer stored in the prefetch buffer 205.
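The alternating access pattern of FIG. 3 can be reproduced with a toy model of the single-entry buffer; the class and counter names here are illustrative assumptions. Under this sequence every request misses, whereas a single master reading sequentially would hit on every other request:

```python
# Toy model of a single-entry prefetch buffer under the FIG. 3 pattern.
LINE = 16  # bytes per bus-master read request

class PrefetchBuffer:
    def __init__(self):
        self.valid, self.addr = False, None
        self.hits, self.dram_reads = 0, 0

    def read(self, addr):
        if self.valid and addr == self.addr:
            self.hits += 1               # served from the buffer
        else:
            self.dram_reads += 1         # full 32-byte DRAM access
            self.valid, self.addr = True, addr + LINE  # hold the second half

# Two masters alternating on disjoint address ranges (h1020..., h8a40...):
shared = PrefetchBuffer()
for addr in [0x1020, 0x8a40, 0x1030, 0x8a50]:
    shared.read(addr)
print(shared.hits, shared.dram_reads)    # 0 4 -- every access misses

# The same buffer with a single master reading sequentially:
alone = PrefetchBuffer()
for addr in [0x1020, 0x1030]:
    alone.read(addr)
print(alone.hits, alone.dram_reads)      # 1 1 -- the prefetch pays off
```

The contrast between the two runs is the thrashing problem described above: with interleaved masters, the single entry is always replaced before it can be referenced.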
Thus, when a plurality of bus masters operate simultaneously, the conventional memory controller cannot efficiently utilize a prefetch buffer having a small number of entries.