The present invention relates to a multiprocessor system employing memory elements in which DRAM is used for the main memory and a cache. More specifically, the present invention is directed to a mechanism for shortening the time from an access request to the memory element until the data arrives (hereinafter referred to as "memory access latency"), and also to a method for transmitting a memory access transaction used in this mechanism.
Recently available processors operate at very high speeds, and they are normally provided with built-in cache memories to reduce the speed gap between the processors and external memories. While accesses hit in the built-in cache memories, a processor can execute instructions efficiently and thus maintain high performance. When an instruction or data required for execution is not present in the built-in cache memory, that is, when a cache miss occurs, the time needed to access the memory outside the processor (the memory access latency) becomes long, and performance may drop because the processor cannot execute instructions effectively in the meantime.
Furthermore, in a multiprocessor system, a memory access must be preceded by a check of whether the line that missed in the cache is held by another processor. The memory access latency therefore generally tends to be longer than in a single-processor system, and it can seriously affect performance.
Consider a multiprocessor system in which processors containing cache memories are coupled to a processor system bus to form a processor node, and a plurality of such processor nodes are coupled to a memory via a network. A memory access in this system proceeds as follows:
(1). A cache miss occurs in a processor, and a memory access request for the missed address is generated.
(2). To send the memory access request to the memory, the processor takes part in bus arbitration to acquire the right to use the system bus.
(3). Once the processor acquires the system bus, it sends the memory access request generated in (1) onto the system bus. The other processors coupled to the system bus then check whether they cache the data at the address contained in the request and return the result to the requesting processor.
(4). If the result of (3) shows that the memory must be accessed, the memory access request is sent out to the network. Depending on the system structure, sending the request to the network may also require arbitration for the right to use it, as in (2).
(5). Each node other than the requesting processor node receives the request from the network and checks whether any processor within that node caches the data at the requested address. The node then notifies the requesting node of the result.
(6). If the result of (5) shows that the memory may be accessed, a row address (RAS) and a column address (CAS) are input to the memory in sequence, and the data access is performed.
(7). The result of the data access in (6) (the data, when the request is a read) is returned to the node that issued the memory access request and then to the requesting processor within that node.
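In the sequence above every step waits for the previous one, so the latency seen by the processor is simply the sum of the phase times. A minimal Python sketch of that accounting follows; the phase names mirror steps (2) to (7), and the cycle counts are hypothetical values chosen only for illustration, not figures from this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    cycles: int  # hypothetical cost of this phase, in cycles


# Illustrative cycle counts only; real values depend on the bus, network and DRAM.
CONVENTIONAL_SEQUENCE = [
    Phase("(2) system bus arbitration", 2),
    Phase("(3) snoop within the requesting node", 4),
    Phase("(4) send request to the network", 3),
    Phase("(5) snoop in the other nodes", 6),
    Phase("(6) RAS then CAS data access", 10),
    Phase("(7) return data to the requester", 5),
]


def serialized_latency(phases):
    """Each phase starts only after the previous one ends, so the costs add up."""
    return sum(p.cycles for p in phases)


if __name__ == "__main__":
    print(serialized_latency(CONVENTIONAL_SEQUENCE))  # 30
```

Any cycle spent in the coherence checks of (3) and (5) is added directly to the memory access latency in this model, which is what the schemes discussed next try to hide.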
Conventionally, a memory-first-read scheme has been employed to avoid the performance degradation that memory access latency causes in the above sequence. In this scheme, the data read of (6) is started before the other processors have checked the cache state in (3) and (5), so that the time required to read the data is hidden. The following methods have been proposed for this scheme, differing in when the data read is started.
In the method described in U.S. Pat. No. 5,778,435 (hereinafter referred to as the "first prior art"), the memory access is started before a miss actually occurs in the cache built into the processor. The address of the next built-in cache miss is predicted from the series of past built-in cache miss addresses.
In the method disclosed in U.S. Pat. No. 5,987,579 (hereinafter referred to as the "second prior art"), when an access misses in the cache built into the processor and the memory access address is sent out on the processor bus, the address is divided into a RAS and a CAS, and the RAS is output to the memory as soon as it is received, before the result of the cache-state check arrives. Whether the CAS is then output is decided by the cache-state check result, and the data access to the memory is controlled accordingly.
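The second prior art's split can be pictured as follows: the row part of the address is applied (RAS) as soon as the address appears, while the column part (CAS), and therefore the actual data transfer, is gated on the cache-state result. In this Python sketch the address layout (a 64-byte line offset, 10 column bits, 14 row bits), the single-bank DRAM model, and all function and class names are hypothetical assumptions for illustration.

```python
# Hypothetical address layout: | row (14 bits) | column (10 bits) | line offset (6 bits) |
LINE_OFFSET_BITS = 6
COL_BITS = 10
ROW_BITS = 14


def split_address(addr):
    """Split a physical address into (row, column) for RAS and CAS."""
    col = (addr >> LINE_OFFSET_BITS) & ((1 << COL_BITS) - 1)
    row = (addr >> (LINE_OFFSET_BITS + COL_BITS)) & ((1 << ROW_BITS) - 1)
    return row, col


class DramBank:
    """Toy single-bank DRAM: RAS opens a row, CAS transfers data from it."""

    def __init__(self):
        self.open_row = None
        self.cas_count = 0

    def ras(self, row):
        self.open_row = row

    def cas(self, col):
        if self.open_row is None:
            raise RuntimeError("CAS issued with no open row")
        self.cas_count += 1
        return (self.open_row, col)  # stands in for the data word


def access_with_gated_cas(bank, addr, memory_must_supply):
    """RAS goes out before the snoop result; CAS only if memory must supply data."""
    row, col = split_address(addr)
    bank.ras(row)
    if memory_must_supply:
        return bank.cas(col)
    return None  # another cache supplies the line; no data leaves the DRAM
```

For example, split_address(0x12345678) returns (0x1234, 345). Note that the caller of access_with_gated_cas needs the whole address, which is why, in this prior art, both parts must be distributed to every processor that checks the cache state.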
Because the first prior art performs the early data read at a predicted address, whenever the address of the actual cache miss differs from the prediction, the data read on the basis of the prediction must be discarded and the data must be re-read at the actual miss address. This causes the following problems. First, memory throughput is consumed. Second, the address series of past cache misses must be stored, and a mechanism that predicts the next miss address from the stored series is required, which makes the system complex.
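The cost described above can be made concrete with a predictor that must keep miss history. The Python sketch below uses a simple stride predictor; this is a swapped-in, commonly used example of history-based address prediction, not necessarily the predictor of U.S. Pat. No. 5,778,435, and the class and function names are hypothetical. Every wrong prediction is counted as a wasted early read, since that data is discarded and memory must be read again at the actual miss address.

```python
class StridePredictor:
    """Predicts the next miss address as last address + last observed stride.

    The stored (last, stride) pair is the miss history the prior art must keep.
    """

    def __init__(self):
        self.last = None
        self.stride = None

    def predict(self):
        if self.last is None or self.stride is None:
            return None  # not enough history yet
        return self.last + self.stride

    def update(self, miss_addr):
        if self.last is not None:
            self.stride = miss_addr - self.last
        self.last = miss_addr


def count_early_reads(miss_addresses):
    """Return (useful, wasted) early reads for a sequence of actual miss addresses."""
    predictor = StridePredictor()
    useful = wasted = 0
    for addr in miss_addresses:
        guess = predictor.predict()
        if guess is not None:
            if guess == addr:
                useful += 1
            else:
                wasted += 1  # early-read data discarded; memory re-read at addr
        predictor.update(addr)
    return useful, wasted
```

A regular stream such as [0, 64, 128, 192] yields (2, 0), whereas [0, 64, 128, 4096, 4160] yields (1, 2): each break in the pattern spends memory throughput on data that is thrown away.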
The first prior art has another problem: the data is read before access to the memory is permitted. When a plurality of memory access requests to the same address are being processed, ordering these requests, and ordering the data read ahead for them, becomes complex, which makes control difficult.
In the second prior art, the entire address is needed to check the cache state, so the address, divided into the RAS and the CAS, must be distributed to all of the processors. This increases the busy rate of the address path.
To solve the above-described problems, an object of the present invention is to achieve the following in memory access operations:
(1). Memory throughput is not consumed.
(2). Control for ordering requests and control for ordering data are kept simple.
(3). The busy rate of the address path is not increased.
(4). Memory latency is reduced.
To achieve the above object, the present invention separates a memory access request caused by a miss in the cache built into a processor into an ACTV command and a memory access command. The ACTV command activates the memory in advance without performing any data access. The memory access command is used to actually read data from or write data to the memory.
Next, the timing at which the ACTV command and the memory access command are issued is described. The ACTV command is issued when the processor outputs the address to be accessed; more precisely, when that address has been decoded and the node to which the memory holding the data for that address is connected has been determined. The ACTV command is transferred to that target node by one-to-one transfer. Because the ACTV command only causes the RAS to be input to the memory, it causes no data transfer to or from the memory. Moreover, the ACTV command is issued to the memory independently of address-match checking against preceding memory access commands and of cache coherence control.
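The issue rule for ACTV can be sketched in Python as below. The address-to-node mapping (each node owning a contiguous 1 GiB range), the row extraction, and the Command and MemoryNode types are hypothetical, introduced only for illustration. The points taken from the text are that ACTV is sent one-to-one to the single node that owns the address, and that it only opens a row (RAS) without moving any data.

```python
from dataclasses import dataclass

NODE_MEMORY_SIZE = 1 << 30  # hypothetical: each node owns a contiguous 1 GiB range
ROW_SHIFT = 16              # hypothetical: row = (address within node) >> 16


@dataclass(frozen=True)
class Command:
    kind: str            # "ACTV" or "ACCESS"
    address: int
    destinations: tuple  # ids of the nodes the command is sent to


def home_node(addr):
    """Decode which node's memory holds this address."""
    return addr // NODE_MEMORY_SIZE


def issue_actv(addr):
    """ACTV goes one-to-one to the home node, without waiting for any snoop."""
    return Command("ACTV", addr, (home_node(addr),))


class MemoryNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.open_rows = set()
        self.bytes_transferred = 0  # never changed by ACTV

    def on_actv(self, cmd):
        # RAS only: the row is activated, but nothing is read or written.
        local = cmd.address % NODE_MEMORY_SIZE
        self.open_rows.add(local >> ROW_SHIFT)
```

Because issue_actv needs only the decoded home node, it can be sent as soon as the address is known, while a single destination keeps the extra load on the address path to one message per request.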
The memory access command is issued after the processor outputs the address to be accessed and the cache states in the other processors within the node have been checked. If there is no possibility that the data is cached in another node, the memory access command is transferred to the target node by one-to-one transfer; if there is such a possibility, it is distributed to all of the nodes. Each node that receives the memory access command performs the processing required for cache coherence checking and returns the result. By combining these cache coherence results, the cache states of all the nodes are determined. The memory node, having received both the ACTV command and the memory access command, judges from the cache states of all the nodes whether the data should be transferred from the memory, and reads the data from the memory if necessary.
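The routing and decision steps for the memory access command might be sketched in Python as follows. The three snoop states, the function names, and the rule that a node holding a modified copy supplies the line instead of memory are assumptions in the style of a MESI-like protocol; the description itself only says that the combined coherence results decide whether the memory data is transferred.

```python
from enum import Enum


class Snoop(Enum):
    MISS = 0      # the node does not hold the line
    SHARED = 1    # the node holds a clean copy
    MODIFIED = 2  # the node holds a dirty copy (assumed to supply the data itself)


def route_access(home, num_nodes, may_be_cached_elsewhere):
    """One-to-one to the home node if no other node can hold the line, else broadcast."""
    if may_be_cached_elsewhere:
        return tuple(range(num_nodes))
    return (home,)


def memory_should_supply(responses):
    """Combine the per-node coherence results at the memory node.

    Under the assumed rule, memory data is used unless some node reports a modified copy.
    """
    return all(r is not Snoop.MODIFIED for r in responses)
```

With four nodes and home node 2, route_access(2, 4, False) returns (2,) and route_access(2, 4, True) returns (0, 1, 2, 3). Since the row has already been opened by ACTV, a True result from memory_should_supply leaves only the CAS to be issued.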
Consequently, because the RAS is input to the memory by one-to-one transfer without waiting for the result of cache coherence control, the time from the processor's issuing of the memory request until the data is accessed can be shortened, while an increase in the busy rate of the address path is suppressed.
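The latency benefit can be expressed with a simple timing model. In this Python sketch, snoop is the total time for the cache coherence checks, ras and cas are the row- and column-access times, and ret is the data return time; all are hypothetical parameters, and the function names are illustrative. In the conventional sequence RAS waits for the coherence result; with ACTV, RAS overlaps the check and only CAS waits. A full memory-first read is included for comparison: it hides more latency, but, as noted for the prior art, it may read data that is later discarded.

```python
def conventional_latency(snoop, ras, cas, ret):
    """RAS starts only after the coherence result is known."""
    return snoop + ras + cas + ret


def actv_latency(snoop, ras, cas, ret):
    """ACTV applies RAS in parallel with the coherence check; CAS waits for it."""
    return max(snoop, ras) + cas + ret


def memory_first_read_latency(snoop, ras, cas, ret):
    """The whole read (RAS and CAS) overlaps the check, at the risk of discarded data."""
    return max(snoop, ras + cas) + ret


if __name__ == "__main__":
    # Hypothetical cycle counts, for illustration only.
    args = dict(snoop=13, ras=5, cas=5, ret=5)
    print(conventional_latency(**args))       # 28
    print(actv_latency(**args))               # 23
    print(memory_first_read_latency(**args))  # 18
```

In this model the saving from ACTV is exactly min(snoop, ras): the row access is fully hidden whenever the coherence check takes at least as long as the RAS. No data moves until the coherence result is known, so no memory throughput is spent on reads that are thrown away.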