Methods of realizing a cache memory apparatus include Direct Mapped, Fully Associative, and Set Associative organizations. In a computer system in which a plurality of processors share a main memory (main memory apparatus) and some or all of the processors each have their own cache memory apparatus, attention must be paid to the coherency of data between the cache memory apparatus and the main memory.
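The three organizations differ only in how many cache lines an address may occupy. The following minimal Python sketch illustrates this by splitting an address into tag, set index, and line offset. The line size, cache size, and the name `split_address` are assumptions for illustration, not parameters taken from this description.

```python
def split_address(addr: int, line_size: int, num_lines: int, ways: int) -> tuple[int, int, int]:
    """Split a byte address into (tag, set index, offset) for a cache of
    num_lines lines organized as `ways`-way set associative.

    ways == 1         -> Direct Mapped (one candidate line per address)
    ways == num_lines -> Fully Associative (a single set; index is always 0)
    otherwise         -> Set Associative
    """
    num_sets = num_lines // ways
    line_addr, offset = divmod(addr, line_size)  # which line, and the byte within it
    tag, index = divmod(line_addr, num_sets)     # which set, and the tag kept for it
    return tag, index, offset


# The same address under three organizations of an assumed 256-line cache
# with 64-byte lines.
for ways in (1, 4, 256):
    print(ways, split_address(0x12345, line_size=64, num_lines=256, ways=ways))
```

For address 0x12345 this gives (4, 141, 5) when direct mapped, (18, 13, 5) when four-way set associative, and (1165, 0, 5) when fully associative. Fewer sets means a longer tag, so more address bits must be stored and compared.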
Techniques for keeping data coherent include a technique in which hardware controls the cache memory apparatus and the main memory so that their data stay coherent, a technique in which software explicitly controls access destination addresses and the operation of the cache memory apparatus, and a technique in which hardware and software cooperatively control the operation of the cache memory apparatus. Among these, the technique in which hardware controls coherency lets a user write programs without being aware of the hardware.
For example, when a vector processor having the structure shown in FIG. 11 processes a vector store instruction, the data in the cache memory apparatus 105 of the scalar processing unit must be kept consistent with the data in the main memory 104. FIG. 11 is a block diagram illustrating the structure of a general vector processor.
The vector processor includes a vector processing unit 100, a scalar processing unit 102, a memory network 103 and the main memory 104.
The scalar processing unit 102 performs processing such as instruction fetch, which reads an instruction from the main memory 104; decode, which interprets the instruction; scheduling, which decides the order in which the decoded instructions are executed; and instruction execution, which executes the instructions in that order. The scalar processing unit 102 also issues instructions to the vector processing unit 100, such as a load instruction to read data from the main memory 104, a store instruction to write data to the main memory 104, and a vector operation instruction.
The vector processing unit 100 includes components such as a vector register that holds vector data and arithmetic units that process the store instruction, the load instruction, and other instructions. In accordance with the instructions issued by the scalar processing unit 102, the vector processing unit 100 sends data to and receives data from the main memory 104 via the memory network 103.
The memory network 103 transmits data between the main memory 104 and each of the scalar processing unit 102 and the vector processing unit 100.
The scalar processing unit 102 includes the cache memory apparatus 105, which reduces the overhead of reading data from the main memory 104.
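The coherency problem in this structure can be seen in a few lines. The following Python sketch models the main memory 104 as a dictionary and the scalar-side cache as a second dictionary of copied values. It assumes, as FIG. 11 suggests, that the vector processing unit writes the main memory over the memory network without passing through the scalar cache. The names `ScalarCache` and `vector_store` are hypothetical.

```python
class ScalarCache:
    """Minimal scalar-side cache: a dict of copied values (no capacity limit)."""

    def __init__(self, main_memory: dict[int, int]):
        self.mem = main_memory
        self.copies: dict[int, int] = {}

    def load(self, addr: int) -> int:
        if addr not in self.copies:       # miss: copy the value from main memory
            self.copies[addr] = self.mem[addr]
        return self.copies[addr]          # hit: return the cached copy


def vector_store(main_memory: dict[int, int], addrs: list[int], values: list[int]) -> None:
    """The vector unit writes main memory directly; the scalar cache is not touched."""
    for a, v in zip(addrs, values):
        main_memory[a] = v


memory = {0x100: 1, 0x108: 2}
cache = ScalarCache(memory)
before = cache.load(0x100)                       # value is now cached
vector_store(memory, [0x100, 0x108], [42, 43])
after = cache.load(0x100)                        # hits the old, stale copy
```

Here `before` and `after` are both 1 while `memory[0x100]` is 42. Without some mechanism that invalidates the cached copy, the scalar unit keeps reading stale data after the vector store.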
The cache memory apparatus 105 will be explained with reference to FIG. 12. FIG. 12 is a block diagram illustrating the structure of a general cache memory apparatus 105 in the vector processor. The cache memory apparatus 105 includes a request issue control unit 200, a flush control unit 202, and a cache unit 201.
The request issue control unit 200 decides the order in which to execute instructions, such as a scalar load instruction that reads an element from the main memory 104 into the scalar processing unit 102 and a vector store instruction that writes a plurality of elements from the vector processing unit 100 into the main memory 104, and issues the instructions in that order.
The cache unit 201 includes an address array 210, a data array 211, a cache hit decision unit 213, and a data alignment unit 214. For convenience, the explanation below assumes that the cache unit 201 is a write-through cache memory apparatus organized as four-way Set Associative.
The data array 211 stores copies of data held in the main memory 104 in units of cache lines. The address array 210 stores, for each piece of data in the data array 211, the address of the main-memory region that holds the original data, and each address in the address array 210 is associated with the corresponding data in the data array 211. Together with each address, the address array 210 stores valid information indicating whether or not the address is valid. When the address is valid, the associated entry of the data array 211 holds the data at that address in the main memory 104.
For each instruction issued by the request issue control unit 200, the cache unit 201 refers to the address array 210 to decide whether or not the data array 211 holds the data at the address accessed by the instruction.
When a scalar load instruction is executed and the data array 211 holds the data at the accessed address, the cache unit 201 reads the data from the data array 211 and outputs it as read data. If the data array 211 does not hold that data, the cache unit 201 reads the data from the main memory 104 into the data array 211.
When a scalar store instruction is executed and the data array 211 holds the data at the accessed address, the cache unit 201 stores the new data into the data array 211. Because the cache is write-through, the store data is also written to the main memory 104.
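The behavior of the cache unit 201 can be sketched as a small Python model. It keeps an address array of valid bits and tags, a data array of cache lines, and a hit decision that compares tags within one four-way set. Memory is modeled as a list of words. The line size, number of sets, round-robin replacement, and no-write-allocate handling of a store miss are all assumptions not stated in this description, and the class name `WriteThroughCache` is hypothetical.

```python
LINE_WORDS = 8   # words per cache line (assumed)
NUM_SETS = 16    # number of sets (assumed)
WAYS = 4         # four-way Set Associative, as in the text


class WriteThroughCache:
    """Toy model of cache unit 201: address array + data array, write-through."""

    def __init__(self, main_memory: list[int]):
        self.mem = main_memory
        # Address array 210: valid information and tag for each (set, way).
        self.valid = [[False] * WAYS for _ in range(NUM_SETS)]
        self.tags = [[0] * WAYS for _ in range(NUM_SETS)]
        # Data array 211: one cache line of words for each (set, way).
        self.data = [[[0] * LINE_WORDS for _ in range(WAYS)] for _ in range(NUM_SETS)]
        self.next_victim = [0] * NUM_SETS  # round-robin replacement (assumed)

    @staticmethod
    def split(addr: int) -> tuple[int, int, int]:
        line_addr, offset = divmod(addr, LINE_WORDS)
        tag, index = divmod(line_addr, NUM_SETS)
        return tag, index, offset

    def lookup(self, index: int, tag: int):
        """Cache hit decision: return the hitting way, or None on a miss."""
        for way in range(WAYS):
            if self.valid[index][way] and self.tags[index][way] == tag:
                return way
        return None

    def load(self, addr: int) -> int:
        tag, index, offset = self.split(addr)
        way = self.lookup(index, tag)
        if way is None:  # miss: read the whole line from main memory into the data array
            way = self.next_victim[index]
            self.next_victim[index] = (way + 1) % WAYS
            base = (tag * NUM_SETS + index) * LINE_WORDS
            self.data[index][way] = self.mem[base:base + LINE_WORDS]
            self.tags[index][way] = tag
            self.valid[index][way] = True
        return self.data[index][way][offset]

    def store(self, addr: int, value: int) -> None:
        tag, index, offset = self.split(addr)
        way = self.lookup(index, tag)
        if way is not None:  # hit: update the cached copy
            self.data[index][way][offset] = value
        self.mem[addr] = value  # write-through: main memory is always updated
```

Because every store also reaches main memory, main memory never holds older data than this cache. The remaining coherency risk is the reverse case: another unit updates main memory while this cache keeps an old copy.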
The flush control unit 202 includes an address calculation unit 220, an address comparison unit 221 containing a flush address array 222, and a flush direction signal generation unit 223.
The flush address array 222 stores a copy of the addresses held in the address array 210.
The address calculation unit 220 calculates the store addresses of the vector store instruction sent by the request issue control unit 200 and sends each calculated address to the address comparison unit 221.
The address comparison unit 221 then compares each received address with the addresses stored in the flush address array 222.
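This description does not specify how the store addresses are generated. The following Python sketch assumes a common strided vector store, where element i is written to base + i × stride. It also assumes that comparison is done per cache line, so the element addresses are reduced to the distinct cache-line addresses they touch. Both function names are hypothetical.

```python
def vector_store_addresses(base: int, stride: int, length: int) -> list[int]:
    """Element addresses of a strided vector store: base + i * stride (assumed form)."""
    return [base + i * stride for i in range(length)]


def touched_lines(addrs: list[int], line_size: int) -> list[int]:
    """Distinct cache-line start addresses covered by the element addresses, in order."""
    return list(dict.fromkeys(a - a % line_size for a in addrs))


# 16 eight-byte elements starting at 0x1000 span two assumed 64-byte lines.
lines = touched_lines(vector_store_addresses(0x1000, 8, 16), 64)
```

With a contiguous stride, many elements fall in the same line, so only a few comparisons against the flush address array are needed. A large stride can touch a different line with every element.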
If an address received by the address comparison unit 221 matches an address stored in the flush address array 222, the flush direction signal generation unit 223 sends the cache unit 201 a signal directing it to invalidate (flush) the cache line associated with the received address (hereinafter, “flush direction signal 203”). On receiving the flush direction signal 203, the cache unit 201 invalidates the cache line by writing valid information indicating “invalid” into the state 212.
Conversely, whenever the data array 211 stores a new cache line or a cache line is invalidated, the cache unit 201 sends information about the changed cache line to the flush control unit 202. This keeps the flush address array 222 a faithful copy of the address array 210.
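The interplay between the flush control unit 202 and the cache unit 201 is sketched below in Python. To stay short, the sketch reduces both the address array and the flush address array to sets of valid cache-line addresses, ignoring the per-set, per-way layout. The class and method names are hypothetical.

```python
class FlushControl:
    """Toy model of flush control unit 202."""

    def __init__(self) -> None:
        # Flush address array 222: copy of the valid line addresses in address array 210.
        self.flush_address_array: set[int] = set()

    def on_line_filled(self, line_addr: int) -> None:
        self.flush_address_array.add(line_addr)

    def on_line_invalidated(self, line_addr: int) -> None:
        self.flush_address_array.discard(line_addr)

    def compare(self, store_line_addrs: list[int]) -> list[int]:
        """Address comparison unit 221: lines that need a flush direction signal."""
        return [a for a in store_line_addrs if a in self.flush_address_array]


class CacheUnit:
    """Cache unit 201 reduced to its set of valid line addresses."""

    def __init__(self, flush_control: FlushControl) -> None:
        self.valid_lines: set[int] = set()
        self.fc = flush_control

    def fill(self, line_addr: int) -> None:
        self.valid_lines.add(line_addr)
        self.fc.on_line_filled(line_addr)       # report the new line to the copy

    def flush(self, line_addr: int) -> None:
        """Handle flush direction signal 203: mark the line invalid."""
        self.valid_lines.discard(line_addr)
        self.fc.on_line_invalidated(line_addr)  # report the invalidation to the copy


def process_vector_store(cache: CacheUnit, fc: FlushControl, store_line_addrs: list[int]) -> None:
    for line in fc.compare(store_line_addrs):
        cache.flush(line)  # one flush direction signal per matching line


fc = FlushControl()
cache = CacheUnit(fc)
cache.fill(0x1000)
cache.fill(0x2000)
process_vector_store(cache, fc, [0x1000, 0x1040])
```

After the vector store, only line 0x2000 remains valid in both the cache and the flush address array. Keeping a separate copy lets the comparison run without occupying the address array that scalar loads and stores also use.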
In this way, data coherency is maintained: in response to a vector store instruction that writes data into the main memory 104, the flush direction signal generation unit 223 sends the flush direction signal 203 to the cache unit 201.
PTL 1 to PTL 3 disclose technologies which control coherency between data in the cache memory apparatus 105 and data in the main memory 104.
PTL 1 discloses a cache memory apparatus and related techniques that, in a computer system including a scalar processor and a vector processor, reduce the performance degradation caused by the processing needed to maintain data coherency.
To carry out flush processing, the cache memory apparatus disclosed in PTL 1 includes a flush address array that is a copy of the address array. The cache memory apparatus checks whether an address stored in the flush address array matches an address referenced by a vector store instruction, and if they match, it sends a signal directing a flush.
PTL 2 discloses a cache memory apparatus that processes, at high speed, the valid information indicating whether or not entries of a flush address array are valid.
To achieve this speed, the cache memory apparatus disclosed in PTL 2 includes a memory means that stores this valid information and can perform both a read and a write in one machine cycle.
PTL 3 discloses a cache memory control system that can reduce the amount of hardware required for an address array.
In the cache memory control system disclosed in PTL 3, flush processing is performed in units of blocks, each grouping a plurality of cache lines. That is, when data in the main memory is updated, every cache line in the block containing the cache line that holds a copy of the updated data is flushed.
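Block-granularity flushing can be illustrated with the following Python sketch. It assumes that a block is a fixed, aligned group of consecutive cache lines and that tracking is kept per block rather than per line. The block size, line size, and function names are hypothetical and are not taken from PTL 3.

```python
LINE_SIZE = 64                          # bytes per cache line (assumed)
LINES_PER_BLOCK = 4                     # cache lines grouped into one flush block (assumed)
BLOCK_SIZE = LINE_SIZE * LINES_PER_BLOCK


def block_of(addr: int) -> int:
    """Block number containing a byte address."""
    return addr // BLOCK_SIZE


def tracked_blocks(cached_lines: set[int]) -> set[int]:
    """One tracking entry per block that holds any cached line."""
    return {block_of(line) for line in cached_lines}


def block_flush(cached_lines: set[int], updated_addr: int) -> set[int]:
    """Cached lines invalidated when updated_addr is written: every cached line
    in the same block, not only the line that holds updated_addr."""
    blk = block_of(updated_addr)
    return {line for line in cached_lines if block_of(line) == blk}


cached = {0x000, 0x040, 0x0C0, 0x100}
flushed = block_flush(cached, 0x048)
```

Here four cached lines need only two tracking entries (blocks 0 and 1), which is the hardware saving. The trade-off is that an update to 0x048 also flushes lines 0x000 and 0x0C0, which did not hold the updated data and must be refetched later.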