1. Field of the Invention
The present invention relates to a cache flush system and a method thereof.
2. Background of the Related Art
In a general processor system, a cache memory temporarily stores data that the processor needs, in order to speed up the processor's access to a main memory. Generally, a cache memory maintains management information called a “cache tag” to track which data in the main memory is held in each cache block of the cache memory, and whether the data held in the cache block has been changed so that its contents differ from the contents of the corresponding data in the main memory (i.e., a modified state or dirty state).
In a multi-processor system including a plurality of processors, a plurality of cache memories exist, and each cache memory has a snoop mechanism to assure memory coherency or data coherency. The snoop mechanism checks whether a processor bus instruction affects data stored in each cache memory, whether the data stored in each cache memory should be returned, and so on, and invalidates the corresponding data.
Cache memories are of a copy back type or a write through type. A copy back type cache memory holds data for a certain period without immediately reflecting a data update by a processor in the main memory. It is therefore necessary to actively write back to the main memory any data that a processor has changed and whose contents have become different from the contents of the main memory. For example, a write back is necessary when data stored in the cache memory is transmitted to an input/output device that has no snoop mechanism. A cache flush algorithm is used to write the data whose contents have been changed, among the data stored in the cache memory, back to the main memory. A cache block in a dirty state is called a dirty block.
The cache flush algorithm is useful for a fault tolerant or replicant system as well as for data transmission to an input/output device that has no snoop mechanism. In other words, in a checkpoint system that restarts a process from a previously obtained checkpoint when one of prescribed system failures occurs, data that has been changed and is stored only in the cache memory must be written back to the main memory.
Generally, the cache flush algorithm is performed by software that includes a cache flush instruction. A processor uses the software to determine whether a cache block is a dirty block by referring to the contents of the cache tag. If the cache block is a dirty block, the cache flush algorithm writes the data of the corresponding cache block back to the main memory.
As discussed above, a processor checks the states of all cache blocks and performs the cache flush algorithm, that is, re-records the data in the main memory, for each cache block whose state is dirty. A prescribed amount of time is needed to check the states of all cache blocks.
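The software-based flush described above can be illustrated with a minimal Python sketch. It is not taken from any particular processor: the `CacheBlock` fields, the `flush_dirty_blocks` name, and the direct-mapped address reconstruction (`tag * number_of_blocks + index`) are assumptions chosen only to keep the example small.

```python
from dataclasses import dataclass


@dataclass
class CacheBlock:
    """Hypothetical cache block: tag and state bits plus the cached data."""
    tag: int = 0
    data: int = 0
    valid: bool = False
    dirty: bool = False  # set when the block differs from main memory


def flush_dirty_blocks(blocks, main_memory):
    """Check the tag state of every block and write back the dirty ones.

    Assumes a direct-mapped layout, so the block address is rebuilt as
    tag * number_of_blocks + index. Returns how many blocks were written.
    """
    num_blocks = len(blocks)
    written_back = 0
    for index, block in enumerate(blocks):
        if block.valid and block.dirty:
            address = block.tag * num_blocks + index
            main_memory[address] = block.data
            block.dirty = False  # contents now match main memory
            written_back += 1
    return written_back


# Usage: one dirty block and one clean block in a four-block cache.
blocks = [CacheBlock() for _ in range(4)]
blocks[1] = CacheBlock(tag=2, data=99, valid=True, dirty=True)
blocks[3] = CacheBlock(tag=0, data=7, valid=True, dirty=False)
main_memory = {}
count = flush_dirty_blocks(blocks, main_memory)
```

Note that the loop visits every block regardless of how many are dirty, which is why the time needed grows with the cache size rather than with the amount of modified data.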
A multi-processor system of the related art will be described with reference to FIG. 1. As shown in FIG. 1, the related art multi-processor system includes a plurality of processors (CPU, Central Processing Unit) 5 connected to a processor bus, a plurality of cache memories 1 connected to the respective processors, a memory controller 2 connected to the processor bus, a main memory 3 under control of the memory controller 2, and other system resources 4 connected to the processor bus. Each of the processors includes cache memory 1 inside the processor, on a back side outside the processor, or both. Cache memory inside a processor is called level 1 cache memory, and cache memory outside a processor is called level 2 cache memory.
Each of the processors 5 is connected to the others through a common processor bus and can access the main memory 3 through the processor bus for instruction fetch and for loading/storing data. The access to the main memory 3 is generally achieved through the memory controller 2.
The related art multi-processor system is connected to the other system resources 4, such as an input/output device, in addition to the above-described basic resources, in order to perform specific assigned functions. If a 32 bit address and a 64 bit data bus are provided, all devices such as the processors 5, the memory controller 2 and the other system resources 4 should have the same standard interface as that of the processor bus. Further, the cache memory 1 of each processor 5 has a configuration based on the standard interface.
Each of the processors 5 has a 32 kilobyte (KB) level 1 instruction cache memory and a 32 KB level 1 data cache memory inside, and a 1 megabyte (MB) level 2 cache memory on the back side outside.
An exemplary structure of a level 1 data cache memory will be described with reference to FIG. 2. The level 1 data cache memory includes a tag RAM (Random Access Memory) and a data RAM. The level 1 data cache memory implements 8-way set-associative mapping. Each of the 8 cache blocks in a set includes 4 words (W0 to W3, each 64 bits) and an address tag (20 bits) corresponding to the 4 words. Further, each cache block has 3 state information bits for indicating the state of the cache block, that is, a valid state bit, a modified state bit and a shared state bit.
Further, the level 1 instruction cache memory has the same configuration as that of the level 1 data cache memory. However, the level 1 instruction cache memory has only a valid state bit as a state information bit. Further, the level 2 cache memory stores both data and instructions in a data RAM and adopts direct mapping.
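The figures given above (32 KB capacity, 8 ways, 4 words of 64 bits per block, 20 bit tag, 32 bit address) are mutually consistent, as the following Python sketch shows. The number of sets and the index and offset widths are derived here from those figures and are not stated in the text; the assignment of the low address bits to the byte offset and the next bits to the set index is an assumed, conventional layout, and `split_address` is a hypothetical helper name.

```python
CACHE_BYTES = 32 * 1024      # 32 KB level 1 data cache
WAYS = 8                     # 8-way set-associative
WORDS_PER_BLOCK = 4          # W0 to W3
WORD_BYTES = 8               # 64 bits per word
ADDRESS_BITS = 32

BLOCK_BYTES = WORDS_PER_BLOCK * WORD_BYTES            # 32 bytes per block
NUM_SETS = CACHE_BYTES // (WAYS * BLOCK_BYTES)        # derived: 128 sets
OFFSET_BITS = BLOCK_BYTES.bit_length() - 1            # derived: 5 bits
INDEX_BITS = NUM_SETS.bit_length() - 1                # derived: 7 bits
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS    # 20 bits, as stated


def split_address(address):
    """Split a 32 bit address into (tag, set index, byte offset).

    Assumed layout: [ tag | set index | byte offset ], low bits on the right.
    """
    offset = address & (BLOCK_BYTES - 1)
    index = (address >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


tag, index, offset = split_address(0x12345678)
```

Under these assumptions, the 20 bit tag is exactly what remains of a 32 bit address after the 7 bit set index and the 5 bit byte offset are removed.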
In the related art, in order to increase the efficiency of the cache, the level 1 cache memory and the level 2 cache memory adopt a write back type as the write policy. However, this write policy can cause problems with memory coherency between processors and between a processor and an input/output device. To manage this, a cache controller unit of each processor 5 uses a modified/exclusive/shared/invalid (MESI) protocol. FIG. 3 illustrates the MESI protocol.
As illustrated in FIG. 3, the states of the MESI protocol include a modified state, an exclusive state, a shared state and an invalid state, and each state may be expressed by a combination of the state information bits of a cache block. The modified state, the exclusive state and the shared state are examples of a valid state. The cache flush algorithm is performed especially in the modified state and the exclusive state among the valid states.
For example, state information bits of the invalid state are as follows: the valid state bit (V) is 0; the modified state bit (M) is 0; and the shared state bit (S) is 0. State information bits of the modified state are as follows: the valid state bit (V) is 1; the modified state bit (M) is 1; and the shared state bit (S) is 0. State information bits of the shared state are as follows: the valid state bit (V) is 1; the modified state bit (M) is 0; and the shared state bit (S) is 1. State information bits of the exclusive state are as follows: the valid state bit (V) is 1; the modified state bit (M) is 0; and the shared state bit (S) is 0.
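The four bit combinations listed above can be captured in a small lookup table. The following Python sketch is illustrative only: the `MesiState` and `decode_state` names are hypothetical, and rejecting combinations that the text does not list (for example V=0 with M=1) is an assumption, since the text does not say how such combinations are treated.

```python
from enum import Enum


class MesiState(Enum):
    MODIFIED = "modified"
    EXCLUSIVE = "exclusive"
    SHARED = "shared"
    INVALID = "invalid"


# (V, M, S) bit combinations exactly as listed in the text.
STATE_BY_BITS = {
    (0, 0, 0): MesiState.INVALID,
    (1, 1, 0): MesiState.MODIFIED,
    (1, 0, 1): MesiState.SHARED,
    (1, 0, 0): MesiState.EXCLUSIVE,
}


def decode_state(valid, modified, shared):
    """Map the three state information bits to a MESI state.

    Raises ValueError for combinations the text does not list
    (an assumption made for this sketch).
    """
    try:
        return STATE_BY_BITS[(valid, modified, shared)]
    except KeyError:
        raise ValueError(f"unlisted state bits: V={valid} M={modified} S={shared}")


def is_dirty(valid, modified, shared):
    """A block is dirty when it is in the modified state."""
    return decode_state(valid, modified, shared) is MesiState.MODIFIED
```

For instance, `decode_state(1, 0, 1)` yields the shared state, and `is_dirty(1, 1, 0)` is true because V=1, M=1, S=0 encodes the modified state.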
The cache memory 1, which is managed separately by each processor according to the MESI protocol, maintains memory coherency by performing a cache flush algorithm that writes each cache block in the modified state (i.e., the dirty state) back to the main memory when a certain event occurs in the multi-processor system. Specifically, when the event occurs, each of the processors 5 performs an exception routine associated with the event, and the cache flush algorithm is performed at an appropriate point in the middle of the exception routine. The cache flush algorithm is performed for the modified cache blocks of the level 1 data cache memory and the level 2 cache memory by loading a continuous memory area twice the size of the level 2 cache memory.
An event that requires a cache flush is generally urgent, and therefore the event must be handled promptly. However, because all processors connected to the processor bus perform memory reads as large as the level 2 cache memory size at the same time, the load on the processor bus increases unnecessarily. Further, because the cache flush algorithm is performed by an exception routine of each processor, the actual cache flush is performed only some time after the event occurs. Thus, there can be a problem in that the cache flush algorithm cannot be performed promptly.
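The flush-by-loading technique can be illustrated with a toy Python simulation of a single direct-mapped write back cache. This is a simplified sketch, not the related art implementation: the `DirectMappedCache` class and `flush_by_loading` function are hypothetical, the 8-way level 1 cache is not modeled, and the explanation for the factor of two (a dirty block that lies inside the loaded area is hit rather than displaced on the first pass) is one plausible reading that the text itself does not state.

```python
class DirectMappedCache:
    """Toy direct-mapped write back cache that tracks only tags and dirty bits."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.tags = [None] * num_blocks
        self.dirty = [False] * num_blocks
        self.write_backs = []  # block addresses written back to main memory

    def access(self, block_address, write=False):
        index = block_address % self.num_blocks
        tag = block_address // self.num_blocks
        if self.tags[index] != tag:
            # Miss: displace the resident block, writing it back if dirty.
            if self.tags[index] is not None and self.dirty[index]:
                self.write_backs.append(self.tags[index] * self.num_blocks + index)
            self.tags[index] = tag
            self.dirty[index] = False
        if write:
            self.dirty[index] = True

    def dirty_count(self):
        return sum(self.dirty)


def flush_by_loading(cache, base_block, size_multiple=2):
    """Read a continuous area of size_multiple times the cache size."""
    total = size_multiple * cache.num_blocks
    for block_address in range(base_block, base_block + total):
        cache.access(block_address)


def make_dirty_cache():
    cache = DirectMappedCache(8)
    for block_address in (3, 12, 21):  # block 3 lies inside the loaded area
        cache.access(block_address, write=True)
    return cache


once = make_dirty_cache()
flush_by_loading(once, base_block=0, size_multiple=1)

twice = make_dirty_cache()
flush_by_loading(twice, base_block=0, size_multiple=2)
```

In this model, loading an area equal to the cache size leaves block 3 dirty because the read hits it, whereas loading twice the cache size displaces every originally dirty block. Every read in the loop is a bus transaction, which is the source of the bus load discussed in the text.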
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and advantages of the invention may be realized and attained as particularly pointed out in the appended claims.