The present invention relates to a cache coherency controller for a cache memory, and more particularly to a cache coherency controller that preserves data anti-dependences when a plurality of threads having a sequential order are executed in parallel.
To exploit the parallelism in a problem, a multi-thread execution method has been proposed that divides a single sequential program into a plurality of instruction streams (hereinafter referred to as "threads") and executes these threads in parallel.
In this multi-thread execution method, threads are generated by a fork operation. The (parent) thread that performs the fork is called the "preceding thread", and the newly generated (child) thread is called the "succeeding thread". Each thread terminates after performing its prescribed operation, so in a multi-thread program threads are repeatedly generated and terminated. Each thread is allocated to a processor, and in a system with multiple physical processors, multiple threads are executed simultaneously. By allocating several threads to each processor, latency can be hidden: when one thread enters a "waiting" state (e.g., because of a synchronization miss, resource contention, or a cache miss), the processor starts another thread, which increases resource utilization.
When a sequential program is divided into a plurality of threads with a sequential execution order, a preceding thread may read an erroneous value if a succeeding thread writes a "future" value to an address before the preceding thread reads that address. This relationship, in which a later thread's write must not be observed by an earlier thread's read of the same address, is called "data anti-dependence". Conventionally, reading such an erroneous value has been prevented by recording information on all load and store operations in advance and controlling execution so that data stored by a succeeding thread is never used by a load of a preceding thread.
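The hazard and the conventional remedy can be illustrated with a toy model. In the sketch below (a minimal illustration under stated assumptions, not the cache coherency controller of the invention), the names `SpeculativeMemory`, `naive_run`, and `guarded_run` are hypothetical, and per-thread store buffers tagged with thread order stand in for the dedicated hardware of the prior art. Thread order 0 is the preceding thread and order 1 the succeeding thread; in real time, the succeeding thread stores `x` before the preceding thread loads it.

```python
from dataclasses import dataclass, field


@dataclass
class SpeculativeMemory:
    """Hypothetical model: stores are buffered per thread order instead of
    being written straight to shared memory."""
    committed: dict
    buffers: dict = field(default_factory=dict)  # thread order -> {addr: value}

    def store(self, order, addr, value):
        # Record the store only in the issuing thread's own buffer.
        self.buffers.setdefault(order, {})[addr] = value

    def load(self, order, addr):
        # Search the thread's own buffer, then buffers of preceding threads
        # (nearest first); stores of succeeding threads are never visible.
        for o in sorted((o for o in self.buffers if o <= order), reverse=True):
            if addr in self.buffers[o]:
                return self.buffers[o][addr]
        return self.committed[addr]


def naive_run(initial):
    # Plain shared memory: the succeeding thread's store is seen by everyone.
    mem = dict(initial)
    mem["x"] = 2          # succeeding thread (order 1) stores first in time
    return mem["x"]       # preceding thread (order 0) then loads x


def guarded_run(initial):
    # Buffered model: the preceding thread's load ignores the future value.
    mem = SpeculativeMemory(committed=dict(initial))
    mem.store(1, "x", 2)  # succeeding thread stores first in time
    return mem.load(0, "x")


print(naive_run({"x": 1}), guarded_run({"x": 1}))  # prints "2 1"
```

Under sequential semantics the preceding thread must read 1. The naive run returns the succeeding thread's future value 2, violating the anti-dependence, while the buffered model returns 1. The cost of this approach is visible even in the sketch: every store address must be recorded and compared against later loads.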
However, because the prior art must record the write addresses of the succeeding thread in advance and compare them with the addresses accessed by the preceding thread, dedicated and complex hardware is required.
In addition, the number of addresses and data items that must be stored varies with the characteristics of the problem being executed. For problems that access main memory infrequently, much of this hardware goes unused. For problems that access main memory frequently, the entries available for registering addresses and data run out, which limits parallel execution.