1. Field of Invention
The present invention relates to a method of accessing cache memory. More particularly, the present invention relates to a method of accessing cache memory for parallel processing.
2. Description of Related Art
In recent years, the operating speed of processors has increased rapidly because of continual improvements in semiconductor process technology. However, cache memory access has not sped up to the same degree, so the speed gap between processors and cache memories has kept widening.
This speed gap needs to be addressed. Modern processors, for example, use a hierarchical memory design to do so.
The hierarchical memory design exploits two types of data locality, temporal locality and spatial locality, to speed up cache memory access, resulting in better processor performance. This works because cache memory is dynamically allocated by hardware, and program execution time depends strongly on the cache hit rate.
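By way of illustration only, the dependence of execution time on the cache hit rate can be sketched with the commonly used average-access-time relation. The function name and the cycle counts below are assumptions chosen for the example and are not taken from any particular processor.

```python
def average_access_time(hit_rate, hit_cycles, miss_penalty_cycles):
    """Average cycles per memory access for a single cache level.

    Every access pays hit_cycles; the fraction that misses
    (1 - hit_rate) additionally pays miss_penalty_cycles.
    """
    return hit_cycles + (1.0 - hit_rate) * miss_penalty_cycles


# Hypothetical figures: 1-cycle hit, 100-cycle miss penalty.
for rate in (0.99, 0.95, 0.80):
    print(rate, average_access_time(rate, 1, 100))
```

With these assumed figures, lowering the hit rate from 0.95 to 0.80 raises the average access time from roughly 6 cycles to roughly 21 cycles, which is why a small change in hit rate can dominate execution time.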
For simultaneous multithreading (SMT) and chip multiprocessor (CMP) processors, the execution time of one program is influenced by any other program that is executed at the same time. Although a cache partitioning method can eliminate this mutual interference among programs executed in parallel on a single physical processor, such a method does not allow a common cache to be shared, so the utilization of the cache cannot be efficiently increased.
The abovementioned problem can be overcome by dynamically adjusting the size of the sub-cache memory of each mini-processor. However, dynamic adjustment of the sub-cache size is generally achieved by modifying the replacement algorithm. Unfortunately, changing the size of a cache partition in this way is time consuming when only a few cache misses occur, which results in so-called latency. For programs that must satisfy a quality of service (QoS) requirement or perform a task with a timing constraint, the latency reduces the quality of service or causes missed deadlines. System performance is not improved in such circumstances.
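The following is a minimal sketch, under stated assumptions, of how a replacement algorithm may be modified so that partition sizes follow per-owner quotas. The class `PartitionedSet`, its methods, and the quota values are hypothetical and model a single cache set only; quotas are assumed to be at least one way each. Because lines change owner only when a miss selects a victim, a new quota takes effect slowly when few misses occur.

```python
from collections import OrderedDict


class PartitionedSet:
    """One cache set whose ways are divided among owners by quota (hypothetical)."""

    def __init__(self, ways, quota):
        self.ways = ways
        self.quota = dict(quota)    # owner -> maximum number of ways
        self.lines = OrderedDict()  # (owner, tag) keys, least recently used first

    def count(self, owner):
        return sum(1 for o, _ in self.lines if o == owner)

    def access(self, owner, tag):
        """Return True on a hit; on a miss, pick a victim that respects the quotas."""
        key = (owner, tag)
        if key in self.lines:
            self.lines.move_to_end(key)  # mark as most recently used
            return True
        if self.count(owner) >= self.quota[owner]:
            # Owner is at or over its quota: evict its own least recently used line.
            victim = next(k for k in self.lines if k[0] == owner)
            del self.lines[victim]
        elif len(self.lines) >= self.ways:
            # Set is full: reclaim a way from an owner that exceeds its quota,
            # falling back to the globally least recently used line.
            over = (k for k in self.lines if self.count(k[0]) > self.quota[k[0]])
            victim = next(over, next(iter(self.lines)))
            del self.lines[victim]
        self.lines[key] = None
        return False


s = PartitionedSet(ways=4, quota={"A": 3, "B": 1})
for t in (1, 2, 3):
    s.access("A", t)
s.access("B", 1)
s.quota.update({"A": 1, "B": 3})   # resize the partitions
print(s.count("A"), s.count("B"))  # still 3 and 1: nothing has moved yet
s.access("B", 2)
s.access("B", 3)
print(s.count("A"), s.count("B"))  # 1 and 3 after two misses by B
```

In this sketch the new quotas are reached only after owner B has missed twice, which illustrates the latency described above.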
Reliance on locality therefore has three major consequences.
First, the worst case execution time (WCET) is difficult to predict because it depends on the cache hit rate. In process design, the WCET seriously affects the prediction of the operating time of the entire system. Correctly predicting the cache hit rate is therefore the biggest challenge in predicting the WCET. Moreover, predicting the WCET is fundamental to designing embedded systems and real-time systems. If the WCET is difficult to predict, software design for those systems is adversely affected.
Second, a phenomenon known as thrashing may occur among parallel processing programs executed at the instruction level. Thrashing happens when many cache misses occur in the level one (L1) cache memory inside the processor, i.e., when the cache hit rate is low. A low cache hit rate rapidly lowers the number of instructions that the processor is able to execute per second. When the working sets of different parallel processing programs map to the same cache line of the cache memory, the programs overwrite one another's working sets and thrashing occurs, lowering system performance.
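The thrashing behavior described above may be illustrated with a small simulation. This is a sketch assuming a direct-mapped cache with four lines of 16 bytes each; the function name, cache geometry, and addresses are hypothetical and chosen only so that the two programs map to the same cache line.

```python
def hit_rate(trace, num_lines=4, line_size=16):
    """Simulate a direct-mapped cache and return the hit rate of an address trace."""
    tags = [None] * num_lines          # tag currently stored in each cache line
    hits = 0
    for addr in trace:
        block = addr // line_size
        index = block % num_lines      # cache line the block maps to
        tag = block // num_lines
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag          # miss: overwrite whatever was there
    return hits / len(trace)


# Program A re-reads address 0x000; program B re-reads 0x040.
# Both addresses map to cache line 0 in this geometry.
a_alone = [0x000] * 8
b_alone = [0x040] * 8
interleaved = [addr for pair in zip(a_alone, b_alone) for addr in pair]

print(hit_rate(a_alone))      # 0.875
print(hit_rate(b_alone))      # 0.875
print(hit_rate(interleaved))  # 0.0
```

Each program alone misses only on its first access, whereas the interleaved execution misses on every access because each program evicts the line the other has just loaded.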
Third, the difficulty of designing the scheduler of an operating system increases. The CMP processor and the SMT processor contain simple processors or logical processors, collectively called mini-processors. Generally, the operating system assumes that each mini-processor uses hardware resources fairly and that the mini-processors do not influence one another. If the usage of cache memory is not limited, a program executed in parallel with a memory-intensive program yields different results than the same program executed in parallel with a CPU-intensive program, which increases the difficulty of designing the operating system.
Therefore, there is a need to provide an improved method of accessing cache to mitigate or obviate the aforementioned problems.