1. Technical Field
The present invention relates in general to an improved data processing system and in particular to an improved high performance multithread data processing system. Still more particularly, the present invention relates to a method and system for reducing the impact of memory latency in a multithread data processing system.
2. Description of the Related Art
Single tasking operating systems have been available for many years within computer systems. In such systems, a computer processor executes computer programs or program subroutines serially; that is, no computer program or program subroutine can begin to execute until the previous computer program or program subroutine has terminated. This type of operating system does not make optimum use of the computer processor when an executing computer program or subroutine must await the occurrence of an external event (such as the availability of data or a resource), because processor time is wasted while it waits.
This problem has led to the advent of multitasking operating systems, in which a computer program is divided into multiple program threads. Each of the program threads performs a specific task. While a computer processor can execute only one program thread at a time, if the thread being executed must wait for the occurrence of an external event, i.e., the thread becomes "non-dispatchable," execution of that thread is suspended and the computer processor executes another thread of the same or a different computer program to optimize utilization of processor assets. Multitasking operating systems have also been extended to multiprocessor environments, where threads of the same or different programs can execute in parallel on different computer processors. While such multitasking operating systems optimize the use of one or more processors, they do not permit the application program developer to adequately influence the scheduling of the execution of threads.
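The dispatching behavior described above can be sketched in a few lines of Python. This is a minimal illustration only, not the scheduler of any particular operating system: the function name `run_threads`, the step labels "run" and "wait", and the round-robin ready queue are all assumptions made for the example, and the external event a thread waits on is assumed to have occurred by the time that thread is dispatched again.

```python
from collections import deque


def run_threads(threads):
    """Dispatch threads one at a time on a single simulated processor.

    `threads` maps a hypothetical thread name to a list of steps, where
    each step is either "run" (useful work) or "wait" (the thread must
    await an external event and becomes non-dispatchable).
    Returns a trace of (thread name, action) tuples.
    """
    ready = deque(threads.keys())
    remaining = {name: list(steps) for name, steps in threads.items()}
    trace = []
    while ready:
        name = ready.popleft()
        steps = remaining[name]
        while steps:
            step = steps.pop(0)
            if step == "wait":
                # The thread is suspended; the processor moves on to
                # another thread instead of idling.
                trace.append((name, "suspended"))
                break
            trace.append((name, "run"))
        if steps:
            # Simplifying assumption: the awaited event has occurred by
            # the time this thread reaches the front of the queue again.
            ready.append(name)
    return trace


if __name__ == "__main__":
    example = {"A": ["run", "wait", "run"], "B": ["run", "run"]}
    for entry in run_threads(example):
        print(entry)
```

In this sketch, thread B runs while thread A is suspended, so the simulated processor is never idle as long as some thread is dispatchable.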
Previously developed hardware multithread processors, which maintain multiple states of different programs and can switch between those states quickly, typically switch threads at every memory reference, cache miss or stall. Memory latencies in modern microprocessors are too long and first level on-chip cache sizes are generally quite small. For example, in an object-oriented programming environment, program locality is worse than in traditional environments. Such a situation results in increased delays due to increased memory accesses, rendering the data processing system less cost-effective.
Existing multithreading techniques describe switching threads on a cache miss or a memory reference. A primary example of this technique may be reviewed in "Sparcle: An Evolutionary Design for Large-Scale Multiprocessors," IEEE Micro, Volume 13, No. 3, pp. 48-60, June 1993. As applied in a so-called "RISC" (reduced instruction set computing) architecture, multiple register sets normally utilized to support function calls are modified to maintain multiple threads. Eight overlapping register windows are modified to become four non-overlapping register sets, wherein each register set is a reserve for trap and message handling. In this system, a thread switch occurs on each first level cache miss that results in a remote memory request.
While this system represents an advance in the art, modern processor designs often utilize a multiple level cache or high speed memory which is attached to the processor. The processor system utilizes some well-known algorithm to decide what portion of its main memory store will be loaded within each level of cache. Thus, each time a memory reference occurs which is not present within the first level of cache, the processor must attempt to obtain that memory reference from a second or higher level of cache.
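The lookup order described above, combined with a policy of switching threads on every first level cache miss, can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the latency constants are arbitrary cycle counts, the caches are modeled as plain sets with no eviction, and the names `access` and `count_l1_miss_switches` are hypothetical rather than drawn from any cited system.

```python
# Assumed, illustrative latencies in processor cycles (not measured values).
L1_LATENCY = 1
L2_LATENCY = 10
MEMORY_LATENCY = 100


def access(address, l1, l2):
    """Look up `address` in a two-level cache modeled as two sets.

    Returns (level that supplied the data, total latency in cycles).
    Simplification: lines are filled on a miss and never evicted.
    """
    if address in l1:
        return "L1", L1_LATENCY
    if address in l2:
        l1.add(address)  # fill the first level from the second level
        return "L2", L1_LATENCY + L2_LATENCY
    l2.add(address)  # fill both levels from main memory
    l1.add(address)
    return "memory", L1_LATENCY + L2_LATENCY + MEMORY_LATENCY


def count_l1_miss_switches(addresses, l1, l2):
    """Count thread switches under a switch-on-every-L1-miss policy.

    Returns (total switches, switches where the second level cache
    actually held the data and only a short delay was being hidden).
    """
    switches = 0
    served_by_l2 = 0
    for address in addresses:
        level, _latency = access(address, l1, l2)
        if level != "L1":
            switches += 1
            if level == "L2":
                served_by_l2 += 1
    return switches, served_by_l2


if __name__ == "__main__":
    l1_cache = {0x10}
    l2_cache = {0x10, 0x20}
    print(count_l1_miss_switches([0x10, 0x20, 0x30, 0x20], l1_cache, l2_cache))
```

Under these assumptions, some of the switches triggered by a first level miss correspond to references that the second level cache satisfies after a comparatively short delay, which is the situation a multilevel cache introduces.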
It should thus be apparent that a need exists for an improved data processing system which can reduce delays due to memory latency in a multilevel cache system utilized in conjunction with a multithread data processing system.