1. Field of Invention
The present invention pertains to the field of computer systems. More particularly, this invention relates to thread management in a multi-threaded processor.
2. Description of the Related Art
Computer systems typically include a memory and a processor. The memory generally includes a main memory and a cache memory for storing data and instructions for the processor. Typically, the processor retrieves instructions from the memory, reads the data associated with the instructions from the memory, executes the instructions, and writes the resulting data back into the memory. The processor can also be part of a parallel processing system that uses several processors, i.e., a multi-processor system, to enhance data throughput.
Accessing data and instructions from the main memory can result in periods of large latency. The latency of the memory is the delay from when the processor first requests a word from memory until that word arrives and is available for use by the processor. The latency of a memory is one attribute of performance. Accordingly, many computer systems have one or more cache memories attached to each processor to reduce the latency. For example, the computer system may include a primary cache, also known as a level one (L1) cache, which is usually tightly integrated with the processor and may be contained on the same integrated circuit as the processor. The computer system may also include a secondary cache, also known as a level two (L2) cache, which usually is located between the primary cache and the main memory in the memory hierarchy.
The cache memories store blocks of data and/or instructions that are received from the main memory. The blocks of data and/or instructions stored in the cache memories are generally referred to as cache lines or data lines. The cache memories usually provide the processor with relatively fast access to the data lines they contain as compared to the access time required to obtain the same data lines from the main memory.
The processor accesses information, e.g., a particular data line, in the cache memory by transmitting an address corresponding to the information to the cache memory. The cache memory searches for the address in its memory to determine whether the information is contained therein. If the information is not found in the cache memory, a cache miss occurs. When a cache miss occurs, the address is transmitted to the next level of the memory, e.g., a secondary cache memory or, if one is not present, the main memory. If a secondary cache memory is present, a search is performed on the secondary cache memory for the information; otherwise, a search operation is performed on the main memory. Once the information has been located, the information is transmitted from the main memory or the secondary cache memory to the primary cache memory. This process is referred to as a cache fill operation, and the information may replace other information stored in the primary cache memory.
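The lookup, miss, and fill sequence described above can be illustrated with a minimal sketch. The class name `Cache`, its fields, and the use of a dictionary as the main memory are hypothetical and chosen only for illustration; replacement of existing lines on a fill is omitted for brevity.

```python
class Cache:
    """Toy cache level mapping an address to a data line (hypothetical model)."""

    def __init__(self, name, next_level=None):
        self.name = name
        self.lines = {}               # address -> data line
        self.next_level = next_level  # e.g., a secondary cache, or None
        self.misses = 0

    def read(self, address, main_memory):
        # Hit: the address is found in this level.
        if address in self.lines:
            return self.lines[address]
        # Miss: forward the address to the next level, or to main memory
        # if no further cache level is present.
        self.misses += 1
        if self.next_level is not None:
            line = self.next_level.read(address, main_memory)
        else:
            line = main_memory[address]
        # Cache fill: install the located line in this level.
        self.lines[address] = line
        return line


main_memory = {0x10: "line-A"}
l2 = Cache("L2")
l1 = Cache("L1", next_level=l2)
first = l1.read(0x10, main_memory)   # misses in both levels, then fills
second = l1.read(0x10, main_memory)  # hits in the primary cache
```

In this sketch the first read misses in both cache levels and fills each of them, while the second read is served directly from the primary cache.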
The processor used in the computer system also can be a multi-threaded processor, which switches execution among multiple threads. A thread may be defined as a stream of addresses associated with the data and/or instructions of a particular sequence of code that has been scheduled within the processor.
One advantage of a multi-threaded processor is that it can switch threads and continue instruction execution during a long latency operation such as a cache fill operation. This usually provides an overall increase in throughput particularly when a missing data line must be obtained from the main memory.
Nevertheless, conditions might sometimes exist in the computer system having a multi-threaded processor that cause the primary cache to perform more poorly as a result of the additional demands placed on it by the additional threads. Caches have finite capacity and when one thread's data is forced out of a cache to make room for another thread's data, cache pollution occurs and the overall performance of the processor may be decreased.
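The effect can be illustrated with a small sketch of a shared cache of finite capacity. The class `SharedCache` and the least-recently-used replacement policy are assumptions made for illustration only; the description above does not specify a replacement policy.

```python
from collections import OrderedDict


class SharedCache:
    """Toy shared cache with finite capacity (LRU replacement is an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> id of the thread that installed it

    def access(self, thread_id, address):
        """Return True on a hit, False on a miss that triggers a fill."""
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[address] = thread_id
        return False


cache = SharedCache(capacity=2)
cache.access("A", 1)
cache.access("A", 2)
warm_hit = cache.access("A", 1)      # thread A hits on its own line
cache.access("B", 3)
cache.access("B", 4)                 # thread B's lines force out thread A's
after_switch = cache.access("A", 1)  # thread A now misses
```

Here thread A hits while its lines are resident, but misses on the same address after thread B has filled the cache with its own lines.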
It should therefore be appreciated that there remains a need for a computer system that improves the efficiency of thread scheduling. The present invention fulfills this need.
The present invention is embodied in a multi-thread processing system, and related method, that provides a multi-thread processor with information from a cache memory to control the scheduling of threads. The cache memory in the multi-thread processor may include one or more caches that may be "split", i.e., separate caches for instruction and data addresses, or "unified", i.e., a single cache which may contain both instructions as well as data lines, or a combination of the two.
The multi-thread processing system includes a multi-thread processor, a cache memory, and a thread scheduler. Initially, an address that identifies a data line to be retrieved is transmitted from the multi-thread processor to the cache memory. The cache memory performs a lookup operation to locate the address. After the address is located, a data line corresponding to the address is retrieved from the cache memory. The data line contains data and information pertaining to the data, and this information is sent from the cache memory to the thread scheduler. The thread scheduler determines a figure of merit from the data line information for each of a plurality of threads.
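The flow of information from the cache memory to the thread scheduler can be sketched as follows. All names (`DataLine`, `CacheMemory`, `ThreadScheduler`, `owner_thread`) are hypothetical, and the sketch assumes that the information pertaining to the data is the identifier of the thread that owns the line, which is only one possible form of that information.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class DataLine:
    data: bytes
    owner_thread: int  # assumed form of the "information pertaining to the data"


class ThreadScheduler:
    """Accumulates per-thread data line information received from the cache."""

    def __init__(self):
        self.hits = Counter()  # thread id -> number of lines reported for it

    def observe(self, line):
        self.hits[line.owner_thread] += 1


class CacheMemory:
    def __init__(self, scheduler):
        self.lines = {}  # address -> DataLine
        self.scheduler = scheduler

    def install(self, address, line):
        self.lines[address] = line

    def lookup(self, address):
        # Locate the address; on success, forward the line's information
        # to the thread scheduler and return the line to the processor.
        line = self.lines.get(address)
        if line is not None:
            self.scheduler.observe(line)
        return line


scheduler = ThreadScheduler()
cache = CacheMemory(scheduler)
cache.install(0x40, DataLine(b"abcd", owner_thread=1))
found = cache.lookup(0x40)
```

The counts accumulated by the scheduler in this sketch are one possible input from which a per-thread figure of merit could be derived.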
The figure of merit is used by the thread scheduler to determine which thread to execute, i.e., switch to, when the current or previous thread performs a long latency operation. The figure of merit defines the execution environment as measured by the performance of the cache memory. For example, the figure of merit can be determined using one of the following criteria: the number of data lines owned by a particular thread in the cache memory, the number of times a particular thread has hit in the cache over a specified time interval, or the number of lines a particular thread has installed into the cache memory over a specified interval. Threads having the largest figure of merit are using the processor's resources more efficiently and should be selected by the thread scheduler to execute. Accordingly, the efficiency of thread scheduling is enhanced by providing the multi-thread processor with feedback on the current execution environment as measured by the cache memory.
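A minimal sketch of such a selection, using the number of data lines owned by each thread as the figure of merit, is given below. The function names and the representation of the cache as a list of owner tags are hypothetical; a thread that owns no lines is assumed to have a figure of merit of zero.

```python
from collections import Counter


def lines_owned(cache_owner_tags):
    """Figure of merit: number of cache lines currently owned by each thread."""
    return Counter(cache_owner_tags)


def pick_next_thread(ready_threads, figure_of_merit):
    """Select the ready thread with the largest figure of merit.

    Threads absent from figure_of_merit are treated as having a value of 0.
    """
    return max(ready_threads, key=lambda t: figure_of_merit.get(t, 0))


# One owner tag per line currently resident in the cache (illustrative data).
owner_tags = ["T0", "T1", "T1", "T2", "T1", "T2"]
merit = lines_owned(owner_tags)

# T1 has just begun a long latency operation, so only T0 and T2 are ready.
next_thread = pick_next_thread(["T0", "T2"], merit)
```

With the illustrative tags above, T2 owns two lines and T0 owns one, so the sketch selects T2 while T1 waits on its long latency operation.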
Other features and advantages of the present invention will be apparent from the detailed description that follows.