In a typical processor system, one or more applications are running (i.e., being executed by the processor). As known in the art, the code of an application can be divided into a plurality of processes, and each process can be divided into a plurality of threads. Thus, a thread can be a series of instructions that are executed by the processor to achieve a given task (e.g., a subroutine). Processors often switch between threads of a process and between processes of one or more applications (e.g., in a multi-tasking environment), but single-threaded processors, as they are currently known in the art, are only capable of supporting one thread of execution at a time. In other words, single-threaded processors cannot execute instructions from two or more threads simultaneously, and the use of multiple threads requires that these processors continuously switch back and forth between threads. However, recent advancements in processor technology have allowed the development of multithreaded processors that can support two or more threads of execution simultaneously.
Before the development of simultaneous multithreading, computer architects could further improve the performance of their machines by measuring and monitoring the various parameters that affect the performance of the processor. For example, by measuring the system performance of the machine when it executes its intended applications, the computer architect is better assisted in his or her effort to design a balanced computer system. System performance monitoring is typically accomplished with the use of on-chip performance registers, which can monitor certain processor events that characterize processor performance. For example, in several models of the Intel Pentium® processor, the following performance registers are provided on-chip: a 64-bit Time Stamp Counter (TSC), two programmable event counters (CTR0, CTR1), and a control and event select register (CESR). The CESR can be programmed to allow the event counters (CTR0, CTR1) to count the occurrence of specific events or to count clock signals while an event condition is present or absent. For example, by placing the appropriate data values into the CESR, the first counter, CTR0, can be set up to count the number of times a data read operation is performed by the processor. Once CTR0 is set up to perform this task, each time the processor performs a data read operation, CTR0 increments its internal count. Similarly, the CESR can be programmed to allow the second counter, CTR1, to simultaneously count a different event. The event counts that are ultimately stored in the registers of the event counters (CTR0, CTR1) can be accessed by a user in order to detect events that characterize a processor's performance. Numerous events can be monitored using this system, such as data cache read/write misses and the loading of a segment register.
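The counting scheme described above can be illustrated with a small software model. The following is only a sketch under stated assumptions: the class name PerfMonitor, the event codes, and the bit layout of the control register are invented for illustration and do not reproduce the actual CESR encoding of any processor. It shows one control register selecting, independently for each of two counters, which event that counter accumulates.

```python
# Toy model of two event counters driven by one control/select register.
# ASSUMPTION: event codes and bit layout are hypothetical, not the real CESR.

EVENT_DATA_READ = 0x00
EVENT_DATA_WRITE = 0x01
EVENT_DATA_CACHE_MISS = 0x02

FIELD_MASK = 0x3F   # low 6 bits of each 16-bit half select the event
ENABLE_BIT = 0x40   # bit 6 of each 16-bit half enables the counter


class PerfMonitor:
    def __init__(self):
        self.cesr = 0        # control and event select register (modeled)
        self.ctr = [0, 0]    # CTR0 and CTR1

    def program(self, counter, event):
        """Write the event selection for one counter into its half of the CESR."""
        shift = 16 * counter
        self.cesr &= ~(0xFFFF << shift)                        # clear this half
        self.cesr |= (ENABLE_BIT | (event & FIELD_MASK)) << shift
        self.ctr[counter] = 0                                  # restart the count

    def signal(self, event):
        """Called once per processor event; bumps every counter selecting it."""
        for counter in (0, 1):
            half = (self.cesr >> (16 * counter)) & 0xFFFF
            if (half & ENABLE_BIT) and (half & FIELD_MASK) == event:
                self.ctr[counter] += 1


pm = PerfMonitor()
pm.program(0, EVENT_DATA_READ)         # CTR0 counts data reads
pm.program(1, EVENT_DATA_CACHE_MISS)   # CTR1 counts a different event

for ev in (EVENT_DATA_READ, EVENT_DATA_READ, EVENT_DATA_WRITE,
           EVENT_DATA_CACHE_MISS, EVENT_DATA_READ):
    pm.signal(ev)

print(pm.ctr)
```

In this model, as in the scheme described above, the counters record only how many times an event occurred; nothing in the count identifies which thread caused it.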
The performance monitoring system described above is useful to software programmers. For example, the performance monitoring system can detect events that tend to indicate inefficiencies in the design of software applications. In addition, processor designers and computer architects can also benefit since the system allows them to observe how software applications will execute on the processor. Therefore, hardware designs can be optimized to deliver the best performance for the execution of common software (e.g., operating systems).
A drawback of the aforementioned performance monitoring system is that it primarily focuses on the operation of the processor without consideration as to which thread, of a multithreaded processor, is being executed. For example, in a multimedia application that combines both audio processes and video processes, the user could use the foregoing system to determine that a greater-than-normal number of data cache read/write misses have occurred during the execution of the application. Using only techniques currently known in the art, however, the user would not be able to determine which individual threads of execution, e.g., those contained in the audio or video processes, were contributing to the number of data cache read/write misses. This limitation is even more problematic in multithreaded processors, wherein threads are executed simultaneously, because keeping track of when a processor switches between threads will not be sufficient to determine precisely in which thread an event has occurred. Ultimately, if a particular event that is being monitored is adversely affecting the operation of an application, it would be advantageous to determine from which thread and at what privilege level the event is occurring.
What is needed, then, is a method and apparatus for detecting events that are generated by a specific thread, or set of threads, of a multithreaded processor. As will be seen, the present invention can determine whether certain events are generated by an individual thread or by a series of threads executing simultaneously. In general, the present invention can accomplish this by combining event qualification by thread ID with event qualification by thread current privilege level (CPL).
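The idea of combining the two qualifications can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the apparatus of the invention: the names Event and QualifiedCounter, the event kinds, and the sample trace are hypothetical, and the CPL values 0 through 3 follow the common convention in which 0 is the most privileged level. A counter increments only when an event matches the selected event kind and also originates from a selected thread ID at a selected CPL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str        # e.g. "cache_miss" (hypothetical event name)
    thread_id: int   # thread that generated the event
    cpl: int         # privilege level of that thread when the event occurred


class QualifiedCounter:
    """Counts events of one kind, qualified by thread ID and by CPL."""

    def __init__(self, kind, thread_ids, cpls):
        self.kind = kind
        self.thread_ids = frozenset(thread_ids)
        self.cpls = frozenset(cpls)
        self.count = 0

    def observe(self, event):
        # Both qualifications must pass before the event is counted.
        if (event.kind == self.kind
                and event.thread_id in self.thread_ids
                and event.cpl in self.cpls):
            self.count += 1


ALL_CPLS = (0, 1, 2, 3)

# Hypothetical trace from two threads executing simultaneously.
trace = [
    Event("cache_miss", 0, 3),
    Event("cache_miss", 1, 3),
    Event("cache_miss", 1, 0),
    Event("data_read", 1, 3),
    Event("cache_miss", 1, 3),
]

any_thread = QualifiedCounter("cache_miss", thread_ids=(0, 1), cpls=ALL_CPLS)
thread0_only = QualifiedCounter("cache_miss", thread_ids=(0,), cpls=ALL_CPLS)
thread1_user = QualifiedCounter("cache_miss", thread_ids=(1,), cpls=(3,))

for ev in trace:
    for counter in (any_thread, thread0_only, thread1_user):
        counter.observe(ev)

print(any_thread.count, thread0_only.count, thread1_user.count)
```

Selecting a set of thread IDs rather than a single ID lets the same counter cover either an individual thread or several threads at once, which mirrors the "specific thread, or set of threads" language above.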