Conventionally, an operating system manages the hardware and software resources of a computer system. The operating system controls access to each processor resident in the computer system and schedules the use of each processor by applications and other processes that require processor time. To be useful, this processor management function must divide the available processor time among the processes and applications running on the computer so that processor power is used effectively and each process and application receives enough processor time to function properly. However, conventionally, the operating system has limited information about processor topology and processor resources with which to schedule such tasks.
On many operating systems, multi-threading is used to increase overall system performance. Applications and other processes may utilize multiple streams of instructions (threads) in execution. In a multi-threading environment, multiple threads can operate in parallel on a single physical processor. The operating system schedules the execution of threads on the available processor(s).
Each processor includes a variety of resources. These resources include storage for information about a given thread being run, including information generated and used by the thread. This storage typically includes general purpose registers, control registers, APIC (advanced programmable interrupt controller) registers and some machine state registers. This information is called the architectural state of the processor.
Additionally, each processor includes processor execution resources used to execute the thread. These resources include caches, execution units, branch predictors, control logic, and buses. A complex instruction repository, such as microcode, may also be included in the processor execution resources.
When more than one thread is running simultaneously, the operating system allots a certain number of processor execution cycles to a first thread. When the cycles have been used, the architectural state of the processor on which the thread has been running is saved, and the architectural state for the next thread to be allotted cycles is loaded.
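The time-slicing behavior described above can be illustrated with a minimal sketch. It assumes a single processor with one architectural state, modeled here as a plain dictionary. The names Thread, run_round_robin, quantum, and cycles_run are hypothetical and chosen only for illustration; they do not correspond to any particular operating system.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Thread:
    name: str
    remaining: int  # cycles of work still needed (hypothetical unit)
    saved_state: dict = field(default_factory=dict)  # saved architectural state


def run_round_robin(threads, quantum):
    """Run threads on one processor, switching after at most `quantum` cycles."""
    ready = deque(threads)
    trace = []
    while ready:
        thread = ready.popleft()
        # Load the architectural state of the thread being allotted cycles.
        processor_state = dict(thread.saved_state)
        used = min(quantum, thread.remaining)
        thread.remaining -= used
        processor_state["cycles_run"] = processor_state.get("cycles_run", 0) + used
        # The allotment is used up: save the architectural state for later.
        thread.saved_state = dict(processor_state)
        trace.append((thread.name, used))
        if thread.remaining > 0:
            ready.append(thread)  # the thread waits for its next allotment
    return trace
```

In this sketch the only cost of a switch is copying a dictionary. On real hardware, saving and loading the architectural state consumes processor cycles that are then unavailable to the threads themselves.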
The standard processor architecture includes storage for only one architectural state along with only one set of processor execution resources on a single chip. Other processor architecture techniques have been developed in recent years to attempt to improve processor performance. One such architecture technique is chip multiprocessing (CMP). CMP places two or more logical processors on a single chip. The two or more logical processors each have storage for one architectural state. A separate set of processor execution resources is also included for each logical processor in CMP. However, all the processors on one chip may share a large on-chip cache. This shared cache improves efficiency when information placed in the cache by a thread executing on one processor is later required by another processor on the chip, because cache misses may be reduced.
Another processor architecture utilizes simultaneous multithreading (SMT) technology. An SMT architecture makes a single physical processor chip appear to the operating system as multiple logical processors. To do this, there is one copy of the architectural state on the chip for each logical processor. However, all the logical processors on the chip share a single set of physical processor execution resources. Operating systems and user programs can schedule threads to logical processors as they would on conventional physical processors in a multi-processor system. Thus, a first logical processor may be running a first thread while a second logical processor may be running a second thread. Because these two threads share one set of processor execution resources, the second thread can use resources that would otherwise be idle if only one thread were executing. The result is an increased utilization of the execution resources within each physical processor package. For example, if a floating-point operation uses different parts of the physical processor execution resources than an addition and load operation, then one logical processor can perform a floating-point operation while the other performs an addition and load operation. Because the threads running on the logical processors may occasionally need the same part of the physical processor execution resources, one thread may occasionally have to wait for the resource to become free. However, while sometimes less efficient than two separate physical processors, this solution may be more efficient than the traditional architecture in which one logical processor was implemented in one physical processor.
Other architectures containing multiple logical processors are conceivable, each logical processor having its own architectural state, with some physical processor execution resources shared among the logical processors and others duplicated for each logical processor.
Each logical processor may be scheduled just as traditional processors are scheduled. However, conventionally, there is no way for a thread manager (the processor scheduling function within an operating system) to distinguish between (1) a logical processor sharing all processor execution resources with one or more other logical processors, (2) a logical processor sharing a cache with another processor on one chip, and (3) a conventional logical processor, with its own processor execution resources, unshared with any other processor. Generally, there is no way for the thread manager to determine which, if any, processor execution resources two logical processors share. This may result in less-than-optimal scheduling.
For example, where four logical processors, A, B, C, and D, are available, where A and B are hyper-threaded and share processor execution resources, and where C and D share a cache, none of these architectural details will be apparent to the thread manager. Where two threads are being scheduled, scheduling the two threads on A and B may be less efficient than scheduling one on A and one on C, since A and B share processor execution resources and there may be some contention. And where, among other threads, two threads are running which may benefit from sharing a cache, the thread manager will not be aware that C and D share a cache, and may not schedule the two threads on C and D.
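The scheduling choices in this example can be sketched as follows, assuming the thread manager had access to a topology description, which conventionally it does not. The TOPOLOGY table, its "exec" and "cache" keys, the group labels, and both function names are hypothetical and serve only to illustrate the four-processor example.

```python
from itertools import combinations

# Hypothetical topology description. A value of None means the resource is
# not shared with any other logical processor.
TOPOLOGY = {
    "A": {"exec": "exec_ab", "cache": "cache_ab"},  # A and B are hyper-threaded
    "B": {"exec": "exec_ab", "cache": "cache_ab"},
    "C": {"exec": None, "cache": "cache_cd"},       # C and D share only a cache
    "D": {"exec": None, "cache": "cache_cd"},
}


def shares(topology, p, q, kind):
    """Return True if logical processors p and q share the given resource."""
    group_p, group_q = topology[p][kind], topology[q][kind]
    return group_p is not None and group_p == group_q


def pair_for_independent_threads(topology, idle):
    """Prefer two processors that do not contend for execution resources."""
    pairs = list(combinations(sorted(idle), 2))
    for p, q in pairs:
        if not shares(topology, p, q, "exec"):
            return p, q
    return pairs[0] if pairs else None  # fall back to any available pair


def pair_for_cache_sharing_threads(topology, idle):
    """Prefer two processors that share a cache but not execution resources."""
    for p, q in combinations(sorted(idle), 2):
        if shares(topology, p, q, "cache") and not shares(topology, p, q, "exec"):
            return p, q
    return None
```

With all four processors idle, the first function selects A and C rather than A and B, and the second selects C and D, matching the placements described above. Without the topology table, neither decision could be made on an informed basis.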
Conventionally, the only way for the thread manager to know that logical processors are hyper-threaded and share processor execution resources is to execute the CPUID instruction, which causes the processor to provide identification information such as model, version, speed, and vendor/manufacturer string. This identification information can be compared to information stored within the operating system to determine that a certain chip contains two logical processors that share all processor execution resources. The thread manager may then attempt to optimize for simultaneous multi-threading.
However, while a thread manager may be able to identify processor types for which information is currently available, the operating system must be revised whenever new processor types become available. If it is not revised, it has no information about the new processors in the computer system and cannot schedule them efficiently. Unrevised operating systems are therefore not forward-compatible: they are forced to use old processor descriptions to handle new processor types, even though those descriptions may not be correct. Additionally, while an operating system may be able to identify and deal with SMT situations where all processor execution resources are shared, future processors may implement other methods for sharing processing resources. Current mechanisms for identifying processors lack the flexibility needed to describe new processor resource sharing schemes to the operating system.
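The identification-by-lookup approach and its forward-compatibility limitation can be sketched as follows. The sketch assumes the identification information has already been obtained from the processor and is passed in as plain values. The table contents, the vendor name "ExampleVendor", the family and model numbers, and the field names are all hypothetical and do not describe any real processor.

```python
# Hypothetical table compiled into the operating system at release time.
KNOWN_PROCESSORS = {
    ("ExampleVendor", 15, 2): {"logical_processors": 2,
                               "sharing": "all_execution_resources"},
    ("ExampleVendor", 6, 8): {"logical_processors": 1,
                              "sharing": "none"},
}

# Old description applied to any processor the table does not recognize.
FALLBACK_DESCRIPTION = {"logical_processors": 1, "sharing": "unknown"}


def describe_processor(vendor, family, model):
    """Map identification information to a stored processor description."""
    return KNOWN_PROCESSORS.get((vendor, family, model), FALLBACK_DESCRIPTION)
```

A processor released after the table was compiled, for instance one identified here as ("ExampleVendor", 16, 1), receives the fallback description regardless of how its resources are actually shared. Correcting this requires revising the table, that is, revising the operating system.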
Additionally, current thread manager systems cannot adequately handle mixed instruction set environments. Processors may use various instruction sets, such as the x86 instruction set or the IA-64 instruction set. Graphics processors may use their own unique instruction sets. While multi-processor systems today generally use processors with one instruction set, future configurations may more frequently include processors with different instruction sets. However, thread manager systems are not currently able to easily accommodate mixed instruction set environments and schedule threads that require a processor operating with a specific instruction set.
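The kind of scheduling a mixed instruction set environment calls for can be sketched as follows. The sketch assumes each processor and each thread is labeled with an instruction set name. The function name assign_threads, the processor and thread names, and the label strings in the usage example are hypothetical.

```python
def assign_threads(threads, processors):
    """Place each thread on a free processor with the required instruction set.

    threads: list of (thread_name, required_instruction_set) pairs.
    processors: dict mapping processor name to its instruction set.
    Returns a dict mapping thread name to a processor name, or to None when
    no free processor with the required instruction set remains.
    """
    free = dict(processors)  # copy so the caller's table is not modified
    placement = {}
    for thread, required in threads:
        match = next((name for name, isa in free.items() if isa == required), None)
        placement[thread] = match
        if match is not None:
            del free[match]  # the processor is now occupied
    return placement
```

For example, given processors {"P0": "x86", "P1": "IA-64", "G0": "graphics"} and threads requiring "IA-64", "x86", and "x86" in that order, the first two threads are placed on P1 and P0 and the third is left unplaced, since no x86 processor remains free.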
Thus, there is a need for a flexible method for presenting processor resource information to the thread manager for use in optimizing the scheduling of tasks onto logical processors.