1. Technical Field
The present invention relates in general to a system and method for instruction thread distribution. In particular, the present invention relates to a system and method for directing instruction thread distribution according to the performance data of the circuitry that executes the instructions.
2. Description of the Related Art
Many modern data processing systems include multiple central processing unit (CPU) cores. These data processing systems execute the instructions of a single program across the multiple central processing unit cores. The single program includes many instructions to be executed in the central processing units. One technique for employing the multiple central processing unit cores in the execution of these instructions is to divide the instructions into groups of instructions, or threads. Each thread is then directed to a central processing unit for execution. Several prior art patents address the use of instruction threads in a processor and the control of execution of these instruction threads. These patents include U.S. Pat. No. 7,093,109 entitled “Network Processor which makes Thread Execution Control Decisions Based on Latency Event Lengths”; U.S. Pat. No. 6,076,157 entitled “Method and Apparatus to Force a Thread Switch in a Multithreaded Processor”; U.S. Pat. No. 6,212,544 entitled “Altering Thread Priorities in a Multithreaded Processor”; and U.S. Pat. No. 6,625,637 entitled “Deterministic and Preemptive Thread Scheduling and Its Use in the Debugging Multithreaded Applications.”
In a multiple central processing unit data processing system, it is helpful to know the physical conditions of the central processing unit cores that will receive the instruction threads. To obtain the maximum performance within the data processing system, the instruction threads should be distributed to the central processing units that are able to execute them efficiently. One physical condition of the central processing unit cores is the performance data, or frequency response, which is measured in terms of clock frequency and results inherently from the manufacturing process. The number of CPU cores that can be implemented on a chip is proportional to the area of the chip. But as the chip area increases, the separation between CPU cores located near opposite edges of the chip also increases. In a chip with a large area, the performance of individual devices in cores that are not in close spatial proximity differs because of minor variations in the semiconductor manufacturing process seen by distant cores. The net effect is that widely separated cores offer different frequency responses, or performance. The higher the performance data for a core, the more efficiently that central processing unit will execute instructions.
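By way of illustration only, the idea of directing threads toward the cores with the best performance data can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the invention or of any cited patent: the names `Core`, `max_freq_mhz`, and `distribute_threads`, the per-thread cost estimates, and the greedy earliest-finish rule are all hypothetical and introduced here only to make the concept concrete.

```python
from dataclasses import dataclass


@dataclass
class Core:
    """Hypothetical record of one CPU core's performance data."""
    core_id: int
    max_freq_mhz: float  # assumed: highest clock frequency measured for this core


def distribute_threads(thread_costs, cores):
    """Assign each thread to the core expected to finish it earliest.

    thread_costs maps a thread name to an assumed cost in millions of cycles.
    Returns a dict mapping thread name to the chosen core_id.
    """
    busy_until = {core.core_id: 0.0 for core in cores}  # projected seconds of queued work
    assignment = {}
    # Place the most expensive threads first so they land on the fastest cores.
    for name, cost in sorted(thread_costs.items(), key=lambda item: -item[1]):
        best = min(
            cores,
            key=lambda c: (busy_until[c.core_id] + cost / c.max_freq_mhz, c.core_id),
        )
        busy_until[best.core_id] += cost / best.max_freq_mhz
        assignment[name] = best.core_id
    return assignment


if __name__ == "__main__":
    # Two cores on the same chip whose measured frequencies differ (illustrative values).
    cores = [Core(core_id=0, max_freq_mhz=2000.0), Core(core_id=1, max_freq_mhz=2400.0)]
    print(distribute_threads({"t0": 1200.0, "t1": 600.0}, cores))
```

In this sketch a single thread always goes to the higher-frequency core, while additional threads spill over to slower cores only when the faster core's queued work would delay them more than the slower clock would.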