The present invention relates to the execution of computer instructions and, more particularly, to the efficient execution of loop operations using multiple sub-processors.
Computer systems are used to perform a variety of tasks in a wide range of applications. Often, a microprocessor controls operation of the computer system. The microprocessor may be programmed to handle specific operations in a particular manner. Typically, the microprocessor fetches or otherwise receives an instruction. The instruction tells the microprocessor to perform an operation, such as adding data, jumping to a different part of a program, or performing a logic operation. Performing the operation may include one or more steps of decoding the instruction, calculating an address in memory, accessing/reading data from memory, executing the instruction employing the data and writing a result into memory.
Certain applications, such as computer graphics, are computationally intensive, and may require performing many instructions to effectively render and display an image on a display device. One particular computationally intensive computer graphics application is three-dimensional graphics processing for polygon/pixel shading (e.g., graphics shading). For example, a video game display may show a ball thrown from one player to another player. The ball and other objects on the display may be represented as a series of polygons. As the ball travels across the display, the shading of the ball and objects covered by its shadow can change. The microprocessor may compute and/or re-compute shading for each picture element (“PEL”) of each polygon that the ball and shadow intersect. Such computations may include multiple iterations (i.e., loops) and millions of calculations.
A drawback of a single microprocessor handling all of the instructions and calculations for a particular operation, such as in the graphics shading example above, is time. Typically, the more instructions that are performed, the longer the overall computation takes. One method to handle such computationally intensive applications is for the microprocessor to break up a task and distribute portions of the task among one or more sub-processors. The task may be one or more instructions, or it may be one or more segments of a single instruction. Spreading the task among sub-processors can reduce the time to complete the task. Other benefits may include higher data throughput and improved system reliability. Also, because the sub-processors repeatedly perform identical or similar operations, the sub-processors may be tailored to efficiently perform those operations (e.g., perform a subset of instructions).
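By way of illustration only, the division of a task among sub-processors described above can be sketched in software. The following is a minimal sketch, not a description of any particular architecture: worker threads stand in for sub-processors, and the names `shade_pixel`, `shade_chunk`, `split`, and `shade_distributed`, as well as the halving of intensity as a "shading" step, are hypothetical choices made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def shade_pixel(intensity):
    """Hypothetical per-pixel shading step: halve the intensity (pixel in shadow)."""
    return intensity // 2


def shade_chunk(chunk):
    """Work performed by one (simulated) sub-processor on its portion of the task."""
    return [shade_pixel(p) for p in chunk]


def split(pixels, parts):
    """Divide pixels into at most `parts` contiguous chunks of near-equal size."""
    size = max(1, -(-len(pixels) // parts))  # ceiling division, at least 1
    return [pixels[i:i + size] for i in range(0, len(pixels), size)]


def shade_distributed(pixels, num_workers=4):
    """Main-processor role: break up the task, hand out portions, merge results."""
    chunks = split(pixels, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Executor.map returns results in the order the chunks were submitted.
        shaded_chunks = list(pool.map(shade_chunk, chunks))
    return [p for chunk in shaded_chunks for p in chunk]


pixels = list(range(0, 256, 8))
single = shade_chunk(pixels)             # one processor handles every pixel
distributed = shade_distributed(pixels)  # the same task spread over four workers
```

In this sketch both paths produce the same shaded values; only the way the work is partitioned differs. Threads in CPython do not execute pure-Python computation simultaneously, so the example shows the partitioning and merging steps rather than an actual speed-up.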
In one method of distributing tasks, the main processor sends an instruction to a group of sub-processors in parallel. In another method of distributing tasks among sub-processors, the main processor sends instructions to a series of sub-processors sequentially. Unfortunately, these methods have drawbacks associated with them.
One problem with distributing portions of a task among sub-processors is the possibility of a sequencing error, wherein some portions of the task are processed out of order, thereby generating incorrect data. Parallel sub-processing may be particularly susceptible to sequencing errors. Another problem is the need for the microprocessor, or main processor, to keep track of and control the operation of the sub-processors and shared resources (e.g., the data/address bus). Yet another problem is scalability. Certain computer architectures may be able to handle only a few sub-processors whereas other architectures may be able to handle any number of sub-processors. Furthermore, because sequential sub-processors receive tasks one at a time, sequential sub-processing generally takes more time than parallel sub-processing.
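The sequencing error described above can be illustrated with a short sketch. It assumes, purely for the example, that three portions of a task complete in the order 2, 0, 1 and that each result carries the index of the portion it came from; the names `process_portion`, `completion_order`, and `arrivals` are hypothetical.

```python
def process_portion(row):
    """Hypothetical work done by one sub-processor: double every value in a row."""
    return [2 * v for v in row]


rows = [[1, 2], [3, 4], [5, 6]]

# Assumed completion order: the sub-processor holding portion 2 finishes first,
# followed by the ones holding portions 0 and 1.
completion_order = [2, 0, 1]

# Each result is tagged with the index of the portion it was computed from.
arrivals = [(i, process_portion(rows[i])) for i in completion_order]

# Assembling results in arrival order reproduces the sequencing error.
naive = [result for _, result in arrivals]

# Reordering by the tag restores the intended sequence.
ordered = [result for _, result in sorted(arrivals, key=lambda pair: pair[0])]

# Reference result: a single processor handling the portions in order.
expected = [process_portion(row) for row in rows]
```

Here `naive` differs from `expected` while `ordered` matches it. The tagging and reordering shown are bookkeeping that falls to whichever component tracks the sub-processors, which is one form of the control burden noted above.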
Various techniques have been developed and employed to alleviate such problems. For example, one computer architecture includes a main processor and multiple dedicated sub-processors that are hard-wired in a desired configuration. While such a computer architecture may reduce computation time, the hard-wired configuration is inflexible and may inefficiently employ scarce computing resources. Specifically, the hard-wired sub-processors may include a fixed data flow with on/off switching functions and programmable parameters. However, such functionality does not provide adequate flexibility to perform computationally intensive applications, such as the graphics shading example described above. Therefore, alternative architectures are desired to adequately address the aforementioned problems.