1. Field of the Invention
This invention relates in general to the field of instruction execution in computers, and more particularly to an apparatus and method for combining two micro instructions for parallel execution within a single channel of a pipeline microprocessor.
2. Description of the Related Art
A present day pipeline microprocessor has a single path, or channel, or pipeline, that is divided into stages that perform specific tasks. Each of the specific tasks is part of an overall operation that is directed by a programmed instruction. Each of the programmed instructions, or macro instructions, in a software application program is executed in sequence by the microprocessor. As a macro instruction enters the first stage of the pipeline, certain tasks are accomplished. The instruction is then passed to subsequent stages for accomplishment of subsequent tasks. Following completion of a final task, the instruction completes execution and exits the pipeline. Execution of programmed instructions by a pipeline microprocessor is very much analogous to the manufacture of items on an assembly line.
One of the obvious aspects of any assembly line is that there are multiple items resident in the line in successive stages of assembly during any given point in time. The same is true for a pipeline microprocessor. During any cycle of a pipeline clock signal, there are multiple instructions present in the various stages, with each of the instructions being at successive levels of completion.
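By way of illustration only, the overlap described above can be modeled with a short Python sketch. It assumes a hypothetical five-stage, in-order pipeline with no stalls, in which each instruction advances one stage per clock cycle; the stage names in STAGES and the function occupancy are placeholders and are not taken from this specification. For each cycle, occupancy reports which instruction occupies each stage.

```python
from typing import List, Optional

# Hypothetical stage names for illustration; the specification does not list them.
STAGES = ["fetch", "translate", "register", "execute", "write_back"]


def occupancy(num_instructions: int, num_stages: int = len(STAGES)) -> List[List[Optional[int]]]:
    """Return, for each clock cycle, the instruction index held by each stage.

    Instruction i enters stage 0 in cycle i and reaches stage s in cycle i + s,
    so a stall-free run lasts num_instructions + num_stages - 1 cycles.
    """
    total_cycles = num_instructions + num_stages - 1
    table: List[List[Optional[int]]] = []
    for cycle in range(total_cycles):
        row: List[Optional[int]] = []
        for stage in range(num_stages):
            instr = cycle - stage  # the instruction that entered stage 0 `stage` cycles ago
            row.append(instr if 0 <= instr < num_instructions else None)
        table.append(row)
    return table


if __name__ == "__main__":
    for cycle, row in enumerate(occupancy(4)):
        cells = ["--" if i is None else f"I{i}" for i in row]
        print(f"cycle {cycle}: " + " ".join(cells))
```

With four instructions, cycles 3 and 4 each hold four instructions at successive levels of completion, much as several items sit at successive stations of an assembly line.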
As with assembly lines, pipeline microprocessors can exhibit tremendous efficiency, or they can manifest problems. It is undisputed that one of the main objectives of microprocessor design is to have instructions move through the pipeline to completion as quickly as possible. To an end user, efficient throughput means faster displays, quicker connections, and, more recently, intelligible speech and digital video presentations that more closely represent reality.
The present inventors have noted that prevalent desktop computer application programs exhibit throughput bottlenecks when they are executed by present day microprocessors. The present inventors have identified three limitations of present day microprocessors that each contribute to this throughput problem: 1) the continued use of combined logic within a single stage to perform either a memory access operation or an integer arithmetic/logic operation; 2) insufficient logic resources to perform single-instruction multiple-data (SIMD) operations; and 3) inefficient utilization of logic resources within successive stages of a single-channel microprocessor pipeline.
With regard to the first limitation, early microprocessors had an integer execution stage that was dedicated to performing either a memory access (i.e., a load of operands from memory or a store of operands to memory) or an arithmetic/logical function. Because these early microprocessors executed only integer operations, the combined memory/arithmetic stage was not a problem. In just a few years, however, a new set of operations, floating point operations, was introduced into the realm of desktop computing. Because of the distinct nature and complexity of floating point data and the associated floating point operations, dedicated floating point execution logic had to be incorporated into the microprocessor architecture. But since application programs still performed more integer operations than floating point operations, microprocessor designers chose to use the existing capabilities of the integer execution logic to load floating point data from memory, adding only the logic necessary to perform the floating point operations. Hence, performing a floating point operation on a floating point operand located in memory required the execution of two instructions through the single instruction channel of the pipeline microprocessor: a first instruction, executed by the integer execution unit, to load the floating point operand from memory, and a second instruction, executed by the floating point execution logic, to perform the prescribed floating point operation on the floating point operand.
In more recent years, another new set of operations, single-instruction multiple-data (SIMD) operations, has been introduced into the domain of desktop computing. SIMD instructions perform the same operation, say, addition, on multiple data items at the same time. Such a feature is very useful for video and speech processing applications. As was done when floating point operations were introduced, only the logic necessary to perform SIMD operations has been added to the already existing single-channel pipeline architecture, so as to take advantage of the existing memory access logic in the integer execute stage. Hence, performing a SIMD operation on a SIMD operand located in memory also requires the execution of two instructions through the single instruction channel: a first instruction, executed by the integer execution unit, to load the SIMD operand from memory, and a second instruction, executed by the SIMD execution logic, to perform the prescribed SIMD operation on the SIMD operand.
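The following Python sketch, offered for illustration only, shows both points: a SIMD addition that operates on several packed data items at once, and the two-instruction sequence needed when the SIMD operand resides in memory. It assumes a hypothetical 64-bit operand holding four 16-bit lanes; the mnemonics load and padd, the register names mm0 and mm1, and the helper functions simd_add and run are all hypothetical and are not defined by this specification.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical SIMD format: a 64-bit operand packed as four 16-bit lanes.
LANES = 4
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1

Instruction = Tuple[str, str, Union[int, str]]


def simd_add(a: int, b: int) -> int:
    """Add the four 16-bit lanes of a and b independently.

    Each lane wraps modulo 2**16, so no carry crosses a lane boundary.
    """
    result = 0
    for lane in range(LANES):
        shift = lane * LANE_BITS
        lane_sum = (((a >> shift) & LANE_MASK) + ((b >> shift) & LANE_MASK)) & LANE_MASK
        result |= lane_sum << shift
    return result


def run(program: List[Instruction], memory: Dict[int, int], regs: Dict[str, int]) -> Dict[str, int]:
    """Execute a toy instruction stream one instruction at a time."""
    for op, dst, src in program:
        if op == "load":
            # Memory access, performed by the integer execution unit.
            regs[dst] = memory[src]
        elif op == "padd":
            # Packed add, performed by the SIMD execution logic on registers only.
            regs[dst] = simd_add(regs[dst], regs[src])
        else:
            raise ValueError(f"unknown op {op!r}")
    return regs


if __name__ == "__main__":
    memory = {0x100: 0x0001_0001_0001_0001}
    regs = {"mm0": 0x0001_0002_0003_FFFF, "mm1": 0}
    # Two instructions are needed to add a memory operand into mm0.
    program = [("load", "mm1", 0x100), ("padd", "mm0", "mm1")]
    print(hex(run(program, memory, regs)["mm0"]))  # 0x2000300040000
```

Because each lane wraps independently, the carry out of the low lane in the example does not propagate into the next lane, which distinguishes a SIMD addition from an ordinary 64-bit integer addition. The program still needs two entries, a load and a padd, to perform one SIMD addition on a memory operand.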
The combined memory access/arithmetic structure of a present day integer execution unit presents a serious bottleneck to the execution of floating point, SIMD, and integer instructions that require data to be first loaded from memory. As is described above, two instructions are required to perform an operation on an operand in memory: one instruction to load the operand, and another instruction to perform the prescribed operation.
With regard to the second limiting factor, the present inventors have also noted increased use of SIMD instructions in more recent application programs. Because large numbers of SIMD instructions are now being used, the capacity of existing SIMD execution logic to handle the large numbers of SIMD instructions has become an issue of concern.
Finally, a third limitation of present day microprocessors concerns the inefficient use of logic resources in the various stages of a single-channel pipeline, as seen when back-to-back instructions are executed. For example, a first instruction may use logic resources in, say, even-numbered stages of the instruction pipeline, while a following instruction may use logic resources in, say, odd-numbered stages. In many cases, the two instructions could conceivably be executed concurrently, in half the time, thus making more efficient use of the stages within the instruction execution channel of the pipeline microprocessor.
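The opportunity described above can be illustrated with a simplified Python model, offered as a sketch only. It assumes that each micro instruction can be summarized by the set of pipeline stages whose logic it needs, that two adjacent instructions with disjoint stage sets may share one slot in the channel, and that data dependencies between the instructions can be ignored; a practical combining mechanism would also have to check such dependencies. The greedy pairing in slots_needed and all function names are hypothetical and do not describe the apparatus of the invention.

```python
from typing import FrozenSet, List

StageSet = FrozenSet[int]


def can_combine(first: StageSet, second: StageSet) -> bool:
    """Two instructions may share one pass through the channel only if
    they need logic in disjoint sets of stages."""
    return first.isdisjoint(second)


def slots_needed(stream: List[StageSet]) -> int:
    """Count pipeline slots, greedily pairing adjacent combinable instructions."""
    slots = 0
    i = 0
    while i < len(stream):
        if i + 1 < len(stream) and can_combine(stream[i], stream[i + 1]):
            i += 2  # the pair travels through the channel together
        else:
            i += 1  # the instruction travels alone
        slots += 1
    return slots


def total_cycles(slots: int, num_stages: int) -> int:
    """Cycles to drain an in-order, stall-free pipeline issuing one slot per cycle."""
    return slots + num_stages - 1 if slots else 0


if __name__ == "__main__":
    even = frozenset({0, 2, 4})  # uses even-numbered stages of a 6-stage pipeline
    odd = frozenset({1, 3, 5})   # uses odd-numbered stages
    stream = [even, odd] * 4     # eight back-to-back instructions
    print(total_cycles(len(stream), 6))           # 13 cycles, one instruction per slot
    print(total_cycles(slots_needed(stream), 6))  # 9 cycles when pairs are combined
```

Combining halves the number of slots (eight to four). The total cycle count falls from 13 to 9 rather than to exactly half because of the fixed cost of filling the six-stage pipeline, and the saving approaches one half as the instruction stream grows longer.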
Therefore, what is needed is an apparatus in a pipeline microprocessor that allows instructions to be combined within a single channel so that application programs can execute faster than has heretofore been achievable.
In addition, what is needed is a pipeline microprocessor architecture that does not require memory access operations to be performed by the same pipeline stage that performs integer arithmetic/logic operations.
Furthermore, what is needed is a microprocessor that can execute large numbers of SIMD instructions without exhibiting present day timing delays.
Moreover, what is needed is a method for combining instructions within a single channel of a pipeline microprocessor in a manner that increases the execution speed of prevalent present day application programs without substantially increasing the cost of the pipeline microprocessor.