The present invention is directed, in general, to processing systems and, more specifically, to a microprocessor that is capable of processing multiple independent threads of instruction code.
The demand for faster computers requires that state-of-the-art microprocessors execute instructions in the minimum amount of time. Over the years, microprocessor speeds have been increased in a number of different ways, including increasing the speed of the clock that drives the processor, reducing the number of clock cycles required to perform a given instruction, and reducing the number of gate delays incurred while executing an instruction.
Microprocessor speeds have also been increased by means of one or more instruction pipelines. An instruction pipeline is a series of separate instruction processing stages. Each stage is independent and is optimized to perform a specific portion of the overall instruction processing. Thus, instructions may be fed into the first stage of the pipeline and each stage performs a specific portion of the instruction, much like an assembly line. Preferably, it is not necessary for one instruction to finish processing before the next instruction is loaded into the pipeline. Thus, multiple instructions may be in the instruction pipeline at the same time. For example, a five-stage instruction pipeline may contain up to five instructions at one time.
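The assembly-line behavior described above may be illustrated by the following minimal sketch. It is a simplified model only, assuming one instruction enters the pipeline per clock cycle and no stalls occur; the function name simulate_pipeline and the instruction labels are hypothetical and are not taken from any particular processor.

```python
def simulate_pipeline(instructions, num_stages=5):
    """Return, for each clock cycle, the contents of every pipeline stage.

    Illustrative model: one instruction enters per cycle, every instruction
    advances one stage per cycle, and there are no stalls.
    """
    stages = [None] * num_stages      # stages[0] is the first stage
    pending = list(instructions)      # instructions not yet loaded
    history = []
    while pending or any(s is not None for s in stages):
        # Load the next instruction into stage 0 and shift the rest along;
        # the instruction in the last stage leaves the pipeline.
        stages = [pending.pop(0) if pending else None] + stages[:-1]
        if any(s is not None for s in stages):
            history.append(list(stages))
    return history


history = simulate_pipeline(["I1", "I2", "I3", "I4", "I5", "I6", "I7"])
# Largest number of instructions in the pipeline during any single cycle.
max_in_flight = max(sum(s is not None for s in cycle) for cycle in history)
```

In this model, seven instructions pass through the five stages in eleven cycles, and the pipeline never holds more than five instructions at once, consistent with the five-stage example above.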
The instruction pipeline concept has been extended even further to multiple pipeline architectures. In a multiple pipeline architecture, a complex instruction decoder feeds instructions to two or more instruction pipelines. The complex instruction decoder may select a particular pipeline based on which instructions are already in each pipeline and how fast the instructions are expected to flow through the remaining pipeline stages.
However, there are limitations to the improvements that may be provided by single and multiple instruction pipelines. Going from single to multiple instruction pipelines has diminishing returns as the number of instruction pipelines grows. Branch (or "change of flow") instructions make it difficult to decode many instructions in parallel. Conditional branch instructions cause problems with pipelines because the next instruction to be loaded into the pipeline cannot be determined until after the branch is resolved. Traditional solutions to this problem generally revolve around inserting more logic to perform branch prediction and then speculatively executing the predicted path until the branch is resolved. This is done to maximize processor throughput.
However, if small size and low power are important, branch prediction techniques have significant drawbacks. A large amount of high speed circuitry is required, which is expensive in both area and power consumption. In addition, speculative execution wastes power if the predicted path turns out to be wrong and the speculative execution is flushed. Furthermore, data dependencies can serialize the use of execution units. As a result, in conventional microprocessors containing, for example, four instruction pipelines, the fourth pipeline may be used less than five percent (5%) of the time in some applications.
Therefore, there is a need in the art for improved microprocessors that have a higher throughput rate. In particular, there is a need in the art for improved microprocessors that include multiple instruction pipelines. More particularly, there is a need in the art for multiple instruction pipeline microprocessors that more efficiently use the available instruction pipelines and that are less susceptible to stalls caused by branch (change-of-flow) instructions and data dependencies.
The limitations inherent in the prior art described above are overcome by the present invention which provides, for use in a pipelined processor comprising an instruction execution pipeline, an apparatus for loading instructions into the instruction execution pipeline. In an advantageous embodiment of the present invention, the apparatus for loading instructions comprises: 1) an instruction loading circuit capable of loading instructions from a first instruction thread into the instruction execution pipeline; and 2) a branch instruction detection circuit capable of detecting a branch instruction in the first instruction thread and, in response to the detection, causing the instruction loading circuit to stop loading instructions from the first instruction thread into the instruction execution pipeline and causing the instruction loading circuit to begin loading instructions from a second instruction thread into the instruction execution pipeline.
The present invention takes advantage of the fact that two separate threads of code are normally running in a data processing system. Thus, instead of building the extra circuitry needed to predict a branch destination in a first thread of code, the branch instruction is resolved during normal execution and useful work is done in the meantime on the second thread of code.
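The loading behavior described above may be sketched in software as follows. This is an illustrative model only, not the disclosed circuit: the names Instr, is_branch and load_order are hypothetical, branch instructions are assumed to be recognizable by a mnemonic beginning with "B", and the sketch treats the two threads symmetrically. The timing of branch resolution is not modeled; the sketch shows only the order in which instructions would be loaded.

```python
from collections import namedtuple

# Each loaded instruction is tagged with the thread it came from (0 or 1).
Instr = namedtuple("Instr", ["op", "thread"])


def is_branch(op):
    # Assumption for illustration: branch mnemonics start with "B".
    return op.startswith("B")


def load_order(thread0, thread1):
    """Return instructions in the order they would enter the pipeline.

    Loading continues from the current thread until a branch is detected,
    then moves to the other thread while that branch is resolved.
    """
    queues = [list(thread0), list(thread1)]
    current = 0
    order = []
    while queues[0] or queues[1]:
        if not queues[current]:
            current = 1 - current          # current thread has finished
        op = queues[current].pop(0)
        order.append(Instr(op, current))
        # Branch detected: stop loading this thread and begin loading
        # the other one, provided it still has instructions.
        if is_branch(op) and queues[1 - current]:
            current = 1 - current
    return order


order = load_order(["ADD", "BEQ", "SUB"], ["LD", "MUL", "BNE", "ST"])
```

With these hypothetical instruction streams, the branch BEQ in the first thread is followed immediately by LD and MUL from the second thread, so the pipeline keeps receiving useful work without a predicted path being executed speculatively.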
In one embodiment of the present invention, the apparatus for loading instructions further comprises a first state table capable of storing first state information associated with the first instruction thread and a second state table capable of storing second state information associated with the second instruction thread.
In another embodiment of the present invention, the instruction execution pipeline comprises a plurality of execution units capable of selecting and retrieving the first state information from the first state table and using the first state information to execute instructions in the first instruction thread.
In still another embodiment of the present invention, the plurality of execution units selects and retrieves the first state information according to at least one thread status bit associated with the instructions in the first instruction thread.
In yet another embodiment of the present invention, the instruction execution pipeline comprises a plurality of execution units capable of selecting and retrieving the second state information from the second state table and using the second state information to execute instructions in the second instruction thread.
According to a further embodiment of the present invention, the plurality of execution units selects and retrieves the second state information according to at least one thread status bit associated with the instructions in the second instruction thread.
According to a still further embodiment of the present invention, the instruction execution pipeline comprises an address generation circuit capable of selecting and retrieving addresses from the first state information and from the second state information according to at least one thread status bit associated with the instructions in the first and second instruction threads.
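The selection of state information by a thread status bit, as described in the preceding embodiments, may be illustrated by the following minimal sketch. It is a software analogy only: the class StateTable, its fields registers and base_address, and the functions execute_add and generate_address are hypothetical names introduced for illustration, and the contents of the state information are assumed rather than taken from the embodiments.

```python
from dataclasses import dataclass, field


@dataclass
class StateTable:
    """Per-thread state (assumed here to be registers and a base address)."""
    registers: dict = field(default_factory=dict)
    base_address: int = 0


def execute_add(state_tables, thread_bit, dest, src_a, src_b):
    # The thread status bit carried with the instruction selects which
    # state table the execution unit reads from and writes to.
    table = state_tables[thread_bit]
    table.registers[dest] = table.registers[src_a] + table.registers[src_b]


def generate_address(state_tables, thread_bit, offset):
    # The address generator uses the same bit to pick the base address.
    return state_tables[thread_bit].base_address + offset


tables = [
    StateTable({"r0": 1, "r1": 2, "r2": 0}, base_address=0x1000),    # first thread
    StateTable({"r0": 10, "r1": 20, "r2": 0}, base_address=0x8000),  # second thread
]
execute_add(tables, 0, "r2", "r0", "r1")   # instruction from the first thread
execute_add(tables, 1, "r2", "r0", "r1")   # same instruction, second thread
address = generate_address(tables, 1, 4)
```

Because each instruction carries its own thread status bit, the same add operation updates a different state table depending on its thread, and neither thread disturbs the state of the other when instructions from both are present in the pipeline.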
According to a yet further embodiment of the present invention, the branch instruction detection circuit is further capable of detecting a branch instruction in the second instruction thread and, in response to the detection, causing the instruction loading circuit to stop loading instructions from the second instruction thread into the instruction execution pipeline and causing the instruction loading circuit to begin loading instructions from the first instruction thread into the instruction execution pipeline.
The foregoing has outlined rather broadly the features and technical advantages of the present invention so that those skilled in the art may better understand the detailed description of the invention that follows. Additional features and advantages of the invention will be described hereinafter that form the subject of the claims of the invention. Those skilled in the art should appreciate that they may readily use the conception and the specific embodiment disclosed as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the invention in its broadest form.
Before undertaking the DETAILED DESCRIPTION OF THE INVENTION, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document: the terms "include" and "comprise," as well as derivatives thereof, mean inclusion without limitation; the term "or" is inclusive, meaning and/or; the phrases "associated with" and "associated therewith," as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term "controller" means any device, system or part thereof that controls at least one operation; such a device may be implemented in hardware, firmware or software, or some combination of at least two of the same. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document, and those of ordinary skill in the art should understand that in many, if not most, instances, such definitions apply to prior, as well as future, uses of such defined words and phrases.