This application claims priority from Japanese Patent Application Reference No. 11-188372, filed Jul. 2, 1999.
The present invention relates to a pipelined data processing device. More specifically, the present invention relates to a data processing device that allows efficient execution of branch instructions.
Conventional pipeline processing techniques can provide for concurrent processing of certain computer instructions. Instructions can be processed in stages, with each stage performing part of the processing of an instruction. While certain advantages are perceived with the conventional art, opportunities for greater efficiencies exist. For example, in conventional technologies, pipeline processing delays are often encountered, in particular when a branch instruction is encountered. Further, significant resources, such as buffer memory and the like, may be required for storing the addresses of branch destination instructions.
What is needed are more efficient techniques for processing branch instructions in pipeline processing architectures.
According to the present invention, a data processing device is provided that can perform pipeline processing, i.e., instruction decoding and instruction execution, with minimal delay in reading the branch destination instruction when a branch instruction is encountered. In the instruction look-ahead system of a specific embodiment according to the present invention, instruction decoding is separated into two stages. In a first instruction decoding stage, a plurality of instructions are decoded in a single machine cycle. Also, in the first instruction decoding stage, when a branch instruction is decoded, a branch destination instruction for the branch instruction is read from memory. The instructions decoded in the first instruction decoding stage are stored temporarily in instruction flow registers. In a second instruction decoding stage, instructions read sequentially from the instruction flow registers are decoded.
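By way of illustration only, the two-stage decoding described above can be sketched as a small cycle-level simulation. The sketch is not taken from the specification: the instruction encoding, the opcode "BR", the function name simulate, the decode width of two, and the rate of one instruction per cycle in the second stage are all assumptions chosen for the example.

```python
from collections import deque


def simulate(program, width=2):
    """Cycle-level sketch of two-stage instruction decoding.

    `program` is a list of (opcode, operand) tuples. The opcode "BR" is a
    hypothetical branch whose operand is the branch destination address.
    Returns two dicts: the cycle in which each branch destination was
    requested, and the cycle in which each instruction reached the
    second decoding stage.
    """
    buffer = deque(enumerate(program))  # stands in for the instruction buffer
    flow_regs = deque()                 # stands in for the instruction flow registers
    prefetch_cycle = {}                 # branch destination -> request cycle
    execute_cycle = {}                  # instruction index -> second-stage cycle
    cycle = 0
    while buffer or flow_regs:
        cycle += 1
        # Second stage: decode one instruction stored in an earlier cycle.
        if flow_regs:
            index, _ = flow_regs.popleft()
            execute_cycle[index] = cycle
        # First stage: decode up to `width` instructions in this cycle.
        for _ in range(min(width, len(buffer))):
            index, (opcode, operand) = buffer.popleft()
            if opcode == "BR":
                # Branch seen early: request the destination instruction now.
                prefetch_cycle[operand] = cycle
            flow_regs.append((index, (opcode, operand)))
    return prefetch_cycle, execute_cycle


program = [("ADD", None), ("SUB", None), ("MUL", None), ("BR", 0x40)]
prefetch_cycle, execute_cycle = simulate(program)
```

Under these assumptions the branch destination is requested in cycle 2, while the branch itself does not reach the second decoding stage until cycle 5, so the memory read can overlap the decoding of the preceding instructions.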
In a representative embodiment according to the present invention, a processor is provided. The processor can comprise a first instruction decoding stage, which can be operative to fetch instructions from an instruction cache and to store the fetched instructions into a buffer, such as an instruction buffer, for example. The first decoding stage can be further operative to read a plurality of instructions from the buffer and decode the instructions; and, if a branch instruction is decoded, fetch a branch destination instruction from the instruction cache. The processor can also comprise a second instruction decoding stage, operative to decode instructions read from said buffer substantially contemporaneously with said processing in said first instruction decoding stage.
In another representative embodiment according to the present invention, a method for pipeline processing is provided. The method can comprise a variety of elements, for example, pre-fetching instructions from an instruction cache and storing the pre-fetched instructions in an instruction buffer. The method can also comprise reading a plurality of instructions from the instruction buffer in one machine cycle, for example, and decoding the instructions in a first instruction decoder; and, if a branch instruction is decoded, requesting from the instruction cache a pre-fetch of a branch destination instruction. Decoding instructions read from the instruction buffer in a second instruction decoder in order to perform instruction execution can also be part of the method. Further, the number of instructions read from the instruction buffer during one machine cycle can be greater than the average number of instructions decoded by the second instruction decoder during one machine cycle.
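The significance of the rate relationship stated above can be illustrated with a short calculation. The following is a sketch under simplifying assumptions that are not part of the specification: the instruction buffer never runs empty, intermediate storage is unlimited, and both rates are constant. The function name lookahead and its parameters are hypothetical.

```python
def lookahead(cycles, read_per_cycle, avg_decoded_per_cycle):
    """Number of instructions by which the first decoder leads the second.

    Assumes constant rates, an instruction buffer that never runs empty,
    and unlimited storage between the two decoders. The lead cannot be
    negative, because the second decoder cannot pass the first.
    """
    return max(0, (read_per_cycle - avg_decoded_per_cycle) * cycles)


lead_fast = lookahead(cycles=4, read_per_cycle=2, avg_decoded_per_cycle=1)
lead_equal = lookahead(cycles=4, read_per_cycle=1, avg_decoded_per_cycle=1)
```

With two instructions read per cycle and an average of one decoded by the second instruction decoder, the first decoder is four instructions ahead after four cycles. With equal rates no lead accumulates, so a branch would not be detected in advance.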
In a yet further representative embodiment according to the present invention, a data processing system is provided. The data processing system can comprise a memory and a processor, connected with the memory. The processor can include an instruction buffer holding instructions pre-fetched from the memory; and a first instruction register storing a plurality of instructions read from the instruction buffer. Further, a first instruction decoder decoding the plurality of instructions in the first instruction register and an instruction flow register sequentially storing instructions stored in the first instruction register can also be included in the processor. Furthermore, the processor of the system can include a second instruction register storing an instruction output from the first instruction register or from the instruction flow register. Also, a second instruction decoder decoding instructions stored in the second instruction register can be part of the processor. In representative embodiments of the system, an instruction read request is issued to the memory based on an analysis result from the first instruction decoder.
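As a minimal sketch of how the elements recited above might relate to one another, the following model represents each register as a field of a single class. The class name ProcessorModel, the method names, the opcode "BR", and in particular the selection policy (the instruction flow register takes precedence over the first instruction register so that program order is preserved) are assumptions made for illustration and are not stated in the specification.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ProcessorModel:
    instruction_buffer: deque = field(default_factory=deque)
    first_instruction_register: list = field(default_factory=list)
    instruction_flow_register: deque = field(default_factory=deque)
    second_instruction_register: object = None
    read_requests: list = field(default_factory=list)

    def load_first_register(self, width=2):
        """Move up to `width` instructions from the buffer in one cycle."""
        count = min(width, len(self.instruction_buffer))
        self.first_instruction_register = [
            self.instruction_buffer.popleft() for _ in range(count)
        ]

    def first_decode(self):
        """Issue a read request for each branch destination found."""
        for opcode, operand in self.first_instruction_register:
            if opcode == "BR":  # hypothetical branch opcode
                self.read_requests.append(operand)

    def load_second_register(self):
        """Select the next instruction for the second decoder.

        Assumed policy: take the oldest instruction from the flow register
        if one is waiting; otherwise take it directly from the first
        instruction register. Remaining instructions go to the flow register.
        """
        pending = list(self.first_instruction_register)
        self.first_instruction_register = []
        if self.instruction_flow_register:
            self.second_instruction_register = self.instruction_flow_register.popleft()
        elif pending:
            self.second_instruction_register = pending.pop(0)
        else:
            self.second_instruction_register = None
        self.instruction_flow_register.extend(pending)


model = ProcessorModel(
    instruction_buffer=deque([("ADD", None), ("BR", 0x40), ("SUB", None)])
)
for _ in range(2):  # two machine cycles
    model.load_first_register()
    model.first_decode()
    model.load_second_register()
```

After the two cycles in this example, the read request for address 0x40 has already been issued, the branch instruction has just reached the second instruction register, and the instruction following it is waiting in the instruction flow register.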
Numerous benefits are achieved by way of the present invention over conventional techniques. Embodiments according to the present invention can reduce decoding and execution delays for the instructions that follow a branch instruction. Further, specific embodiments can provide more efficient processing of instruction series. These and other benefits are described throughout the present specification.