1. Field of the Invention
The present invention relates to a data processor, specifically to a data processor including a pipeline processing mechanism that processes jump instructions rapidly, and more particularly to a data processor capable of reducing the overhead of pipeline processing when a jump instruction is executed, by performing jump processing in the initial pipeline stage.
2. Description of the Related Art
In a conventional data processor, instruction processing is divided into a plurality of steps along the flow of data processing, and the steps of different instructions are processed simultaneously in the corresponding stages of a pipeline. This shortens the mean processing time required per instruction and improves processing performance as a whole.
However, when an instruction that disturbs the instruction processing sequence, such as a jump instruction, is executed, the instruction processing sequence is switched in the execution stage of that instruction. The overhead of pipeline processing therefore increases, and pipeline processing cannot be performed efficiently. Moreover, jump instructions appear very frequently in practical programs, so increasing the processing speed of jump instructions is one of the most important means of improving the performance of a data processor.
In order to improve the performance of data processors, various techniques have been devised for reducing the overhead of executing instructions such as unconditional branch instructions and conditional branch instructions.
As an example of a conventional data processor, FIG. 1 is a block diagram showing a configuration for performing jump processing in the instruction decoding stage, according to the invention of Japanese Patent Application No. 4-22695.
In FIG. 1, reference numerals 401, 402, 403 and 404 designate a branch target address calculation unit, a PC calculation unit, an instruction decoding unit and an instruction fetch unit, respectively.
These blocks are connected with each other by an II bus 433 for transferring instruction codes, a DISP bus 434 for transferring a branch displacement or an immediate value cut out according to the instruction decoding result, an ILEN bus 435 for transferring the length of the instruction having been decoded, a PI bus 431 for transferring the PC value of the instruction being decoded, a JA bus 436 for transferring a jump target address, and the like, and data are exchanged among the blocks through these buses.
With the configuration shown in FIG. 1, when a jump target address is designated by a PC-relative address or an absolute address, the jump target address is generated in parallel with instruction decoding, so that the jump target instruction can be fetched immediately after the instruction is decoded.
The instruction fetch unit 404 is provided with an instruction MMU (Memory Management Unit)/cache unit, which includes an address translation mechanism for instruction addresses, a built-in instruction cache, an instruction TLB (Translation Lookaside Buffer) and their control unit, and with an instruction queue unit. The instruction fetch unit 404 translates an instruction fetch address, fetches the instruction code from the built-in instruction cache or an external memory, and sends the instruction code to the instruction decoding unit 403 and the branch target address calculation unit 401 through the II bus 433.
The instruction decoding unit 403 decodes the instruction codes taken from the II bus 433 in 16-bit (half word) units, decoding the instruction codes outputted from the instruction fetch unit 404 at a rate of zero to eight bytes per clock.
The instruction decoding unit 403 outputs information on PC calculation to the PC calculation unit 402, a signal for controlling branching, when a branch instruction is decoded, to the branch target address calculation unit 401 or the instruction fetch unit 404, and output pointer update information for the instruction queue to the instruction fetch unit 404.
The PC calculation unit 402 is under hardwired control and calculates the PC value of an instruction from the PC calculation information outputted from the instruction decoding unit 403. Because the instructions to be processed are variable-length instructions, the length of an instruction is not known until the instruction is decoded. The PC calculation unit 402 therefore calculates the PC value of the next instruction by adding the instruction length outputted from the instruction decoding unit 403 to the PC value of the instruction being decoded.
In the PC calculation unit 402, a DPC 79 holds the head address of the instruction being decoded, and a TPC 77 holds the head address of the code to be decoded, including the case where the instruction code of one instruction is divided into a plurality of processing units that are processed separately. That is, the TPC 77 holds the head address of the code taken into the instruction decoding unit 403 from the II bus 433.
A PC adder 73 adds the instruction length taken from the instruction decoding unit 403 through the ILEN bus 435 to the value of the TPC 77, and the result is written into the TPC 77 and into an ATPC 78. When decoding of one instruction is finished, the addition result is the head address of the next instruction, so it is also written into the DPC 79. The ATPC 78 is referred to as the PC value of the next instruction when an address is calculated.
When a jump occurs, the jump target address is taken in from the JA bus 436 and the TPC 77 and the DPC 79 are initialized with it; the ATPC 78, however, is not initialized.
With such control, the PC value of the instruction being decoded is always held in the DPC 79.
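The register update rules of the PC calculation unit 402 can be illustrated by the following minimal Python sketch. It models only the behavior described above (TPC 77, ATPC 78 and DPC 79 updated through the PC adder 73, and a jump initializing TPC 77 and DPC 79 but not ATPC 78); the class name, method names, start address and instruction lengths are hypothetical choices for illustration and do not describe the actual circuit.

```python
from dataclasses import dataclass


@dataclass
class PCCalculationUnit:
    """Toy model of the registers of the PC calculation unit 402 (FIG. 1).

    Class and method names are hypothetical; only the update rules follow
    the description of TPC 77, ATPC 78 and DPC 79.
    """
    tpc: int   # TPC 77: head address of the code to be decoded
    dpc: int   # DPC 79: head address of the instruction being decoded
    atpc: int  # ATPC 78: next-instruction PC value used for address calculation

    def decode_unit(self, length: int, instruction_finished: bool) -> None:
        """One processing unit decoded; `length` arrives over the ILEN bus 435."""
        total = self.tpc + length  # PC adder 73: TPC + instruction length
        self.tpc = total           # written back into TPC 77
        self.atpc = total          # and into ATPC 78
        if instruction_finished:
            # At the end of an instruction the sum is the head of the next one.
            self.dpc = total

    def jump(self, target: int) -> None:
        """The JA bus 436 value initializes TPC 77 and DPC 79 only."""
        self.tpc = target
        self.dpc = target
        # ATPC 78 is deliberately left unchanged, as described.


# Hypothetical example: a 6-byte instruction decoded as two processing units.
pcu = PCCalculationUnit(tpc=0x1000, dpc=0x1000, atpc=0x1000)
pcu.decode_unit(2, instruction_finished=False)  # DPC still points at 0x1000
pcu.decode_unit(4, instruction_finished=True)   # DPC advances to 0x1006
pcu.jump(0x2000)                                # ATPC keeps 0x1006
```

Because DPC 79 is written only when an instruction finishes decoding or a jump occurs, it always holds the head address of the instruction currently being decoded, which is the property the branch target address calculation relies on.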
The branch target address calculation unit 401 calculates branch target addresses and cuts out absolute address fields in the instruction decoding stage. The data at the position of a branch displacement is cut out from the instruction code on the II bus 433, which is sent from the instruction fetch unit 404 to the instruction decoding unit 403, and is added to the value of the DPC 79, which is the PC value of the instruction being decoded, thereby calculating a branch target address.
For frequently used branch instructions, four adders, namely first to fourth adders A1, A2, A3 and A4, are provided so that the jump target address can be calculated at the same time as the instruction is decoded.
Reference numerals 409 to 416 designate the input latches of the adders A1, A2, A3 and A4, and numerals 417 to 420 designate the output latches of the adders A1, A2, A3 and A4, respectively.
The branch target address calculation unit 401 operates in synchronism with the decoding of instruction codes. In each instruction decoding cycle, the value of the DPC 79 is taken into the input latches 413, 414, 415 and 416 through the PI bus 431, and the data on the II bus 433 is sign-extended and taken into the four input latches 409, 410, 411 and 412, which correspond to the four positions of branch displacement fields.
The branch displacement fields taken into the input latches 411 and 412 correspond to the absolute address fields in the case where the jump target address of a jump instruction is designated by an absolute address. A latch 421 holds the sign-extended data taken into the input latch 411, and a latch 422 holds the data taken into the input latch 412 as it is.
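The address arithmetic performed by one adder and by the absolute address latches can be sketched in Python as follows. This is a simplified model under stated assumptions: addresses are taken to be 32 bits wide, the field widths used in the example are hypothetical, and the field behind latch 422 is assumed to be full address width, so that holding it "as it is" and sign-extending it give the same value. The function names are illustrative and are not taken from the description.

```python
ADDRESS_MASK = (1 << 32) - 1  # assumption: 32-bit addresses


def sign_extend(field: int, width: int) -> int:
    """Interpret the low `width` bits of `field` as a two's-complement value."""
    field &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    return (field ^ sign_bit) - sign_bit


def pc_relative_target(dpc: int, disp_field: int, width: int) -> int:
    """Work of one adder A1..A4: DPC 79 plus a sign-extended displacement."""
    return (dpc + sign_extend(disp_field, width)) & ADDRESS_MASK


def absolute_target(field: int, width: int) -> int:
    """Absolute jump target cut out of the instruction code.

    For a narrow field this mimics latch 421 (sign-extended data). For an
    assumed full-width 32-bit field, sign extension is the identity, which
    matches latch 422 holding the data as it is.
    """
    return sign_extend(field, width) & ADDRESS_MASK


# Hypothetical examples (field widths are illustrative only):
backward = pc_relative_target(0x1000, 0xFE, 8)   # displacement of -2 bytes
short_abs = absolute_target(0x8000, 16)          # sign-extended 16-bit field
full_abs = absolute_target(0x12345678, 32)       # full-width field, unchanged
```

No adder is needed on the absolute address path, only a latch per cut-out field, which is why the hardware cost of that path grows with the number of field positions rather than the number of adders.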
When a branch instruction whose branch target address can be calculated by the first to fourth adders A1, A2, A3 and A4 in parallel with instruction decoding, or a jump instruction whose jump target address is designated by an absolute address, is decoded, jump processing or prefetching of the branch target instruction is performed in that same cycle.
When the branch target address cannot be calculated by the first to fourth adders A1, A2, A3 and A4 during the decoding cycle of a branch instruction, the branch target address is calculated in the next cycle, and branch processing or prefetching of the branch target instruction is then performed. In this case, after the branch instruction is decoded, the input latch 412 takes in, through the DISP bus 434, the branch displacement cut out by the instruction decoding unit 403 according to the instruction decoding result, and the input latch 416 takes in the value of the DPC 79 through the PI bus 431. The fourth adder A4 adds the contents of the input latches 412 and 416 and outputs the result to the output latch 420.
When jump processing is performed in the instruction decoding stage, one of the latches 417 to 422 is selected by a control signal inputted from the instruction decoding unit 403, its content is outputted to the JA bus 436, and the jump target address is thereby transferred to the instruction fetch unit 404. The instruction fetch unit 404 starts fetching the jump target instruction from the address taken in from the JA bus 436. The value on the JA bus 436 is also taken into the TPC 77 and the DPC 79, which initializes the PC calculation unit 402.
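The interplay between the speculative adders and the one-cycle fallback path can be summarized in the following Python sketch. It is a behavioral model only, under these assumptions: the four displacements arrive already sign-extended (as the input latches 409 to 412 do), addresses are 32 bits, the absolute address latches 421 and 422 are omitted, and the function name and parameters are hypothetical. It returns the selected target together with the number of extra cycles spent before the target reaches the JA bus 436.

```python
from typing import Optional, Sequence, Tuple

ADDRESS_MASK = (1 << 32) - 1  # assumption: 32-bit addresses


def decode_stage_target(
    dpc: int,
    extended_disps: Sequence[int],
    select: Optional[int],
    disp_bus_value: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (jump target address, extra cycles) for one decoded branch.

    extended_disps: four already sign-extended fields (input latches 409-412).
    select: adder result chosen by the decoder's control signal, or None when
            the displacement position is not covered by adders A1-A4.
    disp_bus_value: displacement cut out by the decoder onto the DISP bus 434.
    """
    # Adders A1..A4 add DPC to every candidate field in every decoding cycle,
    # before it is known whether (or which) field is a real displacement.
    candidates = [(dpc + d) & ADDRESS_MASK for d in extended_disps]
    if select is not None:
        return candidates[select], 0  # target available in the decoding cycle
    if disp_bus_value is None:
        raise ValueError("fallback path needs the DISP bus displacement")
    # Next cycle: adder A4 adds latch 412 (DISP bus) and latch 416 (DPC).
    return (dpc + disp_bus_value) & ADDRESS_MASK, 1


# Hypothetical example values:
fast = decode_stage_target(0x1000, [-2, 0x10, 0x100, 0x1000], select=0)
slow = decode_stage_target(0x1000, [0, 0, 0, 0], select=None, disp_bus_value=0x40)
```

Computing all four candidates unconditionally is what allows the target to be ready within the decoding cycle; the cost is one adder per covered displacement position, which is the hardware overhead pointed out below.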
In this way, the conventional data processor can fetch a branch target instruction immediately after instruction decoding, because, in parallel with instruction decoding, it calculates the branch target address of a branch instruction whose branch target address is designated PC-relatively, or cuts out the absolute address field of a jump instruction whose jump target address is designated by an absolute address. With the jump processing mechanism described above, the time wasted between decoding a branch instruction and decoding its branch target instruction is reduced to only the time required to fetch the branch target instruction, so a data processor of high performance can be obtained.
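The performance argument above can be made concrete with a simple average-cost model, sketched here in Python. The model assumes an ideal pipeline that completes one instruction per clock and loses a fixed number of clocks on every jump; the jump frequency and the two penalty values are hypothetical figures chosen only for illustration and are not taken from the description.

```python
def mean_cycles_per_instruction(jump_fraction: float, penalty_cycles: float) -> float:
    """Average clocks per instruction for a pipeline that ideally completes
    one instruction per clock but loses `penalty_cycles` clocks per jump."""
    if not 0.0 <= jump_fraction <= 1.0:
        raise ValueError("jump_fraction must be between 0 and 1")
    return 1.0 + jump_fraction * penalty_cycles


# Hypothetical figures for illustration only:
JUMP_FRACTION = 0.2        # one instruction in five is a jump
PENALTY_EXECUTE_STAGE = 4  # jump processed in the execution stage
PENALTY_DECODE_STAGE = 1   # jump processed at decode: only the target fetch is lost

late = mean_cycles_per_instruction(JUMP_FRACTION, PENALTY_EXECUTE_STAGE)
early = mean_cycles_per_instruction(JUMP_FRACTION, PENALTY_DECODE_STAGE)
print(f"jumps in execution stage: {late:.2f} clocks/instruction")
print(f"jumps in decoding stage:  {early:.2f} clocks/instruction")
```

Because the penalty is multiplied by the jump frequency, which is high in practical programs, shortening the per-jump penalty to the target fetch time has a large effect on the average.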
However, the conventional data processor described above requires as many adders as there are branch displacement fields among the branch instructions whose branch target addresses are calculated in parallel with instruction decoding, and the problem is that the quantity of hardware increases in proportion to the number of instructions to be processed at high speed.
There is also a problem with instructions whose jump target addresses are designated by absolute addresses: one latch is required for each field to be cut out, so the quantity of hardware increases.