1. Field of the Invention
The present invention relates to a data processing unit that receives and processes instructions and, more particularly, to a data processing unit characterized by the structure of its instruction queues, which pre-fetch instructions of different lengths for processing.
2. Description of the Related Art
FIG. 5 is a block diagram showing the layout, from an instruction queue to an immediate generator, of a conventional data processing unit.
As illustrated in FIG. 5, the conventional data processing unit includes an instruction queue circuit 400 comprised of three instruction queues and three selectors, a fourth selector 417, an immediate generator 47, an arithmetic unit 48 and a control logic 49.
In the instruction queue circuit 400, a first instruction queue 41 reads an instruction sent from a first selector 411 and sends the instruction to a second selector 412. A second instruction queue 42 reads an instruction sent from the second selector 412 and sends the instruction to a third selector 413 and the control logic 49, as well as sending immediate data to lower-order bits of the immediate generator 47 and the fourth selector 417. A third instruction queue 43 reads an instruction sent from the third selector 413 and sends the instruction to the control logic 49, as well as sending immediate data to higher-order bits of the immediate generator 47 or the fourth selector 417.
The first selector 411 sends an instruction sent from a memory 40 or the first instruction queue 41 to the first instruction queue 41. The second selector 412 sends an instruction sent from the memory 40 or the first instruction queue 41 to the second instruction queue 42. The third selector 413 sends an instruction sent from any of the memory 40, the first instruction queue 41 and the second instruction queue 42 to the third instruction queue 43.
The fourth selector 417 sends immediate data sent from the second instruction queue 42 or immediate data sent from the third instruction queue 43 to lower-order bits of the immediate generator 47. The immediate generator 47 expands the code of (that is, sign-extends) the immediate data received from the fourth selector 417 and from the second and the third instruction queues 42 and 43 of the instruction queue circuit 400 to the operand bit width of the arithmetic unit 48, and then sends the immediate data to the arithmetic unit 48. The arithmetic unit 48 receives the immediate data from the immediate generator 47, performs arithmetic on it and outputs the results. The control logic 49 receives instructions from the second instruction queue 42 and the third instruction queue 43 and outputs signals for controlling the outputs of the first selector 411, the second selector 412, the third selector 413 and the fourth selector 417.
Description will next be made of the operation of the above-described conventional data processing unit. It is assumed here that the conventional data processing unit shown in FIG. 5 is capable of processing instructions of two different lengths, a 16-bit instruction and a 32-bit instruction, and that the arithmetic unit 48 operates on 32-bit operands. It is also assumed that the memory 40 stores instructions as illustrated in the memory map of FIG. 6. A, B, C, D, . . . , X and Y in FIG. 6 are instructions divided into 16-bit units, and each address is assumed to hold 8 bits. An even address means that the leading address of an instruction is a multiple of 4, and an odd address means that the leading address of an instruction is not a multiple of 4.
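The addressing convention assumed above can be illustrated with a minimal Python sketch. The function name is hypothetical, and the check that instructions start on 16-bit boundaries is an assumption drawn from the 16-bit division of FIG. 6 rather than something the text states outright.

```python
def is_even_address(byte_address):
    """Return True if the leading address is 'even' in the sense used here.

    In this description an 'even' address is a multiple of 4 and an 'odd'
    address is any other leading address. Each address holds 8 bits, so a
    16-bit instruction unit spans two addresses.
    """
    # Assumption (not stated in the text): instructions start on 16-bit
    # boundaries, so a leading address is always a multiple of 2.
    if byte_address % 2 != 0:
        raise ValueError("leading address is assumed to be 16-bit aligned")
    return byte_address % 4 == 0


if __name__ == "__main__":
    for address in (0, 2, 4, 6):
        kind = "even" if is_even_address(address) else "odd"
        print(address, kind)
```

Under these assumptions, addresses 0 and 4 are classified as even, and addresses 2 and 6 as odd.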
The instruction queue circuit 400 has four states, [state 0], [state 1], [state 2] and [state 3], according to the length of the instruction to be executed. [State 0], [state 1], [state 2] and [state 3] are illustrated in FIGS. 7, 8, 9 and 10, respectively. Switching to each state is conducted by the control of the first selector 411, the second selector 412 and the third selector 413 of the instruction queue circuit 400 by the control logic 49.
According to the memory map example shown in FIG. 6, since the initial address is even-numbered, the instruction queue enters [state 0]. At [state 0], the third instruction queue 43 reads lower-order 16 bits of the memory 40 as illustrated in FIG. 7. The second instruction queue 42 reads higher-order 16 bits of the memory 40. The first instruction queue 41 reads higher-order 16 bits of the memory 40. The control logic 49 reads instructions contained in the second instruction queue 42 and the third instruction queue 43 and identifies the instruction to be executed to determine an address of the memory at which an instruction to be read next resides. Then, if the instruction to be executed is of 16-bit length, transition to [state 2] occurs and if the instruction is of 32-bit length, transition to [state 0] occurs.
It is assumed here that a 16-bit instruction is executed to cause transition to [state 2]. At [state 2], the third instruction queue 43 reads an instruction contained in the first instruction queue 41 as illustrated in FIG. 9. The second instruction queue 42 reads lower-order 16 bits of the memory 40. The first instruction queue 41 reads higher-order 16 bits of the memory 40. The control logic 49 reads instructions contained in the second instruction queue 42 and the third instruction queue 43 and identifies the instruction to be executed to determine an address of the memory at which an instruction to be read next resides. Then, if the instruction to be executed is of 16-bit length, transition to [state 3] occurs and if the instruction is of 32-bit length, transition to [state 2] occurs.
It is assumed here that a 16-bit instruction is executed to cause transition to [state 3]. At [state 3], the third instruction queue 43 reads an instruction contained in the second instruction queue 42 as illustrated in FIG. 10. The second instruction queue 42 reads an instruction contained in the first instruction queue 41. The first instruction queue 41 reads an instruction contained in the first instruction queue 41. The control logic 49 reads instructions contained in the second instruction queue 42 and the third instruction queue 43 and identifies the instruction to be executed to determine an address of the memory at which an instruction to be read next resides. Then, if the instruction to be executed is of 16-bit length, transition to [state 2] occurs and if the instruction is of 32-bit length, transition to [state 0] occurs.
Thereafter, it is assumed that a jump instruction is issued to branch into an odd address. In this case, the instruction queue enters [state 1], so that the third instruction queue 43 reads higher-order 16 bits of the memory 40 as illustrated in FIG. 8. The second instruction queue 42 reads lower-order 16 bits of the memory 40. The first instruction queue 41 reads higher-order 16 bits of the memory 40. The control logic 49 reads instructions contained in the second instruction queue 42 and the third instruction queue 43 and identifies the instruction to be executed to determine an address of the memory at which an instruction to be read next resides. Then, if the instruction to be executed is of 16-bit length, transition to [state 0] occurs and if the instruction is of 32-bit length, transition to [state 2] occurs without execution of the instruction because it is not correctly stored in the control logic 49.
Thus using an instruction queue enables efficient execution of instructions of different lengths in each cycle, except when a 32-bit instruction is to be executed after branching into an odd address as illustrated in FIG. 8.
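The four states and their transitions described above can be summarized in a short Python sketch. The queue sources and next-state entries are transcribed from the description of FIGS. 7 to 10; all identifiers (SOURCES, NEXT_STATE, load_queues and so on) are hypothetical, and modelling a memory word as a (lower-order, higher-order) pair of 16-bit units is an assumption made only for illustration.

```python
# Source of each queue's next contents in a given state.
# "lo"/"hi": lower-/higher-order 16 bits of the fetched memory word.
# "q1"/"q2": previous contents of the first/second instruction queue.
SOURCES = {
    0: {"q3": "lo", "q2": "hi", "q1": "hi"},  # FIG. 7
    1: {"q3": "hi", "q2": "lo", "q1": "hi"},  # FIG. 8
    2: {"q3": "q1", "q2": "lo", "q1": "hi"},  # FIG. 9
    3: {"q3": "q2", "q2": "q1", "q1": "q1"},  # FIG. 10
}

# (current state, length in bits of the identified instruction) -> next state
NEXT_STATE = {
    (0, 16): 2, (0, 32): 0,
    (1, 16): 0, (1, 32): 2,
    (2, 16): 3, (2, 32): 2,
    (3, 16): 2, (3, 32): 0,
}


def entry_state(address_is_even):
    """State entered at an initial or branch-target address."""
    return 0 if address_is_even else 1


def load_queues(state, queues, mem_word):
    """Return the new queue contents when the circuit is in `state`.

    `queues` maps "q1".."q3" to their previous contents; `mem_word` is a
    (lower-order, higher-order) pair of 16-bit units (assumed layout).
    """
    lo, hi = mem_word
    available = {"lo": lo, "hi": hi, "q1": queues["q1"], "q2": queues["q2"]}
    return {name: available[src] for name, src in SOURCES[state].items()}


def executes(state, length):
    """False for the one case the text singles out: a 32-bit instruction
    reached by branching into an odd address is not executed in that cycle."""
    return not (state == 1 and length == 32)


def trace_states(start_state, lengths):
    """List the states visited for a sequence of identified lengths."""
    states = [start_state]
    for length in lengths:
        states.append(NEXT_STATE[(states[-1], length)])
    return states


if __name__ == "__main__":
    print(trace_states(entry_state(True), [16, 16, 16]))
```

Starting from an even address and executing three 16-bit instructions in a row visits states 0, 2, 3 and 2, matching the walk-through in the text.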
The immediate generator 47 expands the code of (that is, sign-extends) the immediate part of an instruction format into 32-bit immediate data. More specifically, the expanded part of the 32-bit immediate data is filled with the same value as the code (sign) bit of the original immediate. With an instruction of 16-bit length, the immediate part of the instruction contained in the third instruction queue 43 is sent to lower-order bits of the immediate generator 47 through the fourth selector 417 to generate code-expanded immediate data. With an instruction of 32-bit length, the immediate part of the instruction contained in the second instruction queue 42 is sent to lower-order bits of the immediate generator 47 and the immediate part of the instruction contained in the third instruction queue 43 is sent to higher-order bits of the immediate generator 47 to generate code-expanded immediate data. The arithmetic unit 48 performs arithmetic on the code-expanded immediate data sent from the immediate generator 47 and outputs the results.
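The code expansion performed by the immediate generator can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not give the widths of the immediate fields, so the 5-bit field used in the example, the assumption that the second instruction queue contributes a full 16 lower-order bits for a 32-bit instruction, and all function names are hypothetical.

```python
def sign_extend(value, bits, width=32):
    """Extend the low `bits` bits of `value` to `width` bits by filling the
    upper part with copies of the top (code) bit. Returns an unsigned int."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        # Fill bits [bits, width) with ones.
        value |= ((1 << width) - 1) & ~((1 << bits) - 1)
    return value


def immediate_for_16bit(q3_imm, imm_bits):
    """16-bit instruction: the immediate part held in the third queue is
    placed in the lower-order bits and then expanded."""
    return sign_extend(q3_imm, imm_bits)


def immediate_for_32bit(q3_imm, q2_imm, q3_imm_bits):
    """32-bit instruction: the second queue supplies the lower-order bits
    (assumed to be 16 bits wide) and the third queue the higher-order bits."""
    combined = (q3_imm << 16) | (q2_imm & 0xFFFF)
    return sign_extend(combined, 16 + q3_imm_bits)


if __name__ == "__main__":
    # Hypothetical 5-bit immediate field.
    print(hex(immediate_for_16bit(0x1F, 5)))
    print(hex(immediate_for_32bit(0x1F, 0x0001, 5)))
```

The two cases differ in where the immediate bits originate, which is why the conventional layout needs wires from the queues to both the lower-order and the higher-order side of the immediate generator.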
One conventional data processing unit of this kind is a pre-fetch unit disclosed, for example, in the literature "V Series Microcomputer II" (edited under the supervision of Ryoichi Mori, Microcomputer Series 18, Maruzen Co., Ltd., p. 33). This literature recites a data processing unit (pre-fetch unit) capable of pre-fetching a 16-bit instruction and comprised of a pre-fetch queue, which is a buffer for storing instructions pre-fetched by a bus control unit, and an aligner, which is a data alignment system that, when sending an op code and an addressing mode field to an instruction decode unit through an internal bus, aligns the code and the field to a fixed position on the bus.
Another conventional data processing unit is an instruction pre-fetch buffer disclosed in the literature "Computer Architecture: A Quantitative Approach" (written by Hennessy and Patterson, Nikkei BP, pp. 456-457). This literature recites a data processing unit (instruction pre-fetch buffer) capable of aligning instructions of variable lengths by holding 8 continuous bytes of instructions and pre-fetching a subsequent instruction every time the instructions are executed one by one by the CPU. In the data processing unit, a directed bit is attached to every byte to represent the number of continuous bytes holding an effective instruction. In addition, while the most significant byte can be correlated with an arbitrary byte address, the remaining bytes should be continuous.
Still another conventional data processing unit is an instruction fetching technique disclosed in the literature "Computer Organization & Design: The Hardware/Software Interface" (written by Patterson and Hennessy, Nikkei BP, pp. 334-335). This literature describes data processing (instruction fetch) realized by reading an instruction from an instruction memory by using an address in a program counter and writing the instruction to a pipeline register. In this processing, the address in the program counter is incremented by four and written back to the program counter to be ready for the next clock cycle. In addition, in order to deal with any instruction to come next, the incremented address is also stored in the pipeline register.
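The fetch step recited in this literature can be sketched in Python as follows. It is an illustration only: the function name, the dictionary-based instruction memory keyed by byte address, and the field names of the pipeline register are all hypothetical.

```python
def fetch(program_counter, instruction_memory):
    """One fetch step: read the instruction at the program counter, advance
    the counter by four, and latch both into a pipeline register."""
    instruction = instruction_memory[program_counter]
    next_pc = program_counter + 4  # written back for the next clock cycle
    pipeline_register = {
        "instruction": instruction,
        "pc_plus_4": next_pc,  # kept in case a later instruction needs it
    }
    return next_pc, pipeline_register


if __name__ == "__main__":
    # Hypothetical instruction memory with placeholder instruction names.
    memory = {0: "insn_a", 4: "insn_b"}
    pc = 0
    pc, register = fetch(pc, memory)
    print(pc, register)
```

Because the counter always advances by a fixed four, this scheme presupposes fixed-length instructions, in contrast to the mixed 16-bit and 32-bit case discussed above.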
The above-described conventional data processing units, however, have a shortcoming in that transmission of immediate data takes too much time to enable high-speed processing, because the wire length running from an instruction queue to an immediate arithmetic unit cannot be shortened. The wire length cannot be shortened because the position of immediate data varies with the length of an instruction, so that sending immediate data from an instruction queue to an immediate generator at the time of code expansion requires both a wire running from the instruction queue to higher-order bits of the immediate generator and a wire running from the instruction queue to lower-order bits of the immediate generator. Therefore, with any arrangement of instruction queues, it is necessary to provide wires running parallel to the direction of bit arrangement of the immediate generator as illustrated in FIG. 4, which increases the wire length by the distance covered in that direction.
In addition, at the time of wiring an electronic circuit, it is necessary to ensure, between adjacent wires, a minimum pitch derived from manufacturing constraints or a minimum distance required to prevent adjacent wires from being affected by signal change. When a wire from an instruction queue to an immediate generator includes a wire part parallel to the direction of bit arrangement of the immediate generator as described above, the distance between the instruction queue and the immediate generator is constrained by a bundle of wires provided in that direction, whereby increase in a wire length is inevitable.