1. Field of the Invention
The present invention relates to a data processing apparatus including a memory access pipeline and an arithmetic operation pipeline which operate independently of each other, and particularly to a CISC type superscalar processor having an arithmetic operation which includes a memory access.
2. Description of the Related Art
FIG. 9 shows a structural example of an existing CISC type superscalar processor. An instruction is transmitted from an instruction buffer 101 to an instruction decoding unit 103 for execution of a meaning analysis process (decoding), a checking and acquiring process, and an instruction developing process.
The checking and acquiring process determines which buffers the current operation needs and whether each of those buffers has a free entry or sufficient free space, and reserves the entry space in all of those buffers for the current operation. Each required buffer unit must have a free entry for an operation to be decoded. Using an add operation by way of example, an add operation needs an entry at 104, 106 and 105 in FIG. 9. When all of these have a free entry, the add operation can be decoded. For example, if 104 and 105 have free entries but 106 has none, the add operation cannot be decoded. The add operation stays in the decoder and checks for an entry again in each cycle until an entry of 106 becomes free.
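The all-or-nothing nature of this check can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic of the actual decoder: the `try_decode` helper and the free-entry counts are hypothetical, and only the rule that every required unit must have a free entry is taken from the description above.

```python
# Hypothetical model of the checking and acquiring process.
# Keys are the FIG. 9 reference numerals named in the text; the free-entry
# counts are invented for illustration.

def try_decode(required, free_entries):
    """Reserve one entry in every required unit, or reserve nothing.

    Returns True if the operation can be decoded in this cycle.
    """
    if any(free_entries[unit] == 0 for unit in required):
        return False  # the operation stays in the decoder and retries later
    for unit in required:
        free_entries[unit] -= 1  # keep the entry for the current operation
    return True


free_entries = {104: 2, 105: 1, 106: 0}
add_needs = (104, 106, 105)

first_try = try_decode(add_needs, free_entries)   # 106 has no free entry
free_entries[106] += 1                            # an entry of 106 becomes free
second_try = try_decode(add_needs, free_entries)  # all three can be reserved
```

In this sketch nothing is reserved on a failed attempt, so a blocked add operation does not hold entries at 104 or 105 while it waits for 106.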
The instruction development involves dividing an operation, for example, an add operation, into two processes. The first process is a load operation where the operands are obtained and the second process is the add operation itself where the operands are added.
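As a rough sketch of this development step, a memory-operand add can be modeled as a list of two simpler operations. The `MicroOp` type, the `develop_add` helper, the temporary name `tmp`, and the operand strings are all hypothetical and serve only to show the load-then-add split.

```python
from typing import List, NamedTuple, Tuple


class MicroOp(NamedTuple):
    kind: str                 # "load" or "add"
    dest: str                 # where the result is placed
    sources: Tuple[str, ...]  # what the operation reads


def develop_add(dest_reg: str, mem_operand: str) -> List[MicroOp]:
    """Split an add with a memory operand into a load followed by an add."""
    tmp = "tmp"  # hypothetical holding place for the loaded operand
    return [
        MicroOp("load", tmp, (mem_operand,)),       # first process: obtain operand
        MicroOp("add", dest_reg, (dest_reg, tmp)),  # second process: add operands
    ]


ops = develop_add("r1", "mem[r5+8]")
```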
With respect to complicated instructions (for example, a memory access operation of reading a long length of data from memory or writing it to memory that cannot be processed by a single memory access), if an instruction requires several cycles to develop, the instruction stays in the instruction decoding unit 103 for several cycles. The instruction buffer 102 is the buffer for complicated instructions.
With respect to such complicated instructions, the instruction in the instruction buffer 101 is sent to the instruction decoding unit 103 and thereafter the instruction execution process is conducted for a plurality of cycles, for example, instruction decoding unit 103, to instruction buffer 102, to instruction decoding unit 103, to instruction buffer 102, to instruction decoding unit 103, and so on.
If an instruction requires a read or write operation for a long length of memory data, which cannot be processed by a single memory access, the instruction stays in instruction buffer 102 and instruction decoding unit 103 until the reading or writing operation is complete.
With respect to the memory access arithmetic operation, after the instruction decoding unit 103 decodes the instruction, it is transmitted to a reservation station (RS) 104 on the memory access pipeline side and to RS 105 on the arithmetic operation pipeline side. In the memory access pipeline side, the address calculation for memory access is conducted in address calculation unit 106 and memory access is conducted in memory access unit 107. The data obtained in memory access unit 107 is written into register file 108.
Meanwhile, on the arithmetic operation pipeline side, the decoded instruction is transmitted to RS 105 from instruction decoding unit 103 and waits for the data to be written to register file 108. After the data is written to register file 108, the instruction is transmitted to arithmetic operation unit 109 from RS 105. Thereafter, arithmetic operation unit 109 executes the arithmetic operation on the data in register file 108.
In FIG. 9, RS 104, address calculation unit 106 and memory access unit 107 provide a memory access pipeline, while RS 105 and arithmetic operation unit 109 provide an arithmetic operation pipeline.
Before the instruction is issued, namely, before the instruction is written to RS 104 and RS 105, the existing CISC type superscalar processor analyzes the arithmetic operation instruction including memory access and decomposes it into a memory access instruction and an arithmetic operation instruction to be executed using the data obtained by the memory access.
Decomposing an instruction will be explained using a storage-to-storage (SS-type) operation. An NC (AND Character) operation will be discussed below as an example. The instruction NC 2(64,5),0(5) has an operand length of 64 bytes, where operand 1 has a displacement of 2 added to base register 5 and operand 2 has a displacement of 0 added to base register 5, such that the operand 1 address = value of register 5 + 2 and the operand 2 address = value of register 5 + 0.
Generally, an NC operation proceeds as a bit level "and" operation in which the first bit of operand 1 is anded with the first bit of operand 2 and the result is stored in memory in place of operand 1's first bit. This bit level operation is repeated for the second bit and so on until the end of the length of the operand is reached. (The length attribute is a number of bytes. If the length is 4, then 32 (=4×8) bits are anded.) If there is an updating space between the operand 2 address and the operand 2 address plus the length of the result data, the memory data loading is done one byte at a time.
When the first bit of operand 1 corresponds to the 17th bit of operand 2, the result of anding the first bits of operand 1 and operand 2 is stored in the first bit of operand 1, which is also the 17th bit of operand 2. When the 17th bits of operands 1 and 2 are later anded, the 17th bit of operand 2 has therefore already been updated, and the result is stored in the 17th bit of operand 1. This means that the and of the 17th bits must use the updated value in order to be performed properly.
Thus, it might be necessary for this NC operation to process data one byte at a time: 64 cycles to load operand 1, 64 cycles to load operand 2, 64 cycles to "and" the operand 1 data and operand 2 data, and 64 cycles to store the result.
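The effect of the overlap can be illustrated with a small sketch. It assumes a flat byte-addressed memory modeled as a `bytearray`, a hypothetical base register value of 0, and a length of 4 bytes instead of 64 to keep the data short. Both function names are illustrative. The second function, which loads both operands in full before storing anything, is included only for contrast.

```python
def nc_byte_at_a_time(memory, op1, op2, length):
    """AND operand 2 into operand 1 in place, one byte at a time."""
    for i in range(length):
        memory[op1 + i] = memory[op1 + i] & memory[op2 + i]


def nc_load_all_first(memory, op1, op2, length):
    """AND using copies of both operands taken before any store (contrast)."""
    a = bytes(memory[op1:op1 + length])
    b = bytes(memory[op2:op2 + length])
    memory[op1:op1 + length] = bytes(x & y for x, y in zip(a, b))


base = 0                       # hypothetical value of base register 5
op1, op2 = base + 2, base + 0  # displacements 2 and 0, as in NC 2(64,5),0(5)
length = 4                     # shortened from 64 for illustration

initial = bytes([0x0F, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF])

sequential = bytearray(initial)
nc_byte_at_a_time(sequential, op1, op2, length)

bulk = bytearray(initial)
nc_load_all_first(bulk, op1, op2, length)
```

With these inputs the two results differ at the fifth byte: the byte-at-a-time version reads an operand 2 byte that an earlier step has already overwritten, which is the behavior the overlapping case requires.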
The existing CISC type superscalar processor executes the following four steps before the instruction is issued: (1) determine whether the NC operation's operand 2 reference is from an updated memory area, (2) set the NC operation to process one byte of data at a time, (3) determine whether the length of the NC operation is 64 bytes, and (4) decompose the NC into 64 groups of (operand 1 load, operand 2 load, operand 1 data "and" operand 2 data, store the result in the operand 1 data area). In this example, decomposition involves dividing the operation into 64 parts.
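The four steps can be sketched as follows. The overlap test in `reads_updated_area` is one assumed reading of step (1), namely that operand 1 begins inside the range covered by operand 2. The function names, the tuple encoding of the operations, and the base register value are hypothetical.

```python
def reads_updated_area(op1, op2, length):
    """Assumed form of step (1): operand 1 starts inside operand 2's range."""
    return op2 < op1 < op2 + length


def decompose_nc(op1, op2, length):
    """Step (4): one group of four operations per byte."""
    groups = []
    for i in range(length):
        groups.append([
            ("load", op1 + i),   # operand 1 load
            ("load", op2 + i),   # operand 2 load
            ("and", i),          # operand 1 data "and" operand 2 data
            ("store", op1 + i),  # store the result in the operand 1 data area
        ])
    return groups


base = 0x1000                               # hypothetical value of register 5
op1, op2, length = base + 2, base + 0, 64   # NC 2(64,5),0(5)

overlap = reads_updated_area(op1, op2, length)
groups = decompose_nc(op1, op2, length)
total_ops = sum(len(group) for group in groups)
```

For the 64-byte example this yields 64 groups and 256 operations in total, all of which are produced before the instruction is issued.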
The existing CISC system has the following problems because complicated instructions are developed before they issue.
The instruction is developed at instruction decoding unit 103, to which all instructions are transmitted. If a plurality of cycles is required to develop an instruction, the execution timing of other instructions, which might otherwise be executed more quickly, may be delayed. Since the instructions are developed before they issue, the memory access arithmetic operation instruction is transmitted to both the memory access pipeline and the arithmetic operation pipeline. Since the instruction that includes memory access is stored in RS 105 in the arithmetic operation pipeline, entries of RS 105 are used for these instructions. Therefore, in some cases, RS 105 becomes full and transmission of the subsequent register arithmetic operation instructions stops.
Since a plurality of cycles are required for memory access, the memory access arithmetic operation instruction in RS 105 of the arithmetic operation pipeline might remain in RS 105 for a long time. Thereby, execution of a subsequent register arithmetic operation instruction is possibly prevented.
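The blocking effect can be illustrated with a toy model. Everything in it is an assumption made for illustration: the decoder sends at most one instruction per cycle in program order, a register arithmetic instruction holds its RS 105 entry for one cycle, a memory access arithmetic instruction holds its entry for `mem_latency` cycles, and the capacity and latency values are invented.

```python
def dispatch_cycles(program, capacity, mem_latency):
    """Return the cycle in which the last instruction enters RS 105.

    program is a list of "mem" (memory access arithmetic instruction)
    or "reg" (register arithmetic instruction) in program order.
    """
    occupied = []           # remaining hold time of each occupied entry
    pending = list(program)
    cycle = 0
    while pending:
        cycle += 1
        # Entries whose hold time has elapsed leave the station.
        occupied = [left - 1 for left in occupied if left - 1 > 0]
        if len(occupied) < capacity:
            kind = pending.pop(0)
            occupied.append(mem_latency if kind == "mem" else 1)
        # Otherwise RS 105 is full and the decoder stalls this cycle.
    return cycle


program = ["mem", "mem", "reg", "reg"]
slow = dispatch_cycles(program, capacity=2, mem_latency=5)
fast = dispatch_cycles(program, capacity=2, mem_latency=1)
```

In this model the two register instructions enter RS 105 later when the memory access takes five cycles than when it takes one, even though they do not themselves depend on memory.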
It is an object of the present invention to solve the above problems and effectively use the arithmetic operation pipeline even during execution of a memory access arithmetic operation instruction.
A first embodiment of the present invention includes a plurality of pipeline processing mechanisms, comprising a memory access pipeline for accessing memory and a calculation pipeline for executing arithmetic operations. A device is provided in the memory access pipeline to decode and develop instructions for arithmetic operations including memory access whereby decoding and development of arithmetic operation instructions including memory access do not take place in the stage before inputting the instruction to the memory access pipeline and arithmetic operation pipeline, but rather decoding and development of arithmetic operation instructions including memory access are conducted in the memory access pipeline after the arithmetic operation instruction including the memory access is transmitted to the memory access pipeline.
In a second embodiment of the present invention, transmission of subsequent instructions to the memory access pipeline is prevented when the arithmetic operation instruction including memory access is developed over a plurality of cycles in the memory access pipeline.
In a third embodiment of the present invention, a device is provided that acquires a resource to hold the data read from memory in execution of the arithmetic operation instruction including memory access and transmits the data stored in the resource to the arithmetic operation pipeline.
In a fourth embodiment of the present invention, in the data processing apparatus of the third embodiment, when the resource cannot be acquired, a waiting condition occurs while the current condition is maintained until a vacant area is generated in the resource.
In a fifth embodiment of the present invention, in a data processing apparatus of any of the first through fourth embodiments, a device receiving the data read from memory is provided to transmit the memory data reference arithmetic operation instruction to the arithmetic operation unit in the arithmetic operation pipeline from the memory access pipeline.
In a sixth embodiment of the present invention, in the data processing apparatus of the fifth embodiment, a device is provided that judges a priority sequence between the memory data reference arithmetic operation instruction transmitted from the memory access pipeline and the instruction input from the heading portion of the arithmetic operation pipeline, in order to determine which instruction is the arithmetic operation object.