1. Field of the Invention
The present invention relates to a microprocessor, and more specifically to a microprocessor which can extend an instruction by using a coprocessor externally coupled with the microprocessor.
2. Description of Related Art
In general, a microprocessor has to be assembled on a semiconductor chip of a limited size. Therefore, although performance of the microprocessor is sacrificed to some extent, a function which can be independent of and external to the microprocessor is assembled on another chip or LSI. With various functions assigned to different single-chip LSIs, it is possible to prevent the chip size of the microprocessor from being increased to a size which cannot be manufactured under current techniques, and also to avoid an increase in cost caused by an increase of the chip size.
Particularly, microprocessors of 16 bits or more, which have been required to provide a great many functions and a high degree of performance, are closely coupled to various LSIs providing floating point arithmetic operation, a memory management mechanism, a memory buffer (cache) mechanism, and others. If these functions were assembled within the microprocessor, a system having a great many functions and a small size could be realized. However, these functions have been realized externally of the microprocessor for the reason mentioned above.
An LSI provided externally to the microprocessor in order to extend an instruction set for the microprocessor is generally known as a coprocessor. Typical functions of this coprocessor include floating point arithmetic operation.
Conventional microprocessors of 4 bits and 8 bits and some types of microprocessors of 16 bits have had only instructions handling integers, because applications had been relatively simple. With expanding application fields, however, it is becoming necessary to execute floating point arithmetic operation at high speed. The conventional microprocessors have executed the floating point arithmetic operation in a software manner by combining instructions for handling integers, since the conventional microprocessors did not include instructions for floating point arithmetic operations. As a result, the floating point arithmetic operation performance of the conventional microprocessors has been a great deal lower than their integer arithmetic operation performance.
In general, a coprocessor for floating point arithmetic operation is closely coupled to a microprocessor so as to execute floating point arithmetic operation in place of the microprocessor when the coprocessor itself detects, or is notified by the microprocessor, that the microprocessor is attempting to execute an instruction for the floating point arithmetic operation (actually, such a microprocessor cannot execute the floating point arithmetic operation). When the coprocessor for the floating point arithmetic operation is in a condition of executing the floating point arithmetic operation, the microprocessor can execute an instruction other than the instruction for the floating point arithmetic operation.
For example, in a typical conventional system having a microprocessor coupled to a coprocessor for floating point arithmetic operation, the microprocessor of, for example, 32 bits is coupled to an external memory and to the coprocessor for the floating point arithmetic operation through a data bus of 32 bits, an address bus of 32 bits and status signal lines. The status signal lines are used to notify the coprocessor and the memory of the types of bus cycle, timing signals, etc. generated by the microprocessor. On the other hand, the coprocessor outputs to the microprocessor a busy signal notifying it that the coprocessor is in a condition of executing the floating point arithmetic operation.
In brief, the microprocessor includes therein a bus control unit coupled to the data bus, the address bus and the status signal lines, an instruction decoder coupled to the bus control unit for decoding an instruction fetched by the bus control unit, and an instruction execution unit for executing the instruction decoded by the instruction decoder.
When an instruction other than that for the floating point arithmetic operation (called "normal instruction" hereinafter) is executed, the normal instruction is fetched from the external memory to the bus control unit of the microprocessor in an instruction fetch bus cycle, and decoded by the instruction decoder. As a result, the execution unit is notified of the kind of operation, the kind of operands (for example, register/memory, read/write, etc.) and others.
When there is a memory read operand, the execution unit instructs the bus control unit so as to start an operand read bus cycle. A memory operand stored in the external memory is transferred (or read) to the bus control unit of the microprocessor in the operand read bus cycle. On the other hand, if there is a register operand, a general register ordinarily provided within the execution unit is selected as the operand.
After the memory operand is prepared in the bus control unit of the microprocessor, data of the memory operand is supplied to the execution unit of the microprocessor and an actual execution of the instruction is performed.
Then, if the execution unit of the microprocessor is previously notified by the instruction decoder of the microprocessor that the result of the instruction execution is returned to or written in the external memory, the execution unit operates to instruct the bus control unit so as to start an operand write bus cycle, and at the same time to transfer the result of the instruction execution (i.e., data to be written to the memory) to the bus control unit of the microprocessor. Thus, the result of the instruction execution is transferred (or written) to the external memory in the operand write bus cycle.
In the case of executing the floating point arithmetic operation, the same operation as that for a normal instruction is performed until the kind of operation and the kind of operands are notified by the instruction decoder.
Floating point data stored in the external memory ordinarily consists of 32 bits or more (64 bits or 80 bits), and therefore, cannot be all fetched to the bus control unit of the microprocessor in only one operand read bus cycle. Namely, a plurality of operand read bus cycles are required for completely fetching the floating point data. After the floating point data is completely fetched, the execution unit of the microprocessor instructs the start of a coprocessor operand write bus cycle, so that the floating point data temporarily stored in the bus control unit of the microprocessor is transferred (or written) to the coprocessor. This transfer in most cases also requires a plurality of bus cycles, for the reason as mentioned above.
On the other hand, the kind of operation decoded by the instruction decoder and supplied to the execution unit of the microprocessor is converted (or reconstructed) by the execution unit to a format (command) which can be decoded by the coprocessor. The command generated by the execution unit of the microprocessor is transferred to the bus control unit and, at the same time, the execution unit instructs the bus control unit so as to start the coprocessor write bus cycle.
If the command is transferred (or written) to the coprocessor in the coprocessor write bus cycle, the coprocessor starts an arithmetic operation of the floating point data which had been transferred before the transfer of the command.
When the coprocessor is in a condition of executing the arithmetic operation of the floating point data, the coprocessor outputs a busy signal to the microprocessor for notification that the coprocessor is in a busy condition. On the other hand, if the microprocessor detects the busy signal from the coprocessor, the microprocessor judges that the execution of the floating point operation instruction has not yet been completed, and therefore, the microprocessor cannot start execution of a next instruction. Namely, the microprocessor waits until the busy signal outputted from the coprocessor is brought into a ready condition.
When the coprocessor has completed execution of the floating point operation instruction, the coprocessor brings the busy signal into the ready condition, so that the microprocessor can know of the completion of the execution of the floating point operation instruction. At this time, similarly to the case of the execution of the normal instruction, if the execution unit has been notified by the instruction decoder that it is required to return the result of the operation to the external memory, the execution unit of the microprocessor then starts a coprocessor operand read cycle, so that the result of the operation is transferred (or read out) to the bus control unit once. Thereafter, the execution unit of the microprocessor instructs the bus control unit of the microprocessor so as to cause the bus control unit to transfer the result of the execution to the external memory in the operand write bus cycle.
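By way of illustration only, the busy/ready handshake described above may be modeled as in the following minimal sketch. The class name `Coprocessor`, the helper `wait_until_ready`, and the clock-by-clock countdown are hypothetical and are not taken from any actual device; the sketch merely shows the microprocessor stalling for as long as the busy signal is asserted.

```python
class Coprocessor:
    """Toy model of a floating point coprocessor with a busy signal (hypothetical)."""

    def __init__(self, execution_clocks):
        self.execution_clocks = execution_clocks  # clocks needed per operation
        self.remaining = 0                        # clocks left in the current operation

    @property
    def busy(self):
        # Busy signal: asserted while an operation is still being executed.
        return self.remaining > 0

    def write_command(self):
        # Receiving the command starts the arithmetic operation.
        self.remaining = self.execution_clocks

    def tick(self):
        # Advance the coprocessor by one clock.
        if self.remaining > 0:
            self.remaining -= 1


def wait_until_ready(coprocessor):
    """Stall until the busy signal returns to ready; return the clocks spent waiting."""
    stalled = 0
    while coprocessor.busy:
        coprocessor.tick()
        stalled += 1
    return stalled


fpu = Coprocessor(execution_clocks=10)
fpu.write_command()
stalled_clocks = wait_until_ready(fpu)
print(stalled_clocks)
```

In this model the microprocessor does no useful work during the stalled clocks, which corresponds to the waiting state described above.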
As seen from the above, an assignment of the execution of the floating point operation from the microprocessor to the coprocessor requires the coprocessor operand write bus cycle, the coprocessor write bus cycle, and the coprocessor operand read bus cycle, which are inherently unnecessary to the microprocessor. As mentioned hereinbefore, each of the coprocessor operand write bus cycle and the coprocessor operand read bus cycle is performed twice or more in most cases, and therefore, there are many bus cycles which are inherently unnecessary to the microprocessor.
In addition, when an arithmetic operation (for example, addition) for two operands is to be executed, if both of the two operands are memory operands, the coprocessor operand write bus cycle is required for each of a first operand and a second operand.
For example, if floating point arithmetic operation is executed for two items of 64-bit length floating point data stored in the external memory, four coprocessor operand write bus cycles, one coprocessor write bus cycle and two coprocessor operand read bus cycles are required because the floating point arithmetic operation is executed by the coprocessor.
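The count given in this example can be reproduced by the following short sketch, which assumes the 32-bit data bus of the system described above and assumes that each operand and the result are transferred in bus-width pieces. The function name `extra_bus_cycles` and its parameters are introduced here for illustration only.

```python
from math import ceil


def extra_bus_cycles(operand_bits, memory_operands, result_bits, bus_bits=32):
    """Count the bus cycles added by assigning one operation to the coprocessor."""
    # Each memory operand is forwarded to the coprocessor in bus-width pieces.
    operand_writes = memory_operands * ceil(operand_bits / bus_bits)
    # The command is transferred in one coprocessor write bus cycle.
    command_writes = 1
    # The result is read back from the coprocessor in bus-width pieces.
    operand_reads = ceil(result_bits / bus_bits)
    return {
        "coprocessor operand write": operand_writes,
        "coprocessor write": command_writes,
        "coprocessor operand read": operand_reads,
    }


counts = extra_bus_cycles(operand_bits=64, memory_operands=2, result_bits=64)
total = sum(counts.values())
print(counts, total)
```

For two 64-bit memory operands and a 64-bit result, the sketch yields four, one and two bus cycles respectively, seven in total, none of which would be needed if the microprocessor executed the operation itself.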
Recently, in order to elevate performance, some microprocessors have adopted a so-called pipelined structure which can simultaneously treat a plurality of instructions, for example, by simultaneously processing a fetching of an instruction, a decoding of another instruction, and an execution of still another instruction, or by further increasing the number of stages for these operations.
The pipelined system is advantageous in that since the processing of instructions can be divided into a number of stages, the apparent execution performance is increased. However, this advantage can be obtained by ensuring that an allocated instruction processing is performed in each stage without delay. For example, an ideal performance of the pipelined system cannot be obtained if there occurs a so-called pipeline hazard in which one stage requires the result of processing performed in a succeeding stage, or a bus neck in which a stage cannot start a necessary bus cycle because an external data bus indispensable to the pipelined system is in use by another stage.
As mentioned above, in a conventional microprocessor associated with a coprocessor, the bus neck has occurred, since an address bus and a data bus which are commonly used by the microprocessor, an external memory and the coprocessor are occupied for some constant time for a data transfer required in floating point arithmetic operation.
Furthermore, if the coprocessor has high processing speed, the coprocessor quickly completes execution of its assigned operation. For example, a conventional coprocessor for a floating point arithmetic operation can perform the four fundamental arithmetic operations, which are frequently used, in on the order of 10 clocks. On the other hand, the number of clocks for using the data bus often exceeds the number of clocks for the arithmetic operation. This has been a serious problem since the microprocessor incurs a large amount of overhead.
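As a rough worked example of this overhead, the following sketch compares the clocks spent on the data bus with the clocks spent on the arithmetic operation itself. The figure of 10 arithmetic clocks and the seven additional bus cycles are taken from the description above; the value of 2 clocks per bus cycle is purely an assumption made for illustration, since the description does not state the bus cycle length.

```python
def bus_clocks(bus_cycles, clocks_per_bus_cycle):
    """Clocks for which the data bus is occupied by the given number of bus cycles."""
    return bus_cycles * clocks_per_bus_cycle


ARITHMETIC_CLOCKS = 10     # "order of 10 clocks" for a basic arithmetic operation
EXTRA_BUS_CYCLES = 7       # 4 operand writes + 1 command write + 2 operand reads
CLOCKS_PER_BUS_CYCLE = 2   # hypothetical value, not given in the description

transfer_clocks = bus_clocks(EXTRA_BUS_CYCLES, CLOCKS_PER_BUS_CYCLE)
overhead_ratio = transfer_clocks / ARITHMETIC_CLOCKS
print(transfer_clocks, overhead_ratio)
```

Under these assumptions the data transfer occupies 14 clocks against 10 clocks of arithmetic, so the bus time alone exceeds the time of the operation being delegated.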