The present invention relates to a data processing system with a host processor and a coprocessor, and more particularly to a pipelined data processing system having a host processor and a coprocessor which are implemented on a single chip, and to a method of interfacing between the processors for performing enhanced pipelined operations.
With explosive growth in the market for portable electronic products, the technological emphasis in VLSI (Very Large Scale Integration) circuit design is shifting away from high speed toward low power. However, high-speed functions are still indispensable for a microprocessor (or microcontroller) performing complex mathematical computations, for example, multiplications. This need for speed becomes more pronounced in RISC (Reduced Instruction Set Computer) type processors, DSP (Digital Signal Processing) units, and graphic accelerators because these devices are in increasing demand for multimedia applications.
As demand grows for enhanced performance of microprocessor-based data processing systems, more complex techniques have been developed and used in microprocessor designs. For example, pipelined data processing techniques, such as division of processor operations into a multiplicity of elementary suboperations, are employed.
With reference to FIGS. 1A and 1B, the execution of an instruction, which requires time T in a non-pipelined mode of operation, is divided into a plurality of suboperation stages in a pipelined mode of operation. For example, a three-stage pipelined mode of operation typically has three suboperation stages, such as ST1, ST2, and ST3. A processor in a pipelined mode is partitioned in such a manner that each suboperation of a pipelined instruction is completed in a predefined stage time period Ts. As a result of this partitioning, the execution of such a pipelined instruction requires three stage time periods 3Ts, which is longer than the time T required for the execution of an instruction in a non-pipelined mode of operation.
In a pipelined mode of operation in which a processor separately executes each suboperation by partitioning a pipelined instruction, however, the processing of the next pipelined instruction can be initiated after a stage time period Ts rather than after a time period T as in the non-pipelined mode of operation. Since the stage time period Ts for the execution of each suboperation of a pipelined instruction is shorter than the time T for the execution of a non-pipelined instruction, the execution of a sequence of instructions in a pipelined mode of operation can be expedited. The stage time period Ts can be chosen to be as small as possible consistent with the number of suboperation stages in the pipelined mode of operation.
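The timing argument above can be illustrated with a minimal sketch that compares the total time for a sequence of instructions in the two modes. The function names and the sample values (T = 5 and Ts = 2 time units, ten instructions) are assumptions chosen only for illustration and are not taken from FIGS. 1A and 1B; the pipelined case assumes an ideal pipeline with no stalls.

```python
def non_pipelined_time(n, t):
    """Total time for n instructions when each one must finish
    (taking time t) before the next one starts."""
    return n * t


def pipelined_time(n, stages, ts):
    """Total time for n instructions in an ideal pipeline (no stalls).

    The first instruction needs stages * ts; every later instruction
    starts one stage period after its predecessor, so it also completes
    one stage period after its predecessor.
    """
    if n == 0:
        return 0
    return (stages + n - 1) * ts


# Hypothetical values: T = 5, Ts = 2, three stages, ten instructions.
# A single pipelined instruction takes 3 * Ts = 6, longer than T = 5,
# yet the whole sequence finishes sooner.
print(non_pipelined_time(10, 5))   # 50
print(pipelined_time(10, 3, 2))    # 24
```

With these assumed values a single pipelined instruction is slower than its non-pipelined counterpart, but the ten-instruction sequence completes in roughly half the time because a new instruction enters the pipeline every Ts.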
Recent advancements in VLSI technology have made DSP technology readily available, so that it is not difficult to find electronic products equipped with some form of multimedia DSP capability. Many consumer electronic products with multimedia DSP capability have a microprocessor chip for the control and I/O operations and a separate DSP chip for signal processing.
A SOC (System-On-a-Chip) approach is attracting the attention of chip designers (particularly, ASIC designers) because such a design offers improvements in cost, power consumption, system design complexity, and system reliability as compared to designs having two or more separate processors.
A simple solution to integrating, for example, a microprocessor core and a DSP core is to put the two independent cores on a single die (i.e., a microprocessor core for control tasks and a DSP core for signal processing algorithms). This simple two-core approach can provide chip designers with flexibility in choosing any application-specific pair of microprocessor and DSP cores to fit a target application optimally. This approach, however, suffers from several drawbacks: (1) Ill Programmability, because the cores should have their own programs and data; (2) Communication Overhead, because resource conflicts, false data dependencies and deadlocks need to be prevented through a complex scheme; and (3) Hardware Overhead due to the duplicated part of the two cores, which results in increased hardware cost and power inefficiency.
Another way to support microprocessor and DSP capabilities on a chip is to use a single processor having both capabilities, for example, a microprocessor with DSP capabilities or a DSP unit with powerful bit manipulation and branch capabilities.
In general, a microprocessor is a necessary element in electronic products; therefore, there are motivations on the part of designers to integrate a SOC design around a microprocessor. Compared with the two-core approach, the SOC approach can achieve efficient communications between a microprocessor (or a host processor) and its interfaces, for example, coprocessors. By equipping the microprocessor with DSP coprocessor instructions and interface scheme, control functions and DSP functions can be implemented on a single processor chip which also provides a single development environment. The SOC approach also has other advantages over the two-core approach. For example, DSP programs can be written easily by using coprocessor instructions of a host processor, and hardware cost can be reduced because there is no hardware duplication.
The overall processing efficiency in such a host-coprocessor SOC architecture is a function of a number of factors: for example, the computing capability of a coprocessor and the information exchange capability between a host processor and a coprocessor. The computing capability of a coprocessor depends upon how many instructions the coprocessor has and how fast the coprocessor executes each instruction. Such features of a coprocessor are known from its specification. Thus, an improvement in coprocessor performance can be achieved by using, within cost limits, a coprocessor whose specification provides the desired features. On the other hand, the information exchange capability between a host processor and a coprocessor is affected by the coprocessor interface protocols of the host processor, rather than by the coprocessor performance.
In such conventional host-coprocessor SOC techniques, however, in order to improve the coprocessor capabilities, more powerful coprocessor instructions with appropriate data paths need to be added to a host processor. Such a design is tantamount to a new processor chip. If there is a bottleneck in the information exchange between the host and the coprocessor, the system performance will not be improved. Hereinafter, an example of such a bottleneck in the information exchange will be explained.
FIG. 2 is a timing diagram showing pipelined executions of three subsequent instructions I1, I2, and I3 in a typical RISC-based host-coprocessor system. Each instruction I1, I2 or I3 of a RISC instruction pipeline, a so-called three-stage pipeline, has three stages: Instruction Fetch (IF), Instruction Decode (ID), and Execution (EX) stages. Each of the three stages IF, ID and EX for an instruction is intended to be completed in a single cycle of a clock signal CLK.
For the purpose of explanation, in FIG. 2, a first instruction I1 is assumed to be a host processor instruction for an execution of a host processor operation, and second and third instructions I2 and I3 are coprocessor instructions for execution of coprocessor operations. The first instruction I1 is ready to be executed by the host processor alone without coprocessor interfacing, and the second and third instructions I2 and I3 are intended to be executed by the coprocessor responsive to coprocessor commands I2′ and I3′ (corresponding to instructions I2 and I3, respectively) and coprocessor interface signals INF which are issued by the host processor depending on results of decoding the coprocessor instructions I2 and I3.
Referring to FIG. 2, first, the host processor instruction I1 is fetched during cycle T0. That is, the instruction I1 is loaded from a program memory into the host processor. In the next cycle T1, the instruction I1 is decoded therein and at the same time the coprocessor instruction I2 is fetched. The host processor instruction I1 is executed by the host processor during cycle T2, in which the coprocessor instructions I2 and I3 are simultaneously decoded and fetched, respectively. During cycle T3, the host processor issues the coprocessor command I2′ corresponding to the instruction I2 and also produces coprocessor interface signals INF for the instruction I2. Thus, the coprocessor is interfaced with the host processor under the control of the interface signals INF and then completes decoding of the command I2′ from the host processor. During cycle T4, the coprocessor executes the command I2′.
Due to the execution of the command I2′ associated with the instruction I2 in cycle T4, the instruction pipeline has to be stalled for one clock cycle. Hence, the execution stage of the instruction I3 should be suspended for one cycle and then executed in cycle T5. The coprocessor decodes the command I3′ corresponding to the instruction I3 during cycle T5, and in the next cycle T6 the command I3′ is executed by the coprocessor.
Thus, the pipeline stalling results when the respective coprocessor commands I2′ and I3′ are decoded in the same clock cycles as the corresponding coprocessor instructions I2 and I3 are executed. Such pipeline stalling behaves like a bottleneck in information exchanges between a host processor and a coprocessor, causing degradations in computing speed and system performance.
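The cycle-by-cycle behavior described for FIG. 2 can be reproduced with a small, simplified model. The sketch below is an illustrative assumption rather than a description of any actual circuit: it assumes one instruction is fetched per cycle starting at T0, that a coprocessor instruction occupies the host EX stage for one cycle (command issue and coprocessor decode) followed by one further cycle of coprocessor execution, and that the EX stage of the next instruction cannot begin until that execution has finished. The function name and the labels "host" and "cop" are hypothetical.

```python
def conventional_schedule(kinds):
    """Simplified model of the conventional three-stage pipeline.

    kinds: list of "host" or "cop" in program order.
    Returns one (ex_cycle, done_cycle) pair per instruction, where
    ex_cycle is the cycle of the host EX stage and done_cycle is the
    cycle in which the instruction's work is actually finished.
    """
    schedule = []
    prev_done = -1
    for i, kind in enumerate(kinds):
        earliest_ex = i + 2                      # IF at cycle i, ID at i + 1
        ex = max(earliest_ex, prev_done + 1)     # wait for the previous instruction
        # A coprocessor command is decoded in the EX cycle and executed
        # by the coprocessor one cycle later.
        done = ex + 1 if kind == "cop" else ex
        schedule.append((ex, done))
        prev_done = done
    return schedule


# I1 is a host instruction; I2 and I3 are coprocessor instructions.
print(conventional_schedule(["host", "cop", "cop"]))
# [(2, 2), (3, 4), (5, 6)]
```

Under these assumptions the model matches the description: I1 executes in T2, the command for I2 is decoded in T3 and executed in T4, the EX stage of I3 slips from T4 to T5, and the command for I3 is executed in T6. Three instructions therefore need seven cycles instead of the five an unstalled three-stage pipeline would take.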
It is an object of the present invention to provide a low-power, low-cost, high-performance data processing system suitable for multimedia applications, specifically an improved host-coprocessor system-on-a-chip (SOC) performing pipelined operations.
It is another object of the present invention to provide a host-coprocessor SOC with an efficient and powerful coprocessor interface scheme.
It is still another object of the present invention to provide a method for accomplishing effective interfaces between a host processor performing pipelined operations and at least one coprocessor on a single chip.
These and other objects, features and advantages of the present invention are provided by a pipelined microprocessor which fetches an instruction, predecodes the fetched instruction when the fetched instruction is identified as a coprocessor instruction during an instruction fetch (IF) cycle of the instruction, and at least one coprocessor for performing additional specific functions. The microprocessor (host processor) issues to the coprocessor a coprocessor command corresponding to the fetched instruction. The coprocessor decodes the coprocessor command during an instruction decode/memory access (ID/MEM) cycle of the instruction and executes the decoded coprocessor command during an instruction execution (EX) cycle of the fetched instruction.
According to a preferred embodiment of the present invention, the host processor generates a plurality of coprocessor interface signals (e.g., A, B, C and D) when the fetched instruction is identified as a coprocessor instruction. Through the coprocessor interface signals the host processor issues the coprocessor command corresponding to the coprocessor instruction. The coprocessor provides its status data to the host processor after executing the coprocessor command in the EX cycle of the instruction.
A data memory is commonly connected to both the host processor and the coprocessor. The coprocessor accesses the data memory only at times designated by the host processor, during which the host processor is guaranteed not to access the data memory. An internal clock generation circuit is provided for the host processor and the coprocessor. The internal clock generation circuit generates internal clock signals synchronized with an external clock signal. The host processor generates the coprocessor interface signals in synchronization with one of the internal clock signals.
According to another aspect of the present invention, in order to execute the coprocessor instruction for a specific function, while performing operations for normal control functions, the host processor checks in an IF stage if the fetched instruction is a coprocessor instruction. If so, the host processor predecodes the fetched instruction during the IF stage. Then, the host processor issues a coprocessor command corresponding to the fetched instruction in the ID/MEM stage of the instruction. The coprocessor then decodes the coprocessor command in the ID/MEM stage, and executes a coprocessor operation designated by the coprocessor command in the EX stage of the instruction. The coprocessor provides the host processor with coprocessor status data after the execution of the coprocessor operation in the EX stage. Then, the host processor evaluates the coprocessor status data to provide for a next conditional branch instruction.
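The stage assignment described in this aspect can be laid out as a simple timeline. The following is a minimal sketch under stated assumptions: one instruction is fetched per cycle starting at cycle 0, every stage takes one cycle, and no other hazards or stalls arise. The stage labels, function names, and the "host"/"cop" markers are hypothetical and serve only to illustrate the ordering of the steps.

```python
HOST_STAGES = ("IF", "ID/MEM", "EX")
COP_STAGES = (
    "IF: fetch, identify and predecode coprocessor instruction",
    "ID/MEM: host issues command, coprocessor decodes it",
    "EX: coprocessor executes and returns status data",
)


def proposed_schedule(kinds):
    """Map each instruction to {cycle: activity} for the described method.

    kinds: list of "host" or "cop" in program order.
    """
    schedule = []
    for i, kind in enumerate(kinds):
        stages = COP_STAGES if kind == "cop" else HOST_STAGES
        schedule.append({i + offset: name for offset, name in enumerate(stages)})
    return schedule


def last_cycle(schedule):
    """Cycle in which the final activity of the whole sequence occurs."""
    return max(max(row) for row in schedule)


timeline = proposed_schedule(["host", "cop", "cop"])
for index, row in enumerate(timeline, start=1):
    for cycle in sorted(row):
        print(f"I{index} cycle {cycle}: {row[cycle]}")
print(last_cycle(timeline))   # 4
```

Because the command is decoded by the coprocessor in the ID/MEM stage rather than in the EX stage, the coprocessor operation in this model completes within the instruction's own EX cycle, so one host instruction followed by two coprocessor instructions would finish in cycle 4 under the stated assumptions.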