1. Technical Field
The present invention relates to a data processing system that is equipped with dedicated circuitry.
2. Description of the Related Art
There have been increasing demands for processors that are dedicated to particular applications. In the fields of image processing and network processing, for example, a processor equipped with dedicated circuitry for certain processes, and with special-purpose or dedicated instructions for activating that circuitry, can flexibly handle the specifications of different applications and can be produced with superior cost-performance. The applicant of the present application discloses such a processor in U.S. Pat. No. 6,301,650.
One difficulty when producing a processor that can flexibly handle application specifications desired by the user is that there is a trade-off between (i) the freedom with which special-purpose instructions (user-specified instructions) can be implemented in accordance with user demands and (ii) the ability to execute such special-purpose instructions with low overhead.
The processor disclosed in U.S. Pat. No. 6,301,650 is equipped with one or more special-purpose units (special-purpose data processing units, hereafter referred to as “VUs”) and a general-purpose unit (a basic execution unit or processor unit, hereafter referred to as the “PU”) that can perform general-purpose processing or basic processing. In addition to the general-purpose processing ability supplied by the PU, the processor has special-purpose processing ability supplied by dedicated circuitry, which is dedicated to the processing required by the user's desired specification, and such dedicated circuitry can be implemented with an extremely high degree of freedom. Therefore, special-purpose instructions defined by the user can also be implemented with an extremely high degree of freedom. Because the processor is equipped with registers that are commonly accessed by both the PU and the VUs, data transfers between the PU and the VUs can be performed by merely executing a register transfer instruction such as a “MOVE” instruction. In this way, the processor has an architecture in which special-purpose instructions, including instructions that exchange data with the PU, can be implemented as VUs with great freedom.
In the fields of image processing and network processing, where real-time processing is required, there have been increasing demands in recent years for higher processing speed and for real-time processing at a higher level. For example, in the above processor that transfers data via registers, when a VU performs data processing on PU data according to a user-defined special-purpose instruction, at least two cycles are required, since the data must first be transferred from the PU and the computation result must then be transferred back from the VU. If the processing performed by the VU consumes a large number of clocks, such as several dozen, the number of clocks consumed by the data transfers between the VU and the PU is small relative to the number consumed by the processing itself, and so is not particularly significant. However, if the processing performed by the VU is based on a product-sum operation and is completed in a few clocks, the number of clocks consumed by the data transfers appears as an extremely large overhead.
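The relationship described above can be illustrated with a short calculation. The following is only a sketch and not part of the disclosed processor: the function name `transfer_overhead` and the sample values of 36 and 3 processing clocks are assumptions chosen to stand in for "several dozen clocks" and "a few clocks", and the two transfer cycles are taken as the minimum stated above.

```python
def transfer_overhead(transfer_cycles: int, processing_cycles: int) -> float:
    """Return the fraction of all cycles that is spent on PU<->VU transfers."""
    total_cycles = transfer_cycles + processing_cycles
    return transfer_cycles / total_cycles


# Minimum stated in the text: one transfer to the VU, one transfer back.
TRANSFER_CYCLES = 2

# Assumed illustrative workloads (not figures from the source).
long_case = transfer_overhead(TRANSFER_CYCLES, 36)   # "several dozen clocks"
short_case = transfer_overhead(TRANSFER_CYCLES, 3)   # "a few clocks"

print(f"long VU operation:  {long_case:.1%} of cycles are transfers")
print(f"short VU operation: {short_case:.1%} of cycles are transfers")
```

Under these assumed values, the transfers account for roughly one twentieth of the cycles in the long case but two fifths of the cycles in the short case, which is the imbalance the paragraph above describes.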
In particular, when the range of processing that can be executed by special-purpose instructions implemented using the dedicated circuitry of the VUs is increased in order to raise the processing speed of the processor, the number of clocks consumed by the processing of each dedicated circuit tends to fall, resulting in a relative increase in the overhead of data transfers.
A method in which a common register is provided for common access by a PU and a VU has wide applicability. However, at least one cycle is consumed when transferring data between an internal register of the PU or VU and the common register used for data transfer, so that a total of four cycles is consumed when data is transferred between the PU and the VU and the result is sent back thereafter. As explained above, large improvements in processing speed can be expected from reducing the number of clocks consumed by data transfers. However, modifying the configuration of the PU to suit the configuration of the VU sacrifices the general-purpose nature of the PU, thereby reducing the value of the PU as a platform on which a VU of a desired configuration can be implemented in accordance with a user specification. If it also becomes necessary to redesign the PU, the development period of the processor becomes longer and the cost of the processor increases, so that this is not an economical solution.
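The four-cycle count can be traced with a minimal model. This is a sketch under stated assumptions, not the actual register architecture: the register names `pu`, `common` and `vu`, the class `TransferModel`, and the sample VU operation are all hypothetical, and each register-to-register move is assumed to cost exactly one cycle (the minimum given above). The cycles spent on the VU processing itself are deliberately not counted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class TransferModel:
    """Toy model: one PU register, one common register, one VU register."""
    regs: Dict[str, int] = field(
        default_factory=lambda: {"pu": 0, "common": 0, "vu": 0}
    )
    cycles: int = 0  # counts transfer cycles only

    def move(self, src: str, dst: str) -> None:
        # Assumption: every register-to-register move costs one cycle.
        self.regs[dst] = self.regs[src]
        self.cycles += 1


def round_trip(value: int, vu_op: Callable[[int], int]) -> Tuple[int, int]:
    """Send a PU value to the VU via the common register and return the result."""
    model = TransferModel()
    model.regs["pu"] = value
    model.move("pu", "common")   # PU internal register -> common register
    model.move("common", "vu")   # common register -> VU internal register
    model.regs["vu"] = vu_op(model.regs["vu"])  # VU processing (not counted)
    model.move("vu", "common")   # VU internal register -> common register
    model.move("common", "pu")   # common register -> PU internal register
    return model.regs["pu"], model.cycles


# Hypothetical VU operation, used only to show that data flows through.
result, transfer_cycles = round_trip(5, lambda x: x * x + 1)
print(result, transfer_cycles)
```

In this model the four moves are fixed by the transfer path, so the transfer cost stays the same however short the VU operation becomes.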
The present invention has a first object of providing a data processing apparatus or system, and a control method thereof, that can reduce the overhead of data transfers between the PU and the VU without sacrificing the general-purpose nature of the PU. A second object of the present invention is to provide a data processing system and a control method in which processing can be executed by the VU with little or no apparent consumption of clock cycles due to data transfers between the VU and the PU.