1. Field of the Invention
The present invention relates to a data processor for executing a program that consists of plural instructions and contains a repeat block which is repeatedly processed.
2. Description of the Prior Art
In general, digital signal processing involves frequent repeat processing. Digital signal processors (DSPs) are processors designed specifically for high-speed digital signal processing, and many of them have a single dedicated instruction, a repeat instruction, for efficiently processing a repeat block that contains plural instructions to be processed repeatedly.
On the other hand, multimedia-oriented data processors have been developed that perform digital signal processing efficiently by using VLIW (Very Long Instruction Word) techniques. FIG. 36 is a flowchart showing repeat processing implemented in software on such a conventional data processor, disclosed in Japanese Patent Laid-Open Gazette 9-212361 (U.S. Pat. No. 5,901,301). Unlike a DSP of the type that implements the flow of signal processing in hardware, this data processor requires, to speed up repeat processing, software pipelining that takes account of load latency, the lifetime of register values and the like, and it calls for optimization in which software expands (unrolls) the repeat processing to some extent. Further, even a simple multiply-add operation requires data to be read from two areas in memory. Hence, to achieve high-speed processing, it is customary to treat the processing of plural pieces of data as one loop unit.
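As an illustration of using plural pieces of data as one loop unit, the following is a minimal C sketch of a multiply-add (dot product) loop expanded so that each iteration processes four data pairs, each read from one of two memory areas. It is not taken from the cited gazette; the function name `mac_unrolled4`, the 16-bit operand type and the requirement that the count be a multiple of 4 are assumptions made for illustration. Four independent accumulators keep the multiply-adds within one loop unit free of dependences on one another, which is the kind of freedom a software-pipelining compiler exploits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical multiply-add loop whose loop unit is four data pairs.
 * x[] and y[] model the two memory areas read by each multiply-add.
 * Assumption: count is a multiple of 4; any remainder is NOT handled here. */
int32_t mac_unrolled4(const int16_t *x, const int16_t *y, size_t count)
{
    /* Separate accumulators: the four multiply-adds in one unit do not
     * depend on each other, so they can be scheduled in parallel. */
    int32_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;

    for (size_t i = 0; i < count; i += 4) {
        acc0 += (int32_t)x[i]     * y[i];
        acc1 += (int32_t)x[i + 1] * y[i + 1];
        acc2 += (int32_t)x[i + 2] * y[i + 2];
        acc3 += (int32_t)x[i + 3] * y[i + 3];
    }
    return acc0 + acc1 + acc2 + acc3;
}
```

Because this routine only handles counts that are multiples of 4, counts with a remainder need separate code, which is the situation the FIG. 36 flow addresses.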
A brief description will be given below of the operation of the prior art example.
The FIG. 36 example shows the case where the basic repetition unit of the multiply-add operation in the loop processing is set to 4; separate programs are provided which perform the multiply-add operation 1, 2, . . . , 7, 4n, 4n+1, 4n+2 and 4n+3 times (where n is an integer equal to or greater than 2), respectively. In step ST1, the data processor decides whether the number of times the repeat block is to be processed, that is, the repeat count, is 8 or more. When the repeat count is 8 or more, the data processor goes to step ST2 to decide whether the repeat count equals 4n, 4n+1, 4n+2 or 4n+3, branches according to the result to the corresponding multiply-add program, and then executes that program (steps ST3a to ST3d). When the repeat count is smaller than 8, the data processor goes to step ST4 to decide which of the values 1 to 7 the repeat count equals, branches according to the result to the corresponding multiply-add program, and then executes that program (steps ST5a to ST5g).
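The decision-and-branch flow of FIG. 36 can be sketched in C as follows. This is a hedged model, not the code of the cited gazette: all function names are hypothetical, the mapping of steps ST3a to ST3d onto 4n, 4n+1, 4n+2 and 4n+3 simply follows the order in which the cases are listed, and treating a repeat count of 0 as "no multiply-add" is an assumption. For brevity, the seven short programs (ST5a to ST5g) are modeled as entry points into one block of straight-line code via a fall-through `switch`, rather than as seven separate routines.

```c
#include <stddef.h>
#include <stdint.h>

/* Shared main loop for the 4n+r programs: one loop unit = 4 multiply-adds.
 * n4 must be a multiple of 4. */
static int32_t mac_loop4(const int16_t *x, const int16_t *y, size_t n4)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n4; i += 4)
        acc += (int32_t)x[i] * y[i] + (int32_t)x[i + 1] * y[i + 1]
             + (int32_t)x[i + 2] * y[i + 2] + (int32_t)x[i + 3] * y[i + 3];
    return acc;
}

/* Programs for counts 4n, 4n+1, 4n+2, 4n+3: main loop plus a fixed tail. */
static int32_t mac_4n(const int16_t *x, const int16_t *y, size_t c)
{
    return mac_loop4(x, y, c);
}
static int32_t mac_4n1(const int16_t *x, const int16_t *y, size_t c)
{
    size_t m = c - 1;
    return mac_loop4(x, y, m) + (int32_t)x[m] * y[m];
}
static int32_t mac_4n2(const int16_t *x, const int16_t *y, size_t c)
{
    size_t m = c - 2;
    return mac_loop4(x, y, m) + (int32_t)x[m] * y[m]
         + (int32_t)x[m + 1] * y[m + 1];
}
static int32_t mac_4n3(const int16_t *x, const int16_t *y, size_t c)
{
    size_t m = c - 3;
    return mac_loop4(x, y, m) + (int32_t)x[m] * y[m]
         + (int32_t)x[m + 1] * y[m + 1] + (int32_t)x[m + 2] * y[m + 2];
}

/* Step ST4 plus ST5a to ST5g: each case label is an entry point into
 * straight-line code that performs exactly c multiply-adds. */
static int32_t mac_short(const int16_t *x, const int16_t *y, size_t c)
{
    int32_t acc = 0;
    switch (c) {
    case 7: acc += (int32_t)x[6] * y[6]; /* fall through */
    case 6: acc += (int32_t)x[5] * y[5]; /* fall through */
    case 5: acc += (int32_t)x[4] * y[4]; /* fall through */
    case 4: acc += (int32_t)x[3] * y[3]; /* fall through */
    case 3: acc += (int32_t)x[2] * y[2]; /* fall through */
    case 2: acc += (int32_t)x[1] * y[1]; /* fall through */
    case 1: acc += (int32_t)x[0] * y[0]; break;
    default: break; /* count 0 (assumed): nothing to do */
    }
    return acc;
}

/* Software decision of the repeat count, as in FIG. 36. */
int32_t mac_dispatch(const int16_t *x, const int16_t *y, size_t count)
{
    if (count < 8)                        /* step ST1 */
        return mac_short(x, y, count);    /* steps ST4, ST5a to ST5g */
    switch (count % 4) {                  /* step ST2 */
    case 0:  return mac_4n(x, y, count);  /* ST3a */
    case 1:  return mac_4n1(x, y, count); /* ST3b */
    case 2:  return mac_4n2(x, y, count); /* ST3c */
    default: return mac_4n3(x, y, count); /* ST3d */
    }
}
```

Even in this compacted form, much of the code exists only to decide the repeat count and select a branch target, which illustrates the decision overhead and code-size cost attributed to this prior-art approach.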
As described above, when the repeat count for repeat block processing changes dynamically, the data processor decides the repeat count in software and branches to the program corresponding to that repeat count.
With the conventional data processor of the above construction, when the repeat count for processing the repeat block changes dynamically, or when the same subroutine is called with a specified repeat count, the overhead of deciding the repeat count becomes too large for a high level of performance to be achieved. Further, since code is needed to decide the repeat count, to branch on the result of that decision and to perform the repeat processing for each repeat count, the program for repeat processing inevitably becomes large. In particular, for software stored in ROM, the hardware cost rises because the actual ROM size depends on the code size; furthermore, speeding up even simple repeat processing requires quite a complicated program, which places a heavy load on program development and increases the likelihood of bugs in the program.