1. Field of the Invention
The present invention relates to a program controller for use in a processor which is capable of pipe-line operation.
2. Description of the Related Art
With the recent expansion in the applications of digital signal processing, there has been increasing demand for DSPs (digital signal processors) with high processing capability. In general, a DSP executes instructions under pipe-line control. To meet the demand for high-speed DSPs, methods have been proposed that increase the number of pipe-line stages while allocating a shorter processing time to each stage.
In a pipe-line control method, as many instructions as there are pipe-line stages are executed in parallel along the time axis. However, when a branch instruction is executed, the instruction stored at the post-branching address cannot be fetched in advance, because that address has not yet been calculated. Therefore, instructions cannot simply be executed in parallel when branch instructions are involved. Some measure is needed to ensure that the post-branching instruction is fetched only after the corresponding post-branching address has been calculated and set in the program counter. One proposed method (e.g., Japanese Laid-open Publication No. 62-54342) is the so-called delayed branch technique, in which the post-branching instruction is always executed only after one or more instructions stored immediately following the branch instruction have been executed. Under the delayed branch technique, the programmer or compiler must place a predetermined number of instructions (i.e., delay slots) after each branch instruction; these instructions are always executed following the branch instruction.
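To make the execution order under the delayed branch technique concrete, the following is a minimal sketch assuming an idealized model in which one instruction completes per cycle and addresses are consecutive integers. The function name `delayed_branch_trace` and its parameters are hypothetical and are used only for illustration; they do not describe any particular processor.

```python
def delayed_branch_trace(branch_addr, target_addr, delay_slots, taken, tail=2):
    """Return the addresses executed around a delayed branch (hypothetical model).

    The `delay_slots` instructions stored right after the branch are always
    executed, whether or not the branch is taken. Only then does control
    continue at `target_addr` (taken) or at the next sequential address
    (not taken). `tail` sets how many instructions after that point to list.
    """
    trace = [branch_addr]
    # Delay slots: executed unconditionally, in program order.
    trace.extend(branch_addr + i for i in range(1, delay_slots + 1))
    # Resume point after the delay slots.
    resume = target_addr if taken else branch_addr + delay_slots + 1
    trace.extend(resume + i for i in range(tail))
    return trace


if __name__ == "__main__":
    # Branch at 100 with six delay slots, taken to 200.
    print(delayed_branch_trace(100, 200, 6, taken=True))
    # -> [100, 101, 102, 103, 104, 105, 106, 200, 201]
```

Whatever the branch outcome, the delay-slot addresses appear in the trace, which is why the programmer or compiler must fill them with instructions that are correct to execute on both paths.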
FIG. 10 illustrates a processing timing scheme for delayed branching by a pipe-line processor having seven pipe-line stages. In FIG. 10, IF1 and IF2 represent the timing for instruction fetching; D1 and D2 represent the timing for instruction decoding; MA represents the timing for data memory access; OF represents the timing for operand fetching; and EX represents the timing for execution of calculation. In the timing scheme shown in FIG. 10, the processor decodes the conditional branch instruction at address N and recognizes it as a branch instruction (time 1000), and then fetches the operand holding the post-branching address M (time 1010). However, fetching of the instruction at post-branching address M cannot begin at either time 1000 or time 1010, because whether the branch is actually taken depends on the result of the immediately preceding instruction, i.e., the comparison calculation instruction at address N-1.
At time 1010, the processor completes the comparison calculation between the values held in the respective registers, as specified by the comparison calculation instruction at address N-1, and stores the result in a flag register. Based on this result, the processor completes execution of the branch instruction at address N at time 1020; if, for example, the branch condition is met, fetching of the instruction at post-branching address M then begins.
Thus, the processor cannot begin fetching the post-branching instruction at address M until execution of the branch instruction at address N is complete. In this exemplary seven-stage processor, the six instructions following the branch instruction (addresses N+1 to N+6) are the delay slots, which are executed in parallel with the branch instruction at address N.
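The count of six delay slots can be reproduced with a small timing model. The sketch below assumes an idealized pipe-line that issues one instruction per cycle with no stalls, uses the seven stage names of FIG. 10, and assumes that the target fetch (IF1) can start only in the cycle after the branch leaves EX. The names `timing_table` and `delay_slots`, and the stand-in value 100 for address N, are hypothetical.

```python
# Stage order as described for FIG. 10 (seven pipe-line stages).
STAGES = ("IF1", "IF2", "D1", "D2", "MA", "OF", "EX")


def timing_table(first_addr, count, stages=STAGES):
    """Map each cycle to {stage: address} for `count` sequential instructions.

    Idealized model: instruction k enters IF1 in cycle k and advances one
    stage per cycle, so it occupies stage s during cycle k + s.
    """
    table = {}
    for k in range(count):
        addr = first_addr + k
        for s, stage in enumerate(stages):
            table.setdefault(k + s, {})[stage] = addr
    return table


def delay_slots(stages=STAGES, resolve_stage="EX"):
    """Instructions fetched after a branch before its outcome is known.

    A branch entering IF1 in cycle 0 reaches `resolve_stage` in cycle
    stages.index(resolve_stage); one new instruction was fetched in each
    cycle from 1 up to and including that one.
    """
    return stages.index(resolve_stage)


if __name__ == "__main__":
    n = 100  # stand-in for branch address N
    table = timing_table(n, len(STAGES))
    for cycle in sorted(table):
        row = "  ".join(f"{st}:{table[cycle].get(st, '-')}" for st in STAGES)
        print(f"cycle {cycle}: {row}")
    print("delay slots:", delay_slots())
```

In cycle 6 of this model the branch at N is in EX while N+6 is in IF1, so addresses N+1 to N+6 are already in flight when the branch resolves, matching the six delay slots described above.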
Clearly, the number of delay slots grows as the number of pipe-line stages in a processor using the delayed branch technique increases. With many delay slots, however, there may not be enough useful instructions available to fill them. Where no useful instruction can be placed in a given delay slot, a NOP instruction (i.e., a no-operation instruction, which does not cause any calculation to be executed) must be placed there instead. This presents a problem in that clock cycles are wasted every time the branch instruction is executed.
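The cost of NOP padding can be estimated with simple arithmetic. The following is a minimal sketch assuming, as in the seven-stage example, that the branch resolves in the last stage so that a pipe-line with S stages has S - 1 delay slots, that each slot costs one cycle, and that only a fixed number of slots can be filled with useful instructions. The function `wasted_cycles` and its parameters are hypothetical.

```python
def wasted_cycles(pipeline_stages, useful_fill, branch_executions):
    """Cycles spent executing NOP delay slots (hypothetical cost model).

    Assumes pipeline_stages - 1 delay slots per branch (branch resolved in
    the final stage), one cycle per slot, and that only `useful_fill` slots
    can hold useful instructions; the remainder are padded with NOPs.
    """
    slots = pipeline_stages - 1
    nops = max(slots - useful_fill, 0)  # slots left for NOP padding
    return nops * branch_executions


if __name__ == "__main__":
    # Two useful instructions per branch, branch executed 1000 times.
    for stages in (3, 5, 7, 9):
        print(stages, wasted_cycles(stages, 2, 1000))
    # -> 3 0, 5 2000, 7 4000, 9 6000 (one pair per line)
```

Under these assumptions the waste grows linearly with the pipe-line depth once the available useful instructions are exhausted, which illustrates why deeper pipe-lines make the NOP overhead of delayed branching more severe.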
On the other hand, a DSP may be required to perform a plurality of processes, e.g., voice compression and the construction of data sequences for communications, rather than voice compression alone.