Processors (data processing systems or LSIs) that incorporate an operation function, such as microprocessors (MPUs) and digital signal processors (DSPs), are known as apparatuses for conducting general-purpose processing and special digital data processing. Architectural factors that have significantly contributed to the improved performance of these processors include pipelining, super-pipelining, super-scalar technology, VLIW technology, and the addition of specialized data paths (special-purpose instructions). Further architectural elements include branch prediction, register banks, cache technology, and the like.
There is a clear difference in performance between non-pipelined and pipelined processors. Basically, for the same instructions, increasing the number of pipeline stages reliably improves throughput. For example, a four-stage pipeline can ideally be expected to achieve about a fourfold increase in throughput, and an eight-stage pipeline about an eightfold increase, which means that super-pipelining roughly doubles the performance again. Since progress in process technology enables the critical paths to be segmented more finely, the upper limit of the operating frequency will rise significantly and the contribution of pipelining will increase further. However, the delay or penalty of a branch instruction has not been eliminated, and whether a super-pipelined machine succeeds depends on how much of the multi-stage delay caused by memory accesses and branches can be hidden by the compiler's instruction scheduling.
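The relation between stage count, throughput, and branch penalty described above can be illustrated with a small back-of-the-envelope model. This is only a sketch under stated assumptions, not part of the original disclosure: stage delays are taken to be equal, the function names are hypothetical, and the branch fraction (20%) and the penalty of `stages - 1` cycles are illustrative values rather than measured ones.

```python
def pipeline_speedup(stages, n_instructions):
    """Speedup of a `stages`-deep pipeline over an unpipelined unit.

    Assumes equal stage delays and no stalls: the unpipelined unit needs
    `stages` cycles per instruction, while the pipeline needs `stages`
    cycles to fill and then completes one instruction per cycle.
    """
    unpipelined_cycles = stages * n_instructions
    pipelined_cycles = stages + n_instructions - 1
    return unpipelined_cycles / pipelined_cycles


def speedup_with_branches(stages, branch_fraction, penalty_cycles):
    """Steady-state speedup when a fraction of instructions are branches
    that each cost `penalty_cycles` extra cycles (illustrative assumption)."""
    cycles_per_instruction = 1.0 + branch_fraction * penalty_cycles
    return stages / cycles_per_instruction


if __name__ == "__main__":
    for k in (4, 8):
        ideal = pipeline_speedup(k, 1_000_000)
        # Assumed values: 20% branches, each flushing k - 1 stages.
        real = speedup_with_branches(k, 0.2, k - 1)
        print(f"{k}-stage: ideal {ideal:.2f}x, with branch penalty {real:.2f}x")
```

Under these assumed numbers the ideal speedup approaches the stage count for long instruction streams, while the branch penalty erodes most of the additional gain from doubling the pipeline depth, which is why instruction scheduling by the compiler matters so much for a super-pipelined machine.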
The super-scalar technology simultaneously executes instructions near the program counter by means of sophisticated internal data paths. Supported by progress in compiler optimization technology, it has become capable of executing about four to eight instructions simultaneously. In many cases, however, an instruction uses the most recent operation result and/or a result held in a register. Peak performance aside, this necessarily reduces the average number of instructions that can be executed simultaneously to a value much smaller than that described above, even when full use is made of techniques such as forwarding, instruction relocation, out-of-order execution, and register renaming. In particular, since it is impossible to execute a plurality of conditional branch instructions or the like simultaneously, the effect of the super-scalar technology is reduced further. Accordingly, its contribution to improved processor performance would be about 2.0 to 2.5 times on average. Even for an extremely well-suited application, the practical contribution would be about four times or less.
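The effect of data dependencies on the number of instructions actually issued per cycle can be sketched with a toy scheduler. The model below is an assumption-laden illustration, not a description of any real processor: every instruction has a latency of one cycle, register renaming is assumed to be perfect so that only true (read-after-write) dependencies remain, branches are ignored, and the function name `cycles_needed` and the register names are hypothetical.

```python
def cycles_needed(instrs, width):
    """Return the cycle count for `instrs` on an idealized machine that
    issues up to `width` instructions per cycle.

    `instrs` is a list of (dest_register, [source_registers]) in program
    order. An instruction may issue once all of its sources are available.
    """
    ready = {}    # register -> first cycle in which its value can be read
    issued = {}   # cycle -> number of instructions issued in that cycle
    last = 0
    for dest, sources in instrs:
        # Earliest cycle allowed by true data dependencies.
        cycle = max((ready.get(s, 0) for s in sources), default=0)
        # Respect the issue width of the machine.
        while issued.get(cycle, 0) >= width:
            cycle += 1
        issued[cycle] = issued.get(cycle, 0) + 1
        ready[dest] = cycle + 1   # unit latency (assumption)
        last = max(last, cycle + 1)
    return last


if __name__ == "__main__":
    # Eight independent instructions: all read only the input register "a".
    independent = [(f"r{i}", ["a"]) for i in range(8)]
    # Eight instructions in which each uses the previous result.
    chain = [(f"r{i + 1}", [f"r{i}"]) for i in range(8)]
    for name, prog in (("independent", independent), ("chain", chain)):
        cycles = cycles_needed(prog, width=4)
        print(f"{name}: {len(prog) / cycles:.2f} instructions per cycle")
```

In this sketch the independent stream reaches the full issue width, whereas the dependent chain falls back to one instruction per cycle regardless of how wide the machine is. Real code lies between these two extremes, which is consistent with the modest average figures stated above.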
The VLIW technology comes next. In this technology, the data paths are configured in advance so as to allow parallel execution, and a compiler performs optimization to increase the degree of parallel execution and generates proper VLIW instruction code. This is an extremely rational idea, since it eliminates the circuitry that a super-scalar processor needs for checking whether individual instructions can be executed in parallel. The technology is therefore considered extremely promising as a means of realizing hardware for parallel execution. However, it too is incapable of executing a plurality of conditional branch instructions simultaneously, so its practical contribution to performance would be about 3.5 to 5 times. In addition, for a processor used in applications that require image processing or special data processing, VLIW is not an optimal solution either. This is because, particularly in applications requiring continuous or sequential processing that uses earlier operation results, there is a limit to how efficiently operations or data processing can be executed while the data is held in general-purpose registers, as it is in VLIW. The same problem exists in the conventional pipeline technology.
On the other hand, it is well known from past experience that various matrix calculations, vector calculations, and the like achieve higher performance when implemented in dedicated circuitry. Therefore, in the most advanced technology aimed at the highest performance, the mainstream approach is based on VLIW, with various dedicated arithmetic circuits mounted according to the purpose of the application.
However, VLIW is a technology for improving the efficiency of parallel execution near the program counter. It is therefore not very effective in, for example, executing two or more objects or two or more functions simultaneously. Moreover, mounting various dedicated arithmetic circuits increases the amount of hardware and reduces software flexibility. Furthermore, it is essentially difficult to eliminate the penalty that occurs when executing conditional branches.
It is therefore an object of the present invention to examine these problems from a standpoint different from that of the conventional technologies for increasing processor speed, and to provide a new solution. More specifically, it is an object of the present invention to provide a system, i.e., a control program product, capable of improving throughput as pipelining does while eliminating the penalty in executing conditional branches, a data processing system capable of executing the control program, and a control method for the data processing system. It is another object of the present invention to provide a control program product capable of executing individual data processing flexibly and at high speed, even when that processing is complicated, without having to use a wide variety of dedicated circuits specific to the respective data processing. It is a further object of the present invention to provide a data processing system capable of executing the program, and a control method for the data processing system.