The invention relates to a data processor having a plurality of registers and, more particularly, to a data processor which can execute a program at a high speed by enabling a number of registers to be used.
With recent advances in LSI technology, the arithmetic processing capability of data processors continues to increase. Such a data processor can be used, for example, as a microprocessor in an engineering workstation. A multiprocessor can also be constructed from a number of such data processors.
The arithmetic processing capability of a data processor can be improved relatively easily by raising its operating frequency or by performing parallel processing within the data processor. On the other hand, the memory access speed, namely the data transfer capability between the memory and the arithmetic operating unit in the data processor, cannot be improved significantly because of delay and because of the limited number of LSI pins available for the data transfer path. The data transfer capability therefore lags behind the arithmetic processing capability and becomes a bottleneck, so that the arithmetic processing capability of the data processor cannot be fully exploited.
One proposed method of solving the above problem is to use a cache memory. However, in an application field such as large-scale numerical computation, which handles a very large data area, a cache memory is of little use. A method of increasing the number of registers in the data processor has also been considered. With more registers, for example, intermediate results of arithmetic operations need to be saved to and restored from the memory less often because of a lack of registers, and the performance degradation associated with such saving and restoring can be prevented. However, since a register is designated by a register specifier field in an instruction word of the data processor, the number of registers which can be designated by the register specifier field is the architectural upper limit on the number of registers. To provide and use more registers than this upper limit, some extension of the architecture is needed. The following three kinds of techniques are known for this purpose.
The first kind of technique relates to a vector register, which is used in a processing system in which a vector processing unit is added to a processor of the conventional type. The vector register can store hundreds of data elements together. In such a processing system, the hundreds of data elements in the vector register can be processed by a single instruction, called a vector instruction, which is used only for vector processing. Only a vector instruction can access the vector register; a conventional instruction cannot. This kind of technique is used in, for instance, the supercomputer S-820 made by Hitachi Ltd. An improvement of the first kind of technique has also been proposed in Hironaka et al. of Kyushu University, "Benchmarking a Vector-Processor Prototype Based on Multithreaded Streaming/FIFO Vector (MSFV) Architecture", International Conference on SUPERCOMPUTING, 1992. In a processor of the MSFV system, efficiency is improved by making the length of the vector stored in the vector register variable. Further, when the vector length is set to 1, a vector instruction substantially becomes a scalar instruction (the conventional instruction mentioned above).
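The relationship between a vector instruction and a scalar instruction described above can be illustrated with a minimal sketch. The function names vector_add and scalar_add and the list-based register model are assumptions introduced here for illustration only; they do not represent the instruction set of the S-820 or of the MSFV prototype.

```python
def vector_add(va, vb, length):
    """Hypothetical vector instruction: add the first `length` elements
    of two vector registers, modeled here as Python lists."""
    if length < 1 or length > len(va) or length > len(vb):
        raise ValueError("vector length out of range")
    return [va[i] + vb[i] for i in range(length)]


def scalar_add(a, b):
    """Conventional (scalar) instruction operating on one data element."""
    return a + b


# One vector instruction processes every element held in the registers.
sums = vector_add([1, 2, 3], [10, 20, 30], 3)

# With the vector length set to 1, the result matches the scalar instruction.
single = vector_add([5], [7], 1)
```

In this sketch the vector length is an explicit operand, which corresponds to the variable vector length mentioned above; a fixed-length vector register would simply omit that operand.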
The second kind of technique is called a register window. More registers are provided in the data processor than can be designated by a register specifier field; those registers are divided into groups, each comprising the number of registers which can be designated by the register specifier field; and the groups are switched by an instruction. This kind of technique is disclosed in, for example, J.L. Hennessy and D.A. Patterson, "Computer Architecture: A Quantitative Approach", Morgan Kaufmann Publishers, Inc., 1990, pages 450-454.
The third kind of technique enlarges the register specifier field so that the number of registers which can be designated by an instruction is increased.
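The register window arrangement can be sketched as follows. This is a simplified model under stated assumptions: the class name WindowedRegisterFile, the window size of 32, the four windows and the switch_window method are all hypothetical, and the groups here do not overlap, whereas practical designs may share some registers between adjacent windows.

```python
class WindowedRegisterFile:
    """Simplified, hypothetical model of a register window."""

    def __init__(self, window_size=32, num_windows=4):
        self.window_size = window_size      # registers a specifier can designate
        self.num_windows = num_windows      # number of register groups
        self.physical = [0] * (window_size * num_windows)
        self.current = 0                    # currently selected group

    def _index(self, specifier):
        # A specifier can only name a register inside the current window.
        if not 0 <= specifier < self.window_size:
            raise ValueError("specifier out of range")
        return self.current * self.window_size + specifier

    def read(self, specifier):
        return self.physical[self._index(specifier)]

    def write(self, specifier, value):
        self.physical[self._index(specifier)] = value

    def switch_window(self, window):
        # Models the instruction that switches between groups.
        if not 0 <= window < self.num_windows:
            raise ValueError("window out of range")
        self.current = window


regs = WindowedRegisterFile()
regs.write(3, 7)        # register 3 of window 0
regs.switch_window(1)   # e.g. on a subroutine call
regs.write(3, 9)        # same specifier, different physical register
regs.switch_window(0)   # e.g. on return; the value 7 is still in place
```

Although the model holds 128 physical registers, only 32 of them are reachable at any one time, which is the property discussed later for this technique.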
In a processing system based on the first kind of technique, the vector processing unit and the instructions for it are added to a processor of the conventional type, so that a program written for the processor of the conventional type can still be executed. There is therefore no problem of program compatibility. To realize such a processing system, however, a very large amount of hardware is needed. Further, any portion of a program that cannot be vectorized is processed by the processor of the conventional type, so the vector register does not help to relieve the lack of registers in that portion.
In addition, when a single data element (scalar data) obtained as the result of an arithmetic operation executed by the processor of the conventional type is used in the vector processing unit, the data must be transferred from the register in the processor of the conventional type to the scalar register in the vector processing unit. Such a transfer is an overhead. It is considered, however, that this problem can be solved to a certain extent in a processor based on the MSFV system, which is an improvement of the first kind of technique.
According to the second kind of technique, it is possible to reduce the overhead of the memory accesses needed to save registers when a subroutine is called and to restore them on return from the subroutine. However, although the number of registers in the processor is increased, the number of registers which can be used while any one subroutine is being executed is unchanged, so that the memory accesses needed to temporarily write intermediate results of a calculation to the memory and read them back cannot be reduced. With this technique, therefore, the performance of a program in which subroutines are called frequently can be improved, but the performance cannot be improved for a program, such as a large-scale numerical application, in which subroutines are called infrequently and most of the execution time is spent in loops that repeat the same calculation for each element of a large array.
In particular, in a data processor having a calculation pipeline for raising the arithmetic processing speed, it is desirable to unroll a loop which repeats an array calculation (that is, to unfold the loop iterations in the source code) by a factor equal to the number (n) of stages of the calculation pipeline, in order to improve the utilization of the arithmetic operation unit. In this case, however, the number of registers needed to store the array elements is n times the number needed when the loop is not unrolled. In other words, the number of registers available to each iteration of the original loop is substantially 1/n of the total number of registers. According to the second kind of technique, as mentioned above, even though the number of registers in the data processor is increased n times, the number of registers which can be used at any given time is the same as when the technique is not used, so that the shortage of registers remains.
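The register demand created by loop unrolling can be made concrete with a short sketch. The figures used (32 registers, a 4-stage pipeline) and all function names are assumptions chosen for illustration; the temporaries t0 to t3 stand in for registers that must be live at the same time.

```python
def registers_needed(per_iteration, unroll_factor):
    """Registers needed when each of the unrolled iterations
    keeps its own copy of the per-iteration registers."""
    return per_iteration * unroll_factor


def registers_per_iteration(total_registers, unroll_factor):
    """Registers left for each original iteration after unrolling."""
    return total_registers // unroll_factor


def scale_rolled(a, k):
    """Original loop: one temporary is live at a time."""
    out = [0] * len(a)
    for i in range(len(a)):
        t = a[i] * k
        out[i] = t
    return out


def scale_unrolled4(a, k):
    """Same loop unrolled by 4: four temporaries are live together."""
    out = [0] * len(a)
    n = len(a) - len(a) % 4
    for i in range(0, n, 4):
        t0 = a[i] * k
        t1 = a[i + 1] * k
        t2 = a[i + 2] * k
        t3 = a[i + 3] * k
        out[i], out[i + 1], out[i + 2], out[i + 3] = t0, t1, t2, t3
    for i in range(n, len(a)):      # remaining elements
        out[i] = a[i] * k
    return out


# Assumed example: 32 designatable registers, 4 pipeline stages.
per_iter = registers_per_iteration(32, 4)
```

Under these assumed figures each original iteration is left with 8 registers, and a loop body that needs 6 registers per iteration needs 24 once unrolled by 4.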
In the third kind of technique, the instruction set must be changed significantly in order to enlarge the register specifier field in the instruction word. Consequently, a problem of program compatibility arises in that a program written for the processor of the conventional type cannot be executed.
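The link between the width of the register specifier field and program compatibility can be sketched as follows. The instruction layout is purely hypothetical: three specifier fields packed into the low-order bits of the word, with widths of 5 and 7 bits chosen only as examples. No real instruction set is being described.

```python
def max_registers(specifier_bits):
    """Upper limit on registers a specifier field of this width can designate."""
    return 1 << specifier_bits


def encode(regs, bits):
    """Pack register numbers into consecutive fields of `bits` bits each
    (hypothetical layout; the last register occupies the lowest bits)."""
    word = 0
    for r in regs:
        if not 0 <= r < (1 << bits):
            raise ValueError("register number does not fit the field")
        word = (word << bits) | r
    return word


def decode(word, count, bits):
    """Unpack `count` register numbers from fields of `bits` bits each."""
    mask = (1 << bits) - 1
    regs = []
    for _ in range(count):
        regs.append(word & mask)
        word >>= bits
    return list(reversed(regs))


old_word = encode([3, 17, 31], 5)      # written for 5-bit fields
same = decode(old_word, 3, 5)          # conventional decoder recovers the operands
different = decode(old_word, 3, 7)     # enlarged fields read other registers
```

Widening each field from 5 to 7 bits raises the limit from 32 to 128 registers, but the same instruction word then decodes to different operands, which is one way to see why existing programs cannot run unchanged.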