1. Field of the Invention
The present invention relates generally to a register content inheriting system in a multi-processor. More particularly, the invention relates to a multithread microprocessor executing a plurality of instructions simultaneously.
2. Description of the Related Art
As a technique for speeding up a program, systems have been proposed that divide the program into a plurality of threads and process those threads in parallel. Processors adapted to such thread-level parallel processing have been studied. Rather than relying on instruction-level parallelism, a thread-level parallel processing system improves processing speed by executing a plurality of threads simultaneously, thereby raising the use efficiency of the arithmetic units.
Depending on the problem to be solved, thread-level parallel processing can be classified into three cases: one in which there is no dependency at all between the threads; one in which dependency is low, so that little performance is lost even when the dependency is resolved by software; and one in which dependency is high, so that hardware assistance for thread-level parallel execution is required.
When there is no dependency between the threads, or when the dependency between threads is low and each thread is large, the gain from parallel processing can exceed the overhead of thread management by software. Hardware support can therefore be kept to a minimum.
However, for certain problems the dependency becomes high or the threads themselves become small, and some hardware support becomes necessary.
To speed up fine-grained threads, efficient thread generation and efficient data transfer between the threads are essential. One example of a multi-processor for parallel processing of fine-grained threads is disclosed in "Multiscalar Processors" (Gurinder S. Sohi, Scott E. Breach and T. N. Vijaykumar, The 22nd International Symposium on Computer Architecture, IEEE Computer Society Press, 1995, pp. 414-425).
In the Multiscalar Processor, a single program is divided into "tasks", each of which is an aggregate of basic blocks, and the tasks are processed by a processor which can execute them in parallel. Transfer of register contents between tasks is designated by a task descriptor generated by a task compiler.
The task descriptor explicitly designates the registers which the task may generate. This designation is referred to as a create mask. In addition, a forward bit is added to the instruction which performs the final update of a register designated by the create mask. Thus, the multiscalar processor performs parallel execution using code that depends upon the analysis ability of the compiler.
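The relationship between the create mask and the forward bit can be illustrated by the following minimal sketch. It assumes a task consisting of a single straight-line sequence of instructions, whereas an actual task is an aggregate of basic blocks with control flow, for which the compiler must be conservative. The function name annotate_task and the (opcode, destination) instruction format are hypothetical and are not taken from the cited paper.

```python
def annotate_task(instructions):
    """Compute a create mask and forward bits for a straight-line task.

    instructions: list of (opcode, dest) pairs, where dest is a register
    name or None when the instruction writes no register.
    (Hypothetical format, used only for this illustration.)
    """
    # Create mask: every register the task may write.
    create_mask = {dest for _, dest in instructions if dest is not None}

    # Remember the index of the last instruction writing each register.
    last_writer = {}
    for index, (_, dest) in enumerate(instructions):
        if dest is not None:
            last_writer[dest] = index

    # Forward bit: set only on the final update of a register, so the value
    # can be forwarded to later tasks as soon as it is produced.
    forward_bits = [
        dest is not None and last_writer[dest] == index
        for index, (_, dest) in enumerate(instructions)
    ]
    return create_mask, forward_bits
```

In this sketch, a register written twice within the task carries the forward bit only on the second write, since the first value may still be overwritten.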
One example of a construction of the multiscalar processor is shown in FIG. 24. In FIG. 24, the multiscalar processor is constructed with a sequencer 6, processing units 7-1 to 7-3, an associative network 8 and data banks 9-1 to 9-3.
Each of a plurality of the processing units 7-1 to 7-3 in the system is constructed with a cache 71, an execution unit 72 and a register file. On the other hand, corresponding to the processing units 7-1 to 7-3, a plurality of data banks 9-1 to 9-3 are provided. Each of the data banks 9-1 to 9-3 is constructed with an address resolution buffer (ARB) and data cache 91.
Simultaneous execution of a plurality of tasks is managed by the sequencer 6, which assigns tasks to the processing units 7-1 to 7-3. The content of each register of the register file is forwarded at the timing of data generation, as designated by the task descriptor.
On the other hand, in "Proposal for Directivity Control Parallel Architecture of On-chip Multiprocessor (MUSCAT)" (Torii, Kondo, Motomura, Konagaya, Nishi, JSPP 97, pp. 229 to 236, May 1997), there have been proposed a fork one time model, which limits the number of times one thread generates a thread by a fork instruction to at most one during the life period of the thread, and a thread execution model which performs lump inheriting of all registers of the register file upon thread generation.
An image of the fork one time model is shown in FIG. 23. In the fork one time model, each of the threads #1 to #3 generates a new thread only one time during its life period. Introduction of this model simplifies thread management.
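The restriction imposed by the fork one time model can be sketched as follows. This is an illustrative software model only, assuming that a child thread receives a copy of all of the parent's registers at the time of the fork. The class name Thread and its fields are hypothetical.

```python
class Thread:
    """Illustrative thread that may fork at most once during its life."""

    def __init__(self, thread_id, registers):
        self.thread_id = thread_id
        self.registers = list(registers)  # private copy of register contents
        self.has_forked = False

    def fork(self, child_id):
        # Fork one time model: a second fork by the same thread is refused.
        if self.has_forked:
            raise RuntimeError(
                f"thread #{self.thread_id} has already forked once"
            )
        self.has_forked = True
        # All registers are inherited together at thread generation.
        return Thread(child_id, self.registers)
```

Because each thread has at most one child, the threads form a simple chain (#1, #2, #3 and so on), which is the property that simplifies thread management.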
Furthermore, Japanese Unexamined Patent Publication No. 10-078880 discloses several methods for realizing register inheriting under the fork one time model. Most of these methods ultimately copy the register content, differing only in the timing of the copy. However, copying of the register content increases the amount of hardware and hinders speeding up.
Therefore, the above-identified Japanese Unexamined Patent Publication No. 10-078880 also proposes an example for an out-of-order issuing system, in which instructions are issued irrespective of the program order. In this example, the registers are separated into logical registers and physical registers, the physical registers are provided in common, and inheriting of the register content is realized by copying only the mapping information indicative of the relationship between the logical registers and the physical registers.
An example of the construction of a processor of this type is shown in FIG. 25. FIG. 25 shows a two thread parallel execution type processor which is constructed with thread execution units 121a and 121b, a physical register file 126 common to the thread execution units 121a and 121b, a register busy table 129, a register free table 130 and a thread management unit 131.
The thread execution units 121a and 121b are respectively constructed with instruction caches 122a and 122b, instruction decoders 123a and 123b, register mapping tables 124a and 124b, instruction queues 125a and 125b, arithmetic units 127a and 127b and effective instruction order buffers 128a and 128b.
In the shown processor, the register is separated into a logical register to be accessed from the software and a physical register holding a register content in hardware, and a mapping relationship is held in the register mapping tables 124a and 124b.
The detailed construction of the register mapping tables 124a and 124b is shown in FIG. 26. In FIG. 26, each of the register mapping tables 124a and 124b has a physical register number entry for each of the logical registers 0 to 31, and converts the logical register numbers into physical register numbers "45", "13", "04", "21", ..., "53".
Upon generation of a thread, register inheriting is realized by copying the mapping information between the register mapping tables 124a and 124b, without copying the register content.
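Inheriting by copying only the mapping information can be illustrated by the following minimal sketch. It assumes 32 logical registers and 64 physical registers, a free list standing in for the register free table, and allocation of a new physical register on every write. The names SharedRegisters and ThreadUnit are hypothetical, and release of physical registers that are no longer referenced is omitted for brevity.

```python
NUM_LOGICAL = 32   # logical registers visible to software
NUM_PHYSICAL = 64  # physical registers shared by all threads (assumed size)


class SharedRegisters:
    """Physical register file and free list common to all thread units."""

    def __init__(self):
        self.physical = [0] * NUM_PHYSICAL
        # Physical registers 0..31 back the initial mapping; the rest are free.
        self.free = list(range(NUM_LOGICAL, NUM_PHYSICAL))


class ThreadUnit:
    """One thread execution unit with its own register mapping table."""

    def __init__(self, shared, mapping):
        self.shared = shared
        self.mapping = list(mapping)  # logical number -> physical number

    def read(self, logical):
        return self.shared.physical[self.mapping[logical]]

    def write(self, logical, value):
        # Renaming: each write takes a fresh physical register, so values
        # still mapped by another thread are never overwritten.
        new_physical = self.shared.free.pop()
        self.shared.physical[new_physical] = value
        self.mapping[logical] = new_physical

    def fork(self):
        # Only the mapping table is copied; register contents stay in place.
        return ThreadUnit(self.shared, self.mapping)
```

In this sketch the cost of a fork is proportional to the number of mapping entries rather than to the amount of register data, while both thread units depend on the same free list, which corresponds to the shared table whose complexity is discussed for the out-of-order issuing type.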
In the foregoing conventional multithread microprocessor, the register inheriting system of the above-mentioned publication for the in-order issuing type makes it necessary to copy the content of the registers upon initiation of a thread and upon termination of a thread.
In the case of the out-of-order issuing type, on the other hand, copying of the register content becomes unnecessary. However, a register free table which indicates use or non-use of each register and which is common to the thread execution units becomes necessary, causing the problems of complicated logic and data paths and an increased amount of data. Moreover, register renaming is required for every instruction, which is too wasteful when applied to the in-order issuing type.