The invention relates to a processor having a superscalar architecture and to a processing method therefor. More particularly, the invention relates to a processor supporting a multi-context, in which a plurality of contexts are allowed to flow through a pipeline simultaneously and are executed, and to a processing method therefor.
First, the multi-context used in the invention will be described. A context is defined as "an execution unit on a computer to execute a certain group of works having a certain meaning". The term covers both processes having different address spaces, as in UNIX, and threads sharing the same address space, as in MACH. In other words, the definition does not depend on whether a plurality of contexts have independent memory spaces or share the same memory space; both cases are included. A multi-context, therefore, is a set of a plurality of contexts which can be executed simultaneously. The processor of the invention, which supports the multi-context, is intended to input a plurality of contexts to a pipeline efficiently and to execute them.
Hitherto, in a superscalar processor, code compiled from a single user program into a single instruction train is generally inputted to the pipeline, and upon execution the hardware extracts parallelism and executes the instructions in parallel. According to this method, however, in the case where there is a dependency relation between instructions, the case where a cache miss occurs, the case where a branch instruction is executed, or the like, the execution is interrupted and a vacancy occurs in the pipeline. Such a phenomenon is called a stall of the pipeline or the occurrence of a bubble in the pipeline. As a result, the performance which the processor inherently has cannot be fully used. To address the stall of the pipeline, there is a method whereby instruction trains of a plurality of processes are inputted to the pipeline at a predetermined fixed interval so that the stalls caused by the individual instruction trains offset one another. According to this method, however, since the switching interval of the instruction trains is constant, scheduling that exploits the nature of the program cannot be performed, and the effect of eliminating stalls is small. Moreover, to execute a plurality of contexts simultaneously, a set of registers holding the execution environment must be provided for each context, so that a large quantity of resources is consumed. Conversely, in the case where a fixed quantity of resources is divided among the contexts, there is a drawback that the resources which can be used by one context are reduced.
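The register trade-off described above can be illustrated with simple arithmetic. The following is a minimal sketch only; the function names and the figures used in the usage note are hypothetical and are not taken from the specification.

```python
def registers_required(num_contexts, regs_per_context):
    """Registers needed when every context gets its own full register set."""
    return num_contexts * regs_per_context


def registers_per_context(total_physical, num_contexts):
    """Registers each context can use when a fixed pool is divided evenly."""
    return total_physical // num_contexts
```

For instance, under the assumption of 4 contexts and 32 registers per context, `registers_required(4, 32)` gives 128 registers, whereas dividing a fixed pool of 32 registers among the same 4 contexts with `registers_per_context(32, 4)` leaves 8 registers for each context.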
An HEP will now be described as an example of the above method. As shown in FIG. 1, the HEP is a method whereby a plurality of instruction streams are mechanically inputted to a pipeline one instruction at a time, and its mechanism is simple. In FIG. 1, (n) programs are simultaneously inputted to one pipeline. Each of the programs (1 to n) corresponds to a context of the invention. Upon execution, the instructions are sequentially inputted to a pipeline 200 in the following order and are executed.
Instruction 1 of the program 1
Instruction 1 of the program 2
...
Instruction 1 of the program n
Instruction 2 of the program 1
a forming step of forming a plurality of contexts as execution units to execute a certain group of works of a certain meaning;
an instruction executing step of supplying an instruction of the context existing in a pipeline, executing such an instruction, and when a vacancy of the pipeline is judged, switching the context to another context which is being executed, and simultaneously executing a plurality of contexts;
an ID setting step of setting a peculiar context ID to each of the plurality of contexts which are being simultaneously executed; and
a register renaming step of renaming a name of the register which is used in the execution of the context to a register name in which a designation register name of an execution instruction was added to the context ID and allocating a physical register.
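The register renaming step can be pictured as a table lookup keyed by the context ID together with the register name designated by the instruction. The following is a minimal sketch under that reading, not the circuit of the invention; the class name `ContextRenamer`, its methods, and the free-list allocation policy are all assumptions introduced for illustration.

```python
class ContextRenamer:
    """Maps (context ID, designated register name) to a physical register."""

    def __init__(self, num_physical):
        self.free = list(range(num_physical))  # free physical register numbers
        self.table = {}                        # (context_id, reg_name) -> physical

    def rename(self, context_id, reg_name):
        # The renamed register name is the designated name qualified by the context ID.
        key = (context_id, reg_name)
        if key not in self.table:
            if not self.free:
                raise RuntimeError("no free physical register")
            self.table[key] = self.free.pop(0)  # allocate a physical register
        return self.table[key]

    def release(self, context_id):
        """Return every physical register held by the given context to the free list."""
        for key in [k for k in self.table if k[0] == context_id]:
            self.free.append(self.table.pop(key))
```

In this sketch, two contexts that both designate a register named `"r1"` receive different physical registers, because the context ID is part of the lookup key, while repeated references from the same context resolve to the same physical register. Physical registers are allocated only when a name is first used, so a context that uses few registers consumes few.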
The pipeline 200 is constructed of four stages. A fetching stage (F) fetches an instruction from a cache. A decoding stage (D) decodes the instruction. An executing stage (EX) executes the instruction. A storing stage (S) stores the execution result. Now, assume that the instruction 1 of the program 1 is a load instruction and that the instruction 2 of the program 1 uses the result loaded by the instruction 1. In the case where only the program 1 is executed, a stall of the pipeline occurs between the instructions 1 and 2, because a load instruction generally accesses a memory and the loaded result cannot be used in the immediately following cycle. In the case of the HEP of FIG. 1, the execution of the instruction 1 of each of the programs 2 to (n) is sandwiched between the execution of the instruction 1 of the program 1 and the execution of the instruction 2 of the program 1. Therefore, the stall of the pipeline 200 in the case where the instruction 1 of the program 1 is a load instruction is effectively eliminated, and the throughput of the execution of the programs can be raised. However, since the relation among the instruction streams of the plurality of programs 1 to (n) is not considered when the instructions are inputted to the pipeline 200, there is no guarantee that the occurrence of stalls of the pipeline 200 is always reduced. Further, work registers must be prepared for each of the contexts (programs) which operate simultaneously upon execution, so that the required number of registers increases. Conversely, when the number of registers which can be prepared by the hardware is fixed, the number of registers which one context can use decreases. In the example of the HEP of FIG. 1, as many register sets as there are programs, namely (n) sets, must be prepared at a time.
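The effect of the mechanical interleaving on the load-use stall can be checked with a small simulation. The following is a minimal sketch under a deliberately simplified model, not a description of the actual HEP hardware: the pipeline issues one instruction per cycle in order, the instruction 1 of every program is a load, the instruction 2 of every program consumes that load, and the hypothetical parameter `load_delay` is the number of bubble cycles needed when the consumer directly follows the load.

```python
def hep_issue_order(num_programs, instrs_per_program):
    """Round-robin order: instruction 1 of programs 1..n, then instruction 2, and so on."""
    return [(p, i)
            for i in range(1, instrs_per_program + 1)
            for p in range(1, num_programs + 1)]


def count_stalls(order, load_delay=1):
    """Count bubble cycles for an in-order, single-issue pipeline (assumed toy model)."""
    cycle = 0
    stalls = 0
    load_issued_at = {}  # program -> cycle in which its load (instruction 1) issued
    for program, instr in order:
        if instr == 1:
            load_issued_at[program] = cycle
        elif instr == 2:
            # Earliest cycle in which the loaded value can be consumed.
            ready = load_issued_at[program] + 1 + load_delay
            if cycle < ready:
                stalls += ready - cycle  # bubbles inserted while waiting
                cycle = ready
        cycle += 1
    return stalls
```

With one program, `count_stalls(hep_issue_order(1, 2))` reports one bubble, and with three programs, `count_stalls(hep_issue_order(3, 2))` reports none, since the loads of the other programs fill the gap. If the assumed delay is longer than the interleaving can cover, for example `count_stalls(hep_issue_order(2, 2), load_delay=3)`, bubbles remain, which illustrates why a fixed issue order gives no guarantee that stalls are reduced.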
Further, with such a simple method of inputting the instruction streams, it is highly likely that the hit ratios of the instruction cache and the data cache deteriorate. As another method of addressing the stall of the pipeline, there is also a method whereby program units which can be executed in parallel, called threads, are extracted from a single program, simultaneously inputted to the pipeline, and executed. In many cases, however, such a method can be applied only to simple cases, such as one in which a loop of a numerical calculation program written in FORTRAN or the like is set as one thread. It is very difficult for a compiler to extract threads from a general program.