1. Field of the Invention
The present invention relates to a processor structure having a plurality of main processors and sub processors, and to a method for sharing the sub processors among the main processors.
2. Description of the Related Art
In a conventional processor structure, systems for supporting a sub processor are mainly divided into those using an external processor and those using an internal processing unit. Processing with an external processor keeps the processor design simple, but it has two disadvantages. Sub processor instructions must be executed in external chip elements, which degrades processing characteristics, and the main processor cannot operate in parallel with the sub processor, which degrades the main processor's performance in executing a program. A system realized with an internal processing unit offers excellent processing performance, but because the sub processor operates as part of a processing unit inside one main processor, its functions cannot be shared when a plurality of main processors exist.
With advances in semiconductor technology, the number of circuits or transistors that can be accommodated in a processor element increases, making it possible to integrate a plurality of main processors into one processor and also to integrate several complex functions into a processor. Processors in wide use at present have a small built-in cache memory and one or two sub processors in addition to several main processor processing units.
When there are several such main processor processing units, their performance can be maximized by operating the processing units simultaneously. The extent to which this is achieved is expressed as the degree of instruction-level parallelism. To raise the degree of instruction-level parallelism, methods have been suggested for removing processing dependencies between a plurality of processing units or for resolving those dependencies effectively. Sub processors such as floating-point arithmetic units, graphics processors, and so on, however, differ from the main processor in processing time, data format, instruction format, and so on. The sub processors are therefore treated differently from the main processor processing units, and instruction-level parallelism involving the sub processor becomes an important factor in the processing performance of the many application programs that include sub processor instructions.
When a plurality of main processors are integrated into one processor, it is very wasteful for every main processor to have its own sub processor, since a sub processor is used less frequently than a main processor. This is a particular problem when the sub processor provides several functions, such as graphics processing and communication processing, in addition to general floating-point arithmetic.
That is, in such a case, the various functions of the sub processor cannot be accessed effectively when a plurality of main processors exist.
Much research and development has been carried out to overcome this problem. One method connects one or more sub processors to a main processor via an external bus interface. This method reduces the special signaling burden for the sub processor. However, the processor cannot perform other work while processing a sub processor instruction, and instruction-level parallelism cannot be maximized because performance deteriorates when the data needed by the sub processor is transferred in the same manner as an instruction.
There is also a method for handling and recovering from an exceptional situation that occurs while processing a sub processor instruction, using a minimum of information. In this method, the information saved for returning after the exceptional case has been serviced is limited to position information, which is transferred to a queue position in which a program counter is stored. However, this method cannot restore register values during recovery. In addition, information can be saved and restored only for as many instructions as there are pipeline stages. The method can therefore cope only with an exceptional situation that occurs before processing of the current instruction is completed, and recovery is impossible once the next instruction has started processing. Consequently, the sub processor must be synchronized to the main processor's pipeline, or the main processor must be stalled while the sub processor is processing.
Another method uses partial decoding to shorten instruction fetch time. This method reduces the instruction fetch delay in instruction flows that contain branches and can improve sub processor performance by fetching instructions in advance to meet the branch. However, apart from allowing sub processor instructions to be fetched in advance, it has no effect in the usual situation, and it does not include a direct data path for the sub processor.
To execute external input/output operations or memory access operations independently, a method adopting a decoupling system for the external interface has been suggested. This method is intended for accessing an external device in a multiprocessor environment, but it does not provide a way to improve efficiency when a plurality of processors share a sub processor at the individual instruction level. Also, because the conventional concept of decoupling between internal execution units does not include a multiprocessor structure or the concept of a shared sub processor interface, it suggests neither a measure for handling an exceptional situation in the sub processor nor a method for scheduling sub processor instructions. In addition, the queue used in this method is an ordinary first-in first-out (FIFO) queue, and it does not provide a structure capable of checking and scheduling a plurality of entries at a time.
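The difference between a plain FIFO and a queue that checks several entries at a time can be illustrated with a minimal Python sketch. The names `SubInstr`, `fifo_issue`, `window_issue` and the `window` parameter are hypothetical and do not describe the structure of the cited method or of the present invention; each entry is assumed to record which main processor issued it and whether its operands are ready.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class SubInstr:
    # Hypothetical queue entry: issuing main processor, operation,
    # and whether its operands are available.
    cpu_id: int
    opcode: str
    ready: bool


def fifo_issue(queue: Deque[SubInstr]) -> Optional[SubInstr]:
    """Plain FIFO: only the head may issue, so a stalled head blocks all."""
    if queue and queue[0].ready:
        return queue.popleft()
    return None


def window_issue(queue: Deque[SubInstr], window: int = 4) -> Optional[SubInstr]:
    """Check up to `window` entries at once and issue the oldest ready one."""
    for i, entry in enumerate(queue):
        if i >= window:
            break
        if entry.ready:
            del queue[i]  # safe: we return immediately after removing it
            return entry
    return None


# Usage: CPU 0's divide is waiting on operands, CPU 1's add is ready.
q_fifo = deque([SubInstr(0, "fdiv", False), SubInstr(1, "fadd", True)])
q_win = deque(q_fifo)
print(fifo_issue(q_fifo))          # None: the head entry blocks the queue
print(window_issue(q_win).cpu_id)  # 1: the ready entry behind it issues
```

With a plain FIFO, a sub processor instruction from one main processor whose operands are not ready blocks ready instructions from other main processors queued behind it; checking several entries at a time lets the ready entry issue first.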
A method has also been presented that uses a sub processor as a floating-point arithmetic unit. This method uses a technique that can improve instruction-level parallelism between a main processor and a sub processor. However, it neither includes the concept of decoupling by means of a queue nor teaches the concept of sharing a sub processor among a plurality of main processors by using a queue of the sub processor. When storing data in an internal register, the method also tries to maintain data consistency by checking inter-dependencies between instructions that are being executed or are ready to be executed. However, it does not include a method for allocating a plurality of register files, one per main processor, to the sub processor. Further, the method does not teach a technique by which data for the sub processor can bypass a small internal cache; it can only reduce pipeline stalls and improve performance by using a cache.
Another method connects a main processor and a sub processor via an external bus. In this method, the main processor must wait while a sub processor instruction is executed, and it must transfer the data needed by the sub processor over the bus in the same manner as an instruction, which degrades execution performance. Moreover, this method does not include the concept of instruction-level parallelism. Nor does it include any special measure for handling an exceptional situation occurring in the sub processor; it merely reports the situation over the bus connected to the main processor.
As a multiprocessor structure, there is a method for embedding one or more main processors in one processor and connecting a plurality of such processors. However, this method also does not teach a technique capable of effectively sharing the sub processor.
As a method applying the concept of decoupling, there is a method that places an instruction queue between the instruction fetch unit and the execution units of the main processor and sub processor. However, this method has no sharing function because it does not support a plurality of main processors, so it cannot simultaneously process several instructions requested by a plurality of main processors. To handle an exceptional situation, this method uses a temporary register file. However, because the temporary register file cannot be infinitely large, the range of execution time within which the processor can handle and recover from an exceptional case is severely limited, even though the sub processor processing units are decoupled. If this limit is exceeded, either the processor status cannot be restored under exceptional circumstances, or other functions of the processor must be stalled until space becomes available in the temporary register file.
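The capacity limit of a temporary register file can be sketched in a few lines of Python. This minimal model assumes a history-buffer scheme, in which the old value of each register written by a not-yet-completed sub processor instruction is saved so that it can be restored on an exception; the cited method may organize its temporary register file differently, and the class `TempRegisterFile` and its methods are hypothetical.

```python
from typing import Dict, List, Tuple


class TempRegisterFile:
    """Hypothetical bounded history buffer: old register values are kept
    until the instruction that overwrote them completes without exception."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.saved: List[Tuple[str, float]] = []  # (register, old value), oldest first

    def try_write(self, regs: Dict[str, float], reg: str, value: float) -> bool:
        """Perform a write only if its old value can be saved; False means stall."""
        if len(self.saved) >= self.capacity:
            return False
        self.saved.append((reg, regs[reg]))
        regs[reg] = value
        return True

    def commit_oldest(self) -> None:
        """The oldest outstanding instruction completed; free its entry."""
        self.saved.pop(0)

    def rollback(self, regs: Dict[str, float]) -> None:
        """Exception: undo all outstanding writes, newest first."""
        while self.saved:
            reg, old = self.saved.pop()
            regs[reg] = old


# Usage with room for only two outstanding writes.
regs = {"f0": 0.0, "f1": 0.0}
trf = TempRegisterFile(capacity=2)
trf.try_write(regs, "f0", 1.0)   # True
trf.try_write(regs, "f1", 2.0)   # True
trf.try_write(regs, "f0", 3.0)   # False: buffer full, the write must stall
trf.rollback(regs)               # regs restored to {"f0": 0.0, "f1": 0.0}
```

Once every entry is occupied, a further write must wait (`try_write` returns `False`) until `commit_oldest` frees an entry, which corresponds to the stall described for this method; a larger capacity only postpones that point.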