This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2000-356237, filed Nov. 22, 2000, the entire contents of which are incorporated herein by reference.
1. Field of the Invention
The present invention relates to a multiprocessor system and a control method thereof. More specifically, the present invention relates to a multiprocessor system which transfers data and programs between a global memory and a local memory of each processor element by DMA transfer.
2. Description of the Related Art
Conventionally, multiprocessor systems have been well known as a means of speeding up computers. Multiprocessor systems include tightly-coupled multiprocessor systems, which employ a shared memory scheme in which the processors share a memory, and loosely-coupled multiprocessor systems, in which memories are distributed among the processors.
In the shared memory system, the processors can communicate with one another through the shared memory, so that programming is simple. However, a special mechanism for shared memory access is needed so that the respective processors can refer to the shared memory while the consistency of the data is maintained, and the hardware therefore becomes complicated.
In the loosely-coupled multiprocessor system, communication between the processors does not use a shared memory, so that the hardware can be simplified. However, a function for communication between the processors must be provided by the program which controls each processor, and it is therefore difficult to write the program.
In order to control the loosely-coupled multiprocessor system easily, a method is known in which a master processor for controlling the respective processors is provided and this master processor transmits commands to the other processor elements. Controlling, on the master processor, the order and timing of the commands transmitted to the respective processor elements enables the operation of the entire multiprocessor system to be controlled easily.
In the loosely-coupled multiprocessor system, each processor element has a local memory. Even in the loosely-coupled multiprocessor system, when there is a memory (a global memory) that can be used in common by the processor elements, programming is easier and it is also possible to reduce the size of each local memory.
However, access to the global memory takes a long time compared with access to the local memory, because of bus arbitration and other factors. If a processor resource is occupied for a long time in order to access the memory, throughput decreases.
In order to improve on this, a mechanism has recently been proposed in which data and programs are transferred between the global memory and the local memory of each processor element by DMA transfer.
In this case, a procedure to control each processor element and a DMA controller is described in a program executed on the master processor. By writing this program as a multithread program, it is possible to use a plurality of processor elements effectively.
However, the processing time of the individual processor elements and the time required for a DMA transfer are not known in advance. Therefore, even when each processor element and the DMA controller are controlled by a multithread program executed on the master processor, it is in practice difficult to assign the processing operation corresponding to each thread to the corresponding processor element efficiently. In order to decrease the time during which a processor element is idle, the following two problems have to be solved.
A first problem is as follows. Since there are dependencies between the DMA transfers and the processing of the processor elements, the master processor is used to control these dependencies. However, the master processor does not operate efficiently if an interrupt to the master processor and a thread switch are carried out each time a DMA transfer or the processing of a processor element terminates. In particular, as the number of processor elements controlled by the master processor increases, interrupts to the master processor and thread switches are performed more frequently, so that the processing efficiency decreases.
A second problem is that, when a certain processor element is made to perform operations associated with two or more threads executed on the master processor, data which has been DMA-transferred to the local memory under the control of one thread may be used in the processing associated with another thread.
For example, consider a case where a thread A and a thread B are executed in parallel on the master processor and the processor element processes the data on the local memory under the control of these threads. In this case, depending on the relation between the timing of switching between the thread A and the thread B and the time required for the DMA transfer and for the processing of the processor element, the switching from the thread A to the thread B may be executed before the data for the thread A, which has been DMA-transferred from the global memory to the local memory, is actually processed by the processing operation of the processor element associated with the thread A. The data for the thread A may then be used by the processing operation of the processor element associated with the thread B. As a result, the data to be processed becomes inconsistent.
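By way of illustration only, the hazard described above can be reproduced with a minimal sketch in Python. The sketch assumes a single local memory region that holds one block at a time and a fixed, hand-written order of events; the names run_schedule, mismatches, INTENDED and HAZARD are hypothetical and do not appear in the specification.

```python
# Illustrative sketch only: a single local-memory slot shared by two threads.
# All names here are hypothetical and are not taken from the specification.

def run_schedule(schedule):
    """Replay (thread, action) events against one local-memory slot.

    "dma" overwrites the slot with that thread's data; "process" records
    which data the processor element actually reads for that thread.
    """
    local_memory = None
    processed = []
    for thread, action in schedule:
        if action == "dma":
            local_memory = "data-" + thread
        elif action == "process":
            processed.append((thread, local_memory))
    return processed


def mismatches(processed):
    """Return the processing steps that consumed another thread's data."""
    return [(t, d) for t, d in processed if d != "data-" + t]


# Intended order: each thread's data is processed before the next transfer.
INTENDED = [("A", "dma"), ("A", "process"), ("B", "dma"), ("B", "process")]

# Hazard: the switch to thread B occurs after A's transfer but before A's
# processing, so B's processing reads the data transferred for A.
HAZARD = [("A", "dma"), ("B", "process"), ("A", "process")]

if __name__ == "__main__":
    print(mismatches(run_schedule(INTENDED)))
    print(mismatches(run_schedule(HAZARD)))
```

In this sketch the intended order yields no mismatch, whereas the hazard order makes the processing associated with the thread B consume the block transferred for the thread A.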
The present invention has been made in consideration of the above problems, and an object of the invention is to provide a multiprocessor system, and a control method thereof, capable of decreasing the time during which a processor element is idle so as to improve the throughput, without increasing the load on a master processor and without causing inconsistency in the data to be processed.
According to one aspect of the present invention, there is provided a multiprocessor system comprising: a master processor that issues commands; a plurality of processor-elements, each of which has a local memory and a first command pooling buffer, the first command pooling buffer pooling the commands issued from the master processor, wherein the processor-elements are controlled by the commands in the buffer; a global memory which is common to the master processor and the processor-elements; a transfer device having a second command pooling buffer, the transfer device being controlled by some of the commands issued from the master processor, to transfer a program/data between the local memory of the processor-elements and the global memory, wherein the commands are pooled in the second command pooling buffer; and a counter device to notify the master processor that the number of responses to the commands issued from the master processor, returned from the processor-elements and transfer device, has reached a predetermined number of responses, the predetermined number being pre-stored in the counter device.
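As an illustration only, the cooperation between the command pooling buffers and the counter device can be sketched in Python as follows. This is a minimal sequential model under stated assumptions, not the claimed hardware: it assumes that each unit executes its pooled commands in issue order and returns exactly one response per command, and the class names ResponseCounter and CommandUnit, the function simulate, the command strings, and the threshold of four are all hypothetical.

```python
# Illustrative sketch only: a sequential software model of command pooling
# buffers and a response counter. All names and values are hypothetical.
from collections import deque


class ResponseCounter:
    """Counts responses and notifies once a pre-stored number is reached."""

    def __init__(self, expected, notify):
        self.expected = expected  # predetermined number of responses
        self.count = 0
        self.notify = notify      # stands in for notifying the master

    def on_response(self):
        self.count += 1
        if self.count == self.expected:
            self.notify()


class CommandUnit:
    """A processor element or the transfer device with a pooling buffer."""

    def __init__(self, name, counter):
        self.name = name
        self.buffer = deque()     # command pooling buffer
        self.counter = counter

    def issue(self, command):
        """Called on behalf of the master: pool a command, do not run it yet."""
        self.buffer.append(command)

    def run_all(self, log):
        """Assumption: pooled commands run in issue order, one response each."""
        while self.buffer:
            command = self.buffer.popleft()
            log.append((self.name, command))
            self.counter.on_response()


def simulate():
    notifications = []
    counter = ResponseCounter(4, lambda: notifications.append("done"))
    dma = CommandUnit("dma", counter)
    pe0 = CommandUnit("pe0", counter)
    log = []
    # The master issues all four commands up front, with no waiting between.
    dma.issue("load block 0")
    dma.issue("load block 1")
    pe0.issue("process block 0")
    pe0.issue("process block 1")
    dma.run_all(log)              # two responses: below the threshold
    pe0.run_all(log)              # fourth response triggers the notification
    return log, notifications


if __name__ == "__main__":
    print(simulate())
```

In this model the master is notified once, after the fourth response, rather than once per completed command.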