This invention relates to a device for data transfer in a data processor, and particularly relates to a multiprocessor system using a plurality of data transfer devices.
Recently, multiprocessor systems composed of a plurality of processors have attracted attention as a means of enhancing the processing capability of computers. A multiprocessor system is a computer in which a plurality of processor elements, each having a processor and a data transfer device, are connected to one another by an interconnection network. Various multiprocessor systems have been disclosed, for example in H. Kadota et al., "VLSI Parallel Computer with Data Transfer Network: ADENA," Proc. of 1989 International Conference on Parallel Processing, August 1989, pp. I-319-22.
In the multiprocessor system, high-speed processing is achieved by distributing and arranging sequential data managed by one processor to the plural processor elements, which then execute the calculation in parallel. A host processor manages the initial data for sequential processing and calculation, and manages the input/output data. When the system executes a calculation that can be performed in parallel, the host processor performs data distribution and arrangement to the plural processor elements. When the host processor executes a sequential calculation using the results of the parallel calculation, the data must be collected from the plural processor elements.
For example, this corresponds to a case where calculation of array data is executed in the order of calculation formulae (1), (2) and (3) as follows:

$$b(i, j, k) = a(i, j, k) + 2.5 \tag{1}$$

$$\mathit{sum} = \mathit{sum} + b(i, j, k) \times c(i, j, k) \tag{2}$$

$$d(i, j, k) = d(i, j, k) \times \mathit{sum} \tag{3}$$
where, in each calculation formula, $1 \le i \le \mathit{imax}$, $1 \le j \le \mathit{jmax}$, and $1 \le k \le \mathit{kmax}$.
Normally, the array data a, b, c and d used in the calculation formulae (1)-(3) are managed in a memory of the host processor. The calculation formulae (1) and (3) can each be executed in parallel. To execute the calculation formula (1), the host processor distributes and arranges the data to the plural processor elements and makes each processor element execute its part of the calculation in parallel. The calculation formula (2) is a sequential calculation that uses the result of the parallel calculation, so the data must be collected to the host processor or to one of the processor elements. Therefore, the data in the processor elements are collected to whichever processor executes the sequential calculation. The calculation formula (3), which can again be performed in parallel, is executed by distributing and arranging the data to each processor element, as in the case of the calculation formula (1).
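The alternation of distribution, parallel calculation and collection described for formulae (1)-(3) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the method of any cited system: the helper names `distribute`, `collect` and `run_formulae`, the round-robin assignment of elements, the flat lists standing in for the three-dimensional arrays, and the reading that each formula is evaluated over the whole index range before the next one starts are all hypothetical.

```python
def distribute(values, n_pe):
    """Host side: deal a flat list out to n_pe processor elements.
    Round-robin assignment is an assumption made for this sketch."""
    return [values[p::n_pe] for p in range(n_pe)]


def collect(chunks):
    """Host side: gather the per-element chunks back into the original order."""
    n_pe = len(chunks)
    out = [None] * sum(len(chunk) for chunk in chunks)
    for p, chunk in enumerate(chunks):
        out[p::n_pe] = chunk
    return out


def run_formulae(a, c, d, n_pe=4):
    # Formula (1): each processor element works on its own chunk of a.
    b_parts = [[x + 2.5 for x in part] for part in distribute(a, n_pe)]

    # Formula (2): sequential accumulation, so b is collected first.
    b = collect(b_parts)
    total = 0.0
    for b_value, c_value in zip(b, c):
        total += b_value * c_value

    # Formula (3): parallel again, so d is distributed and then collected.
    d_parts = [[x * total for x in part] for part in distribute(d, n_pe)]
    return b, total, collect(d_parts)
```

In this sketch the data cross between the host and the processor elements on each side of formula (2), which is the traffic that the distribution and collection hardware discussed in this section has to carry.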
According to one method of assigning array data for calculation (I. Okabayashi et al., "Network structure and VLSI implementation for a parallel computer: ADENA," Technical Report of IEICE, ICD89-152, 1989), each processor element is allotted two eigen-recognition numbers, which correspond to the subscripts in two of the three directions of three-dimensional array data, and the array data are assigned to the processor elements accordingly.
Referring to FIGS. 13-15, the operation of distributing, arranging and collecting data in a conventional multiprocessor system is described below.
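The assignment by two recognition numbers can be pictured with the following minimal sketch. The function names `owner` and `elements_of`, the 1-based subscripts, and the default choice of the i and j directions are assumptions made for illustration only; the cited report may choose the directions differently.

```python
from itertools import product


def owner(index, directions=(0, 1)):
    """Return the two recognition numbers of the processor element that
    holds the array element with subscripts index = (i, j, k).
    `directions` selects which two of the three subscripts are used."""
    return (index[directions[0]], index[directions[1]])


def elements_of(pe_id, shape, directions=(0, 1)):
    """List the (i, j, k) subscripts held by the processor element whose
    two recognition numbers are pe_id, for an array of the given shape."""
    ranges = [range(1, n + 1) for n in shape]
    return [idx for idx in product(*ranges) if owner(idx, directions) == pe_id]
```

With the default directions, the processor element with recognition numbers (1, 2) holds every element a(1, 2, k), that is, one full line of the array along the remaining direction.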
FIG. 13 shows the construction of a conventional multiprocessor system. In the figure, reference numeral 900 indicates a host processor and 910 a processor element. 920-1 to 920-4 are processor element groups, each composed of a set number of the processor elements 910. 930-1 to 930-4 are sub-processors, each for selecting one of the processor element groups 920-1 to 920-4 according to an instruction from the host processor 900. 940 is an exchange control circuit for connecting an internal switch of one of the sub-processors 930-1 to 930-4 to one of the processor element groups 920-1 to 920-4 according to the instruction from the host processor 900. 50 is a broadcast bus, and 51-1 to 51-4 are sub-broadcast buses of the respective processor element groups 920-1 to 920-4.
Such a construction is disclosed as prior art in, for example, Japanese Laid-Open (unexamined) Patent Application No. 61-139868 and L. Borrmann and M. Herdieckerhoff, "Parallel Processing Performance in a Linda System," Proc. of 1989 International Conference on Parallel Processing, August 1989, pp. I-151-53.
In the construction shown in FIG. 13, the processor elements 910 in the processor element groups 920-1 to 920-4 are connected to the respective sub-processors 930-1 to 930-4 via the respective sub-broadcast buses 51-1 to 51-4. The sub-processors 930-1 to 930-4 are connected to the host processor 900 via the broadcast bus 50. When data is transferred between the host processor 900 and a specific processor element 910, the host processor 900 designates one of the sub-processors 930-1 to 930-4 and thereby directs the exchange control circuit 940 to connect the broadcast bus 50 to the corresponding one of the sub-broadcast buses 51-1 to 51-4. The designated sub-processor then specifies the specific processor element 910 with which the data transfer is performed.
FIG. 14 shows the format of a data packet used when data are distributed, arranged and collected by packet transfer. In the figure, reference numeral 60 indicates a synchronization flag and 61 a target address. The target address 61 is composed of a target processor element group address 62 and a target processor element address 63. 64 indicates data.
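The packet layout of FIG. 14 can be sketched as below. The field widths (one byte each for the synchronization flag 60, the group address 62 and the processor element address 63), the flag value `0x7E`, and the function names are assumptions made for illustration; the source does not specify them.

```python
import struct

SYNC_FLAG = 0x7E  # hypothetical value for the synchronization flag 60

# Header: flag 60, group address 62, processor element address 63.
# One unsigned byte per field is an assumption, not taken from FIG. 14.
_HEADER = struct.Struct(">BBB")


def build_packet(group_addr, pe_addr, data):
    """Prepend the flag and the target address 61 to the data 64."""
    return _HEADER.pack(SYNC_FLAG, group_addr, pe_addr) + data


def parse_packet(packet):
    """Split a packet into (group address, element address, data)."""
    flag, group_addr, pe_addr = _HEADER.unpack_from(packet)
    if flag != SYNC_FLAG:
        raise ValueError("missing synchronization flag")
    return group_addr, pe_addr, packet[_HEADER.size:]
```

Under these assumed widths every transfer carries three header bytes regardless of how little data follows.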
FIG. 15 shows the construction of a conventional multiprocessor system that transfers data using the data packet of FIG. 14. In FIG. 15, reference numeral 900 indicates the host processor, 951 a memory of the host processor 900, 952 a data transfer device of the host processor 900, and 70 an internal bus. 953 is data transmission control means for controlling data transmission. 954 is packet generation/addition means for generating a packet when the data transfer device 952 transmits data. 955 is data receiving control means for controlling data reception. 956 is packet recognition means for decomposing a received packet and recognizing commands. 957 is data classification means for classifying data by sequentially fetching the target address in each packet.
920-1 to 920-4 are the processor element groups, each composed of a plurality of the processor elements 910. In each processor element 910, 961 is a memory, 962 is a data transfer device and 71 is an internal bus. 963 is data transmission control means for controlling data transmission of the processor element 910. 964 is packet generation/addition means for generating a packet for data transmission. 965 is data receiving control means for controlling data reception. 966 is packet recognition means for decomposing a received packet and recognizing commands.
Reference numeral 50 indicates the broadcast bus, and 51-1 is the sub-broadcast bus in the processor element group 920-1. 930-1 to 930-4 are the sub-processors interposed between the host processor 900 and the respective processor element groups 920-1 to 920-4. 940 is the same exchange control circuit as in FIG. 13.
In FIG. 15, when the host processor 900 distributes and arranges data to the processor elements 910 of the plural processor element groups 920-1 to 920-4, each processor element 910 holds beforehand an eigen-recognition number PID of the processor element and an eigen-recognition number GID of its processor element group. As shown in FIG. 14, the packet generation/addition means 954 in the data transfer device 952 of the host processor 900 generates the corresponding recognition numbers (the target processor element group address 62 and the target processor element address 63) as the target address 61 of the packet and adds them to the data 64, and the data transmission control means 953 then transmits the packet.
In each processor element 910, the data receiving control means 965 receives the data packet that the host processor 900 transmits to all processor elements at once, and the packet recognition means 966 judges whether the received data packet is addressed to that processor element. When the judgement is TRUE, the data receiving control means 965 reads the data from the broadcast bus 50 through the sub-broadcast bus 51-1 and writes it into the memory 961. When the judgement is FALSE, the data is not accepted.
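The receive-side judgement can be sketched as follows. The class `ProcessorElement`, the function `broadcast` and the representation of a packet as a plain tuple are hypothetical names and simplifications chosen for this illustration.

```python
class ProcessorElement:
    """Minimal model of a processor element 910 holding its GID and PID."""

    def __init__(self, gid, pid):
        self.gid = gid      # recognition number of the processor element group
        self.pid = pid      # recognition number of the processor element
        self.memory = []    # stands in for the memory 961

    def on_broadcast(self, packet):
        """Judge whether the packet is addressed here; accept or discard it."""
        group_addr, pe_addr, data = packet
        if (group_addr, pe_addr) == (self.gid, self.pid):
            self.memory.append(data)   # judgement TRUE: write into memory
            return True
        return False                   # judgement FALSE: data not accepted


def broadcast(elements, packet):
    """Deliver one packet to every element; return how many accepted it."""
    return sum(pe.on_broadcast(packet) for pe in elements)
```

In this model every processor element inspects every packet, although at most one of them keeps the data.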
Conversely, when the processor elements 910 transmit data to the host processor 900, the plural processor elements 910 cannot generate and transmit data packets at the same time, because a data race would occur. Therefore, the data receiving control means 955 of the host processor 900 specifies one of the sub-processors 930-1 to 930-4 (e.g. 930-1) via the exchange control circuit 940 so that, for the processor element group 920-1, the broadcast bus 50 is connected to the sub-broadcast bus 51-1. In a processor element 910 under this bus connection, the packet generation/addition means 964 generates a data packet and the data transmission control means 963 transmits it. In this case, the packet generated by the packet generation/addition means 964 is given a data storage sequence.
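The collection procedure can be sketched as follows. The sketch assumes that the "data storage sequence" carried by each packet is an index into the host memory, and that groups are connected in ascending order of their number; neither detail is stated in the source, and the function name `collect_to_host` is hypothetical.

```python
def collect_to_host(groups, memory_size):
    """Collect data from processor element groups into the host memory.

    groups maps a group number to a list of processor elements, each given
    as a list of (storage_sequence, data) packets it wants to send.
    """
    memory = [None] * memory_size
    for group_number in sorted(groups):         # connect one group at a time
        for pe_packets in groups[group_number]: # elements transmit in turn
            for storage_sequence, data in pe_packets:
                # The storage sequence decides where the data is stored.
                memory[storage_sequence] = data
    return memory
```

Only one group is connected to the broadcast bus at any moment in this model, which reflects why collection is serialized through the host.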
Such a method is disclosed in the above-mentioned Japanese Patent Application No. 61-139868 as another prior art technique, which aims at efficient data distribution, arrangement and collection in a multiprocessor system by using data packets.
In general, as described above, there are two chief methods for executing data distribution, arrangement and collection in the conventional multiprocessor system: (1) a method in which respective broadcast buses are provided; and (2) a packet control method.
Under the above construction, however, as is clear from FIG. 13, a control port and a switching mechanism are required for each processor, which complicates the system. Further, the management of bus switching is concentrated on the single host processor 900, so that the signal lines for switch control increase in number and in length in proportion to the number of processors. This concentrates the control tasks at the processor that controls those signal lines. Alternatively, if each processor is provided with multiple ports, the number of ports increases, which complicates both port control and the construction of the processors.
Moreover, in the packet method of FIG. 15, lengthy packet data must be transferred at every data transfer. In other words, extra data must be sent out onto the broadcast bus 50. Especially when the data length is short, the packet overhead, i.e. the overhead of packet reception, address matching, packet discard and the like, increases unnecessarily, and the data transfer efficiency is lowered. This situation is not improved even when a processor element group unrelated to the data transfer is electrically disconnected, and after reconnection all processors are again required to execute control recognition by packet.
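The effect of packet overhead on short data can be illustrated with a simple ratio. The sketch below counts only the bytes on the bus; the three-byte header is an assumed figure, not a value from the source, and the function name `transfer_efficiency` is hypothetical.

```python
def transfer_efficiency(data_len, header_len=3):
    """Fraction of the bytes on the bus that are payload data.

    header_len covers the flag and target address; the default of 3 bytes
    is an assumption made for this illustration.
    """
    return data_len / (header_len + data_len)
```

Under this assumption a one-byte transfer uses the bus at 25% efficiency, while a 61-byte transfer reaches about 95%. The ratio does not include the time every processor element spends receiving, matching and discarding packets, so the real penalty for short data is larger.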
In data distribution and arrangement, where the host processor 900 transmits data to the processor elements 910, the judgement can be performed with the data packet alone. In data collection from the processor elements 910 to the host processor 900, however, the data concentrate on the host processor 900, which is therefore required to specify the processor element 910 by a prescribed method and to classify the data according to the received data packets. The data could instead be written into the memory 951 without using the data classification means 957. In either case, hardware for selecting among the processor element groups 920-1 to 920-4 is required. Further, the hardware of the data transfer devices 952 and 962 of the host processor 900 and the processor elements 910 must be increased, and hardware for path selection on the broadcast bus 50 is also required, with the result that the hardware becomes complicated.