As the scale of data processing has increased in recent years, demand for high-performance data processing systems has grown. Achieving high performance naturally requires improving the performance of the CPU, but improving the performance of data transfer between a CPU and a main memory, especially between a CPU and a memory located in different data processing modules, also contributes greatly to the performance of the entire system. There is therefore a strong demand for a low-cost means of improving data transfer performance.
FIG. 23 is a block diagram showing a data processing apparatus based on the conventional technology. In FIG. 23, the reference numerals 651 and 652 each indicate a common bus, and the reference numerals 610, 620, 630, and 640 each indicate a processing module (hereafter simply "module"). The modules 610, 620, 630, and 640 are connected in parallel to, and share, the common buses 651 and 652 for processing and transferring data.
The module 610 has an I/O connector 611 connected to the common bus 651 and an I/O connector 612 connected to the common bus 652, through which the module is connected to a back plane. The module 610 further comprises a common bus control circuit 613 for controlling the common buses 651 and 652, a CPU 614 for controlling, for instance, data processing in the module itself, and a shared memory 615 that can be accessed not only by the CPU in the module but also by the CPUs of the other modules.
The module 620 has an I/O connector 621 connected to the common bus 651 and an I/O connector 622 connected to the common bus 652, through which the module is connected to a back plane. The module 620 further comprises a common bus control circuit 623 for controlling the common buses 651 and 652, a CPU 624 for controlling, for instance, data processing in the module itself, and a shared memory 625 that can be accessed not only by the CPU in the module but also by the CPUs of the other modules.
The module 630 has an I/O connector 631 connected to the common bus 651 and an I/O connector 632 connected to the common bus 652, through which the module is connected to a back plane. The module 630 further comprises a common bus control circuit 633 for controlling the common buses 651 and 652, a CPU 634 for controlling, for instance, data processing in the module itself, and a shared memory 635 that can be accessed not only by the CPU in the module but also by the CPUs of the other modules.
The module 640 has an I/O connector 641 connected to the common bus 651 and an I/O connector 642 connected to the common bus 652, through which the module is connected to a back plane. The module 640 further comprises a common bus control circuit 643 for controlling the common buses 651 and 652, a CPU 644 for controlling, for instance, data processing in the module itself, and a shared memory 645 that can be accessed not only by the CPU in the module but also by the CPUs of the other modules.
Each of the modules 610 to 640 accesses the shared memory in the module itself or in any other module using the common bus 651 or 652. The right to access a common bus is controlled by a common bus arbitrating circuit, and only one module can occupy a given common bus at any one time.
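The exclusive occupation of a common bus described above can be illustrated with a minimal sketch. The class name `BusArbiter`, its methods, and the simple grant-if-free policy are assumptions made for illustration only; the conventional apparatus is described only as having an arbitrating circuit, and its actual arbitration policy is not specified.

```python
# Illustrative sketch only: BusArbiter and its grant-if-free policy are
# hypothetical and are not taken from the apparatus of FIG. 23.
class BusArbiter:
    """Grants one common bus to at most one module at a time."""

    def __init__(self, bus_id):
        self.bus_id = bus_id
        self.owner = None  # module currently occupying the bus, if any

    def request(self, module_id):
        """Grant the bus if it is free; return True when the grant succeeds."""
        if self.owner is None:
            self.owner = module_id
            return True
        return False

    def release(self, module_id):
        """Free the bus; only the module that owns it may release it."""
        if self.owner != module_id:
            raise ValueError("module does not own the bus")
        self.owner = None


arbiter_651 = BusArbiter(651)
granted_610 = arbiter_651.request(610)  # bus is free, so module 610 gets it
granted_620 = arbiter_651.request(620)  # refused while module 610 holds the bus
```

In this sketch a refused module would simply retry later; a real arbitrating circuit would typically queue or prioritize requests.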
Next, the operation is described. FIG. 24 and FIG. 25 are time charts showing the timing of data transfer in the data processing apparatus shown in FIG. 23. The operation timing is described here for the case of estimating the time required for each of the CPUs 614, 624, 634, and 644 to access the shared memories of all of the other modules among the modules 610, 620, 630, and 640, with reference to FIG. 23 and FIG. 24.
In FIG. 24 and FIG. 25, P0 to P3 indicate the CPUs 614, 624, 634, and 644, numbered serially with the prefix P, while M0 to M3 indicate the shared memories 615, 625, 635, and 645, numbered serially with the prefix M. τ indicates one cycle (e.g., 12 ns), and DC indicates a dummy cycle.
In FIG. 24 and FIG. 25, to execute all of the accesses (1) P0 to M1, (2) P1 to M2, (3) P2 to M3, (4) P3 to M0, (5) P0 to M2, (6) P1 to M3, (7) P2 to M0, (8) P3 to M1, (9) P0 to M3, (10) P1 to M0, (11) P2 to M1, and (12) P3 to M2, the accesses (1), (3), (5), (7), (9), and (11) are executed on the common bus 651, and the accesses (2), (4), (6), (8), (10), and (12) are executed on the common bus 652.
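The access list above follows a regular pattern: in each round, CPU Pi accesses memory M((i + k) mod 4) for k = 1, 2, 3. The following minimal sketch reproduces that numbering. The function names are hypothetical, and the split of odd-numbered accesses onto the common bus 651 and even-numbered accesses onto the common bus 652 is an assumption that is consistent with six transfers per bus.

```python
# Illustrative sketch only: function names are hypothetical, and the
# odd/even bus split is an assumption consistent with six transfers per bus.
NUM_MODULES = 4


def enumerate_accesses(num_modules=NUM_MODULES):
    """Return (number, cpu, memory) for every access by a CPU to another module's memory."""
    accesses = []
    number = 1
    for offset in range(1, num_modules):       # rounds k = 1, 2, 3
        for cpu in range(num_modules):         # P0 to P3
            memory = (cpu + offset) % num_modules
            accesses.append((number, cpu, memory))
            number += 1
    return accesses


def assign_bus(number):
    """Assumed split: odd-numbered accesses on bus 651, even-numbered on bus 652."""
    return 651 if number % 2 == 1 else 652


schedule = {651: [], 652: []}
for number, cpu, memory in enumerate_accesses():
    schedule[assign_bus(number)].append((number, f"P{cpu}", f"M{memory}"))
```

Here `schedule[651]` begins with `(1, "P0", "M1")` and `schedule[652]` with `(2, "P1", "M2")`, matching accesses (1) and (2) in the text.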
The total number of cycles for all of these accesses can be estimated as follows. Assuming that the common buses 651 and 652 are each 16 bytes wide and that 64 bytes of data are transferred per memory access, each access occupies one of the common buses 651 and 652 for 4τ (4 cycles), during which 16 bytes × 4 = 64 bytes of data are transferred.
It should be noted that each of the common buses 651 and 652 is a bidirectional common bus, so an empty cycle of 1τ must be inserted between transfer operations to prevent bus fight.
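The occupation of a single common bus, with 4τ of data transfer per access and a 1τ empty cycle, can be laid out cycle by cycle as in the following minimal sketch. The function name `bus_timeline`, the placement of the dummy cycle after each transfer, and the list of six accesses assigned to the common bus 651 are assumptions made for illustration; the actual layout is the one shown in FIG. 24 and FIG. 25.

```python
# Illustrative sketch only: bus_timeline is a hypothetical helper, and placing
# the dummy cycle (DC) after each transfer is an assumption.
DATA_CYCLES = 4    # 64 bytes transferred over a 16-byte bus
DUMMY_CYCLES = 1   # empty cycle inserted to prevent bus fight


def bus_timeline(access_numbers, data_cycles=DATA_CYCLES, dummy_cycles=DUMMY_CYCLES):
    """Return one label per cycle for a single common bus."""
    timeline = []
    for number in access_numbers:
        timeline.extend([f"({number})"] * data_cycles)  # data transfer cycles
        timeline.extend(["DC"] * dummy_cycles)          # empty cycle
    return timeline


# Assumed assignment of six accesses to the common bus 651.
timeline_651 = bus_timeline([1, 3, 5, 7, 9, 11])
print(len(timeline_651), timeline_651.count("DC"))  # prints "30 6"
```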
Under these assumptions, the number of cycles required to execute the accesses (1) to (12) is, as shown in the time charts of FIG. 24 and FIG. 25, 30τ, from τ1 to τ30. On each of the common buses 651 and 652, 6 dummy cycles are inserted during these 30τ. Accordingly, assuming that 1τ is equal to 12 ns, the data transfer capacity of the system as a whole is computed as follows:
64 bytes × 12 transfers / (12 ns × 30) ≈ 2.1 GB/s
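The estimate above can be checked with a short calculation. The following is a minimal sketch using the figures stated in the text (16-byte buses, 64-byte transfers, 12 ns per cycle, 12 accesses); the constant and function names are hypothetical, and an even split of six accesses per bus is assumed.

```python
# Illustrative sketch only: names are hypothetical; the figures come from the
# text, and six accesses per bus are assumed.
BUS_WIDTH_BYTES = 16
TRANSFER_BYTES = 64
CYCLE_NS = 12
DUMMY_CYCLES = 1
NUM_ACCESSES = 12
NUM_BUSES = 2


def total_cycles():
    """Cycles needed when the accesses are split evenly over the two buses."""
    data_cycles = TRANSFER_BYTES // BUS_WIDTH_BYTES          # 4 cycles per access
    accesses_per_bus = NUM_ACCESSES // NUM_BUSES             # 6 accesses per bus
    return accesses_per_bus * (data_cycles + DUMMY_CYCLES)   # 6 * 5 = 30 cycles


def throughput_gb_per_s():
    """Total bytes moved divided by elapsed time; bytes per ns equals GB/s."""
    total_bytes = TRANSFER_BYTES * NUM_ACCESSES              # 768 bytes
    total_ns = total_cycles() * CYCLE_NS                     # 360 ns
    return total_bytes / total_ns


print(total_cycles(), round(throughput_gb_per_s(), 2))  # prints "30 2.13"
```

Of the 30 cycles on each bus, 6 are dummy cycles, so one fifth of the available bus time carries no data under these assumptions.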
In the data processing apparatus described above, however, the common buses 651 and 652 have a parallel bus structure, so each bus is connected to all of the modules, and the total bus line length includes the branch length in each module. For this reason, the total line length is long, and the time required for signal propagation increases. In addition, because the common buses 651 and 652 are bidirectional common buses, the empty cycle of 1τ inserted between transfers to prevent bus fight when the transfer direction is switched cannot be omitted.
For these reasons, the minimum number of cycles required for data transfer cannot be reduced further, and it is also extremely difficult to further shorten the time required for one cycle. As a result, although the frequency of memory access and the volume of transferred data increase as the performance of the CPU increases, and the performance of the CPU itself has been improved, the performance of the system as a whole has not been improved.