1. Field
The present disclosure relates to the field of computers and shared storage architectures, and particularly to a close-coupling shared storage architecture of a double-wing expandable multiprocessor.
2. Description of the Related Art
A close-coupling shared storage architecture can be implemented in various ways. The common architectures include the Symmetrical Multi-Processor (SMP) architecture and the Non-Uniform Memory Access (NUMA) architecture. If the memories are physically centralized and every processor accesses them with the same delay, the architecture is referred to as SMP. Currently, most 2- to 4-way Intel Xeon and Itanium systems are SMPs implemented by sharing the system bus. However, because the bus driving capability and the memory bandwidth are limited, it is difficult to enlarge the scale of an SMP system, which is usually limited to a 2- to 8-way architecture. In a computer system with the NUMA architecture, the memories are physically distributed: the delay with which a processor accesses a local memory is small, and the delay with which it accesses a remote memory is large. The NUMA architecture usually uses a two-stage interconnection. In the first stage, 2 to 4 CPUs are connected through a shared bus or by direct point-to-point links, and a processor sub-system is constructed from these CPUs and node controllers. In the second stage, a larger system is constructed by applying a customized or general interconnection network among the processor sub-systems.
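As a rough illustration of the distinction above, the following Python sketch models the average memory access delay as a weighted mix of local and remote accesses. The function name, the parameter names, and the delay figures (in arbitrary time units) are hypothetical and chosen only for illustration. SMP corresponds to the case in which the local and remote delays are equal.

```python
def average_access_delay(local_delay, remote_delay, local_fraction):
    """Average delay when `local_fraction` of accesses go to local memory.

    All values are illustrative assumptions, not measurements.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_delay + (1.0 - local_fraction) * remote_delay


# SMP-like case: every access sees the same delay, so the mix does not matter.
smp_delay = average_access_delay(100, 100, 0.5)

# NUMA-like case: remote accesses are slower, so locality changes the average.
numa_mostly_local = average_access_delay(100, 300, 0.75)
numa_mostly_remote = average_access_delay(100, 300, 0.25)
```

Under these assumed figures, the NUMA average moves between the local and remote delays as the share of local accesses changes, whereas the SMP average stays fixed.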
For constructing a large-scale multiprocessor system, the NUMA architecture is usually required. In the design of such an architecture, the number of ports of the cross switch route chip NR, which is the core of the interconnection network, is limited by the available technique and process (currently, the maximum number of ports is 16). In addition, since the node controller NC is physically close to the processors and far from the interconnection network (the cross switch route chips NR), the actual bandwidths on the two sides are mismatched. That is, the actual bandwidth of a single link at the processor side is higher than the actual bandwidth of a single link at the interconnection network side.
On this premise, if the number of processors in the system is to be doubled, the following two methods can be used. In the first method, each node controller NC is connected to the processors via n links and to the cross switch route chips NR via another n links. On the basis of such connections, a close-coupling shared storage architecture with double the number of processors is constructed. Since the actual bandwidth of a single link at the processor side is higher than that at the interconnection network side, the processor bandwidth and the network bandwidth on the two sides of each node controller NC are mismatched. That is, this method obtains a low network communication delay at the price of a mismatch between the processor bandwidth and the network bandwidth.
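The mismatch described above can be made concrete with a minimal Python sketch. The function names and the per-link bandwidth figures are hypothetical; the only property taken from the text is that a processor-side link carries more than a network-side link.

```python
def nc_bandwidths(proc_links, net_links, proc_link_bw, net_link_bw):
    """Return (processor-side, network-side) aggregate bandwidth of one NC."""
    return proc_links * proc_link_bw, net_links * net_link_bw


def mismatch_ratio(proc_links, net_links, proc_link_bw, net_link_bw):
    """Processor-side bandwidth divided by network-side bandwidth (1.0 = matched)."""
    proc_bw, net_bw = nc_bandwidths(proc_links, net_links, proc_link_bw, net_link_bw)
    return proc_bw / net_bw


# Assumed figures: a processor-side link carries twice what a network-side link does.
PROC_LINK_BW = 10.0
NET_LINK_BW = 5.0

# n = 4 links on each side of the NC: the processor side outruns the network side.
equal_links_ratio = mismatch_ratio(4, 4, PROC_LINK_BW, NET_LINK_BW)
```

With equal link counts on both sides, the ratio simply equals the ratio of the per-link bandwidths, so under these assumed figures the processor side offers twice the bandwidth that the network side can carry.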
In the second method, each node controller NC is connected to the processors via m links and to the cross switch route chips NR via another n links. In order to substantially maintain the match between the processor bandwidth and the network bandwidth, it is required that m<n. However, the number of node controllers NC then increases significantly. Since the number of ports of a cross switch route chip NR is limited (currently, the maximum number of ports is 16), an interconnection network providing a larger number of ports must be constructed by cascading cross switch route chips NR, which increases the number of network interconnection hops. That is, this method obtains a relative balance between the processor bandwidth and the network bandwidth at the price of a higher network communication delay.
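The effect of choosing m<n on the number of node controllers can be sketched in Python as follows. The sketch rests on two assumptions that are not stated in the text: each processor-side link attaches exactly one processor, and each NC occupies one port on a given NR chip. The function names and the 64-processor example are hypothetical; only the 16-port limit comes from the text.

```python
import math

MAX_NR_PORTS = 16  # port limit of one NR chip, as stated in the text


def node_controllers_needed(processors, proc_links_per_nc):
    """NC count, assuming each processor-side link attaches exactly one processor."""
    return math.ceil(processors / proc_links_per_nc)


def needs_cascading(processors, proc_links_per_nc, max_ports=MAX_NR_PORTS):
    """True if one NR chip cannot offer a port to every NC (assumed: one port per NC)."""
    return node_controllers_needed(processors, proc_links_per_nc) > max_ports


# Hypothetical 64-processor system with n = 4 network-side links per NC.
ncs_with_4_proc_links = node_controllers_needed(64, 4)  # m = n = 4
ncs_with_2_proc_links = node_controllers_needed(64, 2)  # m = 2 < n
```

Under these assumptions, halving m doubles the number of node controllers from 16 to 32, which exceeds the 16-port limit of a single NR chip and forces cascading, with the additional hops that the text describes.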
When the scale of the processors is enlarged, constructing the multiprocessor close-coupling shared storage architecture by the above methods always causes either a mismatch between the processor bandwidth and the network bandwidth or an increase in the average delay of the network. It seems that bandwidth matching and a small number of interconnection hops cannot be achieved simultaneously. Whether another method can both maintain the match between the processor bandwidth and the network bandwidth and decrease the average delay of the interconnection network when the scale of the processors is enlarged is a problem that one skilled in the art wants to solve.