1. Field of the Invention
The present invention relates to crosspoint switching networks and, in particular, to high-speed crosspoint switching systems for interconnecting high-performance processors.
2. Description of the Prior Art
Multiprocessing has been recognized as a means of achieving computing speeds beyond what can be obtained through improvements in technology alone. One possible implementation of such a multiprocessor system is shown in FIG. 1. The system consists of N processors 10, each having its own memory. Some of the processors 10 are application processors, which execute application programs for users, while others perform special functions such as input/output and system management. For high availability, the system is usually designed to tolerate the failure of one or more processors. The processors 10 are high-performance computers operating at 50 to 500 MIPS. The processors 10 in the multiprocessor system communicate with each other by sending messages via link adapters 12 over high-speed fiber links 14 through a common switching system 16. The number n of links 14 between each processor 10 and the switch 16 depends on the desired communication bandwidth. Spare links, not shown in FIG. 1, can be provided for high availability.
The switch 16 provides each processor 10 with the capability of sending messages to any other processor in the system by setting up connections dynamically. Such connections are established and terminated under certain protocols. Two of the best-known protocols are circuit-switching and message-switching. In circuit-switching, the sender first sends to the switch 16 a control message containing the address of the intended destination. The switch 16 then sets up a communication path between the two processors 10 and informs the sender. The sender then transmits the message, and the connection is broken upon an acknowledgement from the destination that the data was received correctly. Under the message-switching protocol, the communication path is not established before the data is sent. The message, which contains the address of the destination, is sent by the sender to the switch 16. Upon receipt of the message, the switch 16 tries to set up a path to the destination and send the message. If it succeeds, the message is sent to the destination and the connection is broken immediately after the end of transmission of the message. If the destination receives the message properly, it sends an acknowledgement to the sender through the switch 16 as a separate message. Since no communication path exists yet when the message is received by the switch 16, buffers are provided in the switch 16 to store the message while a communication path to the destination is being set up.
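The ordering of events under the two protocols can be sketched as a toy Python model. This is a minimal illustration only, assuming a single sender, a single destination and a path that is always available; the class `ToySwitch`, its methods `circuit_send` and `message_send`, and the event names are hypothetical and do not describe the internals of the switch 16.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Message:
    src: int
    dest: int
    payload: bytes


@dataclass
class ToySwitch:
    """Hypothetical switch model; names and structure are illustrative only."""
    log: list = field(default_factory=list)          # ordered protocol events
    buffer: deque = field(default_factory=deque)     # used only by message-switching
    connected: set = field(default_factory=set)      # active (src, dest) paths

    def circuit_send(self, msg: Message) -> None:
        # Circuit-switching: the path is set up before any data is sent.
        self.log.append(("control", msg.src, msg.dest))   # control message with address
        self.connected.add((msg.src, msg.dest))            # switch sets up the path
        self.log.append(("path_ready", msg.src))           # switch informs the sender
        self.log.append(("data", msg.src, msg.dest, len(msg.payload)))
        self.log.append(("ack", msg.dest, msg.src))        # destination acknowledges
        self.connected.discard((msg.src, msg.dest))        # connection broken on the ack
        self.log.append(("teardown", msg.src, msg.dest))

    def message_send(self, msg: Message) -> None:
        # Message-switching: the message arrives first and is buffered in the switch.
        self.buffer.append(msg)
        self.log.append(("buffered", msg.src, msg.dest))
        queued = self.buffer.popleft()                     # path set up, buffer drained
        self.connected.add((queued.src, queued.dest))
        self.log.append(("data", queued.src, queued.dest, len(queued.payload)))
        self.connected.discard((queued.src, queued.dest))  # broken right after transmission
        self.log.append(("teardown", queued.src, queued.dest))
        # The acknowledgement returns through the switch as a separate message.
        self.log.append(("ack_message", queued.dest, queued.src))
```

For example, `circuit_send` records the events control, path_ready, data, ack and teardown, whereas `message_send` records buffered, data, teardown and ack_message. Only the second path touches the switch buffer, and its acknowledgement arrives after the connection has already been released.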
Each of the above protocols is suited to a specific environment. Circuit-switching is favorable for long messages because the overhead of initially setting up the path becomes insignificant compared with the actual time for transmitting the message across the links to the destination. Handling such messages by message-switching would require a large amount of buffering in the switch 16 and is therefore expensive. For short messages, however, circuit-switching performs poorly because the set-up overhead becomes significant in comparison with the time for transmitting the message. This overhead includes the time for the control information to propagate from the sender to the switch 16 and for a reply to return, i.e., one round-trip propagation delay in the fiber links 14. The fiber links 14 in a large data processing complex can be hundreds of meters long. At approximately 5 nanoseconds per meter, the round-trip delay can therefore amount to a few microseconds; a 300-meter link, for example, gives 2 × 300 m × 5 ns/m = 3 microseconds. Message-switching eliminates this overhead. At the same time, the cost of buffering a short message in the switch is not prohibitive.
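The round-trip arithmetic above can be checked with a short Python sketch. It uses the 5 ns/m propagation figure from the text and assumes, for illustration only, a 1 gigabit/second link with coding overhead ignored; the function names are hypothetical.

```python
NS_PER_METER = 5.0  # approximate propagation delay in fiber, as quoted in the text


def round_trip_ns(link_length_m: float) -> float:
    """Control message to the switch plus the reply back: two traversals of the link."""
    return 2 * link_length_m * NS_PER_METER


def transfer_ns(message_bytes: int, link_bps: float = 1e9) -> float:
    """Time to clock the message onto the link (assumed 1 Gbit/s, coding overhead ignored)."""
    return message_bytes * 8 * 1e9 / link_bps


def circuit_setup_fraction(message_bytes: int, link_length_m: float,
                           link_bps: float = 1e9) -> float:
    """Share of the total circuit-switched time that is spent on path set-up."""
    setup = round_trip_ns(link_length_m)
    return setup / (setup + transfer_ns(message_bytes, link_bps))
```

With a 300-meter link, `circuit_setup_fraction(256, 300)` is about 0.59: the 3-microsecond set-up exceeds the 2.048-microsecond transfer time of a 256-byte message. For a 4096-byte page the fraction drops to about 0.08.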
In the multiprocessor system shown in FIG. 1, the communication between processors 10 consists of both short and long messages. Short messages are used for such purposes as synchronization of processors 10 cooperating on a common task. The length of these messages is usually not more than 256 bytes. Long messages are associated with the movement of pages of data between processors 10 or between a processor 10 and a shared storage device. The size of a page can be 4 kilobytes or more. These two types of messages place different demands on the switch 16. Long messages require high bandwidth in the switch 16 to achieve fast transfer of data. The time overhead to set up the switch 16 under a circuit-switching protocol is less significant because it is small in comparison with the transfer time. Short messages require less bandwidth but are more sensitive to the set-up time. Therefore, to support both types of communication efficiently, the switch 16 should provide high bandwidth for long messages and low set-up time for short messages.
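One way a switch could act on this distinction is a length-based choice between the two protocols. The Python sketch below is hypothetical: the function `choose_protocol`, its `switch_buffer_bytes` parameter, and the use of 256 bytes as the cut-off are illustrative assumptions drawn from the message sizes quoted above, not a mechanism stated in this description.

```python
SHORT_MESSAGE_LIMIT = 256  # bytes; short messages are usually no longer than this
PAGE_SIZE = 4096           # bytes; a page "can be 4 kilobytes or more"


def choose_protocol(length_bytes: int, switch_buffer_bytes: int = SHORT_MESSAGE_LIMIT) -> str:
    """Hypothetical policy: message-switch short messages, circuit-switch long ones."""
    if length_bytes <= 0:
        raise ValueError("message length must be positive")
    # Short messages: avoid the round-trip set-up delay; they fit in the switch buffer.
    if length_bytes <= min(SHORT_MESSAGE_LIMIT, switch_buffer_bytes):
        return "message-switching"
    # Long messages: set-up time is amortized, and no large switch buffer is needed.
    return "circuit-switching"
```

For example, a 64-byte synchronization message would be message-switched, while a 4096-byte page transfer would be circuit-switched. The sketch ignores link length and available bandwidth, both of which a real design would also weigh.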
It is known to implement the switching system as multiple switching planes, each plane handling one fiber link per processor. Such a system is described in U.S. Pat. No. 4,695,999 issued to G. Lebizay. In this system, each switching plane is organized as an independent crosspoint switching system with its associated control circuitry for set-up. Each of the n links from a processor connects to a distinct switching plane. Variable bandwidth is achieved by using as many links as required during a specific transmission.
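The idea of obtaining variable bandwidth by spreading one transmission over several links, one per switching plane, can be sketched in Python as round-robin striping. The chunk size, the round-robin order, and the names `links_needed`, `stripe` and `unstripe` are assumptions for illustration; the cited patent is not claimed to use this exact scheme.

```python
import math
from typing import List

# Hypothetical helpers; chunk size and round-robin order are illustrative assumptions.


def links_needed(required_bps: float, link_bps: float, available_links: int) -> int:
    """Number of links (one per switching plane) to use for a given bandwidth demand."""
    wanted = math.ceil(required_bps / link_bps)
    return max(1, min(wanted, available_links))


def stripe(message: bytes, k: int, chunk: int = 4) -> List[List[bytes]]:
    """Deal fixed-size chunks of the message round-robin onto k links."""
    planes: List[List[bytes]] = [[] for _ in range(k)]
    for n, start in enumerate(range(0, len(message), chunk)):
        planes[n % k].append(message[start:start + chunk])
    return planes


def unstripe(planes: List[List[bytes]]) -> bytes:
    """Reassemble the message by reading the links in the same round-robin order."""
    out: List[bytes] = []
    pos = [0] * len(planes)
    p = 0
    while pos[p] < len(planes[p]):
        out.append(planes[p][pos[p]])
        pos[p] += 1
        p = (p + 1) % len(planes)
    return b"".join(out)
```

Because chunks are dealt in strict rotation, the receiver can stop at the first link that has no further chunk; a request for more bandwidth than the available links can supply is simply capped at the number of links.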
A switching system requires certain control information with each message to route the message to the proper destination. This includes the address of the destination to which the message is to be sent and the type of connection to be set up. When multiple switching planes are employed in the switching system, there are two ways of conveying this control information to the switch. The first is to treat each of the switching planes independently and send control information on every link of the sender, preceding the data on that link. Each of the switching planes receives the control information and configures itself independently of the rest of the planes. This is the approach followed in U.S. Pat. No. 4,695,999. Alternatively, one of the switching planes can be designated as the control plane and used exclusively for sending the control information. The switching planes are then no longer independent but are controlled simultaneously. Only the control plane receives the control information, and it then sets up all of the switching planes. Data can be sent through all of the links once the set-up is complete.
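The two ways of conveying control information can be contrasted with a small Python model that counts how many planes must decode it. The classes `ConnType`, `ControlInfo` and `Plane` and the functions `independent_setup` and `control_plane_setup` are hypothetical names; the model assumes each plane is reduced to a routing table.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class ConnType(Enum):
    """Hypothetical connection types carried in the control information."""
    CIRCUIT = "circuit"
    MESSAGE = "message"


@dataclass(frozen=True)
class ControlInfo:
    dest: int            # destination address
    conn_type: ConnType  # type of connection to be set up


class Plane:
    """Toy switching plane: records routes and how often it decoded control info."""

    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.routes: Dict[int, int] = {}  # input port -> output port
        self.controls_decoded = 0

    def configure(self, src: int, info: ControlInfo) -> None:
        self.routes[src] = info.dest


def independent_setup(planes: List[Plane], src: int, info: ControlInfo) -> None:
    # Scheme 1: every link carries its own copy of the control information,
    # so every plane must decode it and configure itself.
    for plane in planes:
        plane.controls_decoded += 1
        plane.configure(src, info)


def control_plane_setup(planes: List[Plane], src: int, info: ControlInfo,
                        control_index: int = 0) -> None:
    # Scheme 2: only the designated control plane decodes the control
    # information; it then sets up all planes at once.
    planes[control_index].controls_decoded += 1
    for plane in planes:
        plane.configure(src, info)
```

Both schemes end with the same routes in every plane, but with n planes the first scheme decodes the control information n times while the second decodes it once.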
The approach of using independent switching planes, as described in U.S. Pat. No. 4,695,999, has some major drawbacks when applied to high-speed switching, typically at 1 gigabit/second and beyond. Hardware is required in each switching plane for processing the control information needed to make a connection. The incoming data arriving at a switching plane from the link is usually in coded form. One such code is the 8/10 code described in U.S. Pat. No. 4,665,517 issued to A. Widmer. This type of coding provides a number of advantages, such as error detection, DC balance, and allowance for special control characters. This data must be decoded before control information can be extracted from it. Decoding involves generating a clock signal from the incoming data as well as converting the serial bitstream into parallel data words. The hardware to provide these functions at gigabit speeds is very costly. Additionally, buffers must be provided on each incoming link to hold the data while a connection request is waiting to be processed. Such buffering is very expensive to provide at high speeds. Finally, each plane requires an independent controller, which must be operated in synchronism with the other controllers to achieve the same set of connections in each plane.
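The serial-to-parallel step and the error-detection property mentioned above can be illustrated with a simplified Python sketch. It is not an 8/10 decoder: it assumes an already aligned bitstream and checks only the running-disparity rule on whole 10-bit symbols, which is a necessary but not sufficient condition for a valid code stream. The function names are hypothetical.

```python
from typing import List, Optional


def deserialize(bits: str, width: int = 10) -> List[str]:
    """Serial-to-parallel conversion into 10-bit symbols.

    Assumes the stream is already symbol-aligned; a real receiver would first
    recover the clock and align on a comma character.
    """
    if len(bits) % width:
        raise ValueError("bitstream length is not a multiple of the symbol width")
    return [bits[i:i + width] for i in range(0, len(bits), width)]


def disparity(symbol: str) -> int:
    """Number of ones minus number of zeros in one symbol."""
    return symbol.count("1") - symbol.count("0")


def first_disparity_error(symbols: List[str], rd: int = -1) -> Optional[int]:
    """Return the index of the first symbol violating the running-disparity rule, or None.

    Simplified rule: a symbol may have disparity 0 (running disparity unchanged),
    +2 only when the running disparity is negative, or -2 only when it is
    positive; anything else is flagged as an error.
    """
    for i, sym in enumerate(symbols):
        d = disparity(sym)
        if d == 0:
            continue
        if d == 2 and rd == -1:
            rd = 1
        elif d == -2 and rd == 1:
            rd = -1
        else:
            return i
    return None
```

A convenient test pattern is the K28.5 special character in its two running-disparity forms, 0011111010 and 1100000101: alternating them passes the check, while sending the same form twice in a row is flagged at the second symbol. Even this reduced check has to run on every link, at line rate, in each independent switching plane.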