An interconnection network, such as one used to switch data cells or data packets in a high-performance computing (HPC) system, comprises compute nodes interconnected by crossbar switches. A crossbar switch typically comprises a crossbar fabric with N input ports and M output ports. If N=M, the fabric is called a symmetric crossbar fabric. The more general case is an asymmetric crossbar fabric with N input ports and M output ports, wherein N is unequal to M. Such an asymmetric crossbar fabric is part of a so-called N×M switch. If C is the physical co-location or clustering factor, then the asymmetric crossbar fabric has M/C cluster ports, a number that typically is equal to N, where C ports are clustered or co-located on the same physical port-card for performance and reduced contention. Each of the C ports on a clustered port-card can receive data cells from C out of N input ports in the same time-step or cycle. A control unit, also called an arbiter or scheduler, which is also part of the crossbar switch, ensures that this happens. An asymmetric crossbar fabric becomes a symmetric crossbar fabric when N is equal to M.
In a crossbar switch, port cards transmit data cells to other port cards. Data cell headers carry sequence numbers for reliability and ordering. A sending port card transmits data cells to a receiving port card and holds data cells that are unacknowledged. If a receiving port card receives a data cell without an error, an acknowledgement (ACK) is routed to the sending port card. When this acknowledgement is received by the sending port card, the unacknowledged data cell is released from the sending port card's memory. If a data cell is received in error at the receiving port card and so communicated to the sending port card, the data cell may be retransmitted by the sending port card. If a time-out expires at the sending port card before the corresponding acknowledgement is received at the sending port card, the sending port card may also retransmit data cells to the receiving port card.
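The acknowledgement and retransmission behaviour described above can be illustrated with a minimal sketch. It is not taken from any particular switch: the class name SendingPortCard, its method names, and the time-out of four cycles are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class SendingPortCard:
    """Hypothetical model of a sending port card's retransmission buffer."""
    timeout_cycles: int = 4                      # assumed time-out, in cycles
    next_seq: int = 0                            # next sequence number to assign
    unacked: dict = field(default_factory=dict)  # seq -> (payload, cycle last sent)

    def send(self, payload, now):
        """Transmit a new data cell and hold a copy until it is acknowledged."""
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = (payload, now)
        return seq, payload

    def on_ack(self, seq):
        """An ACK releases the held cell from the sender's memory."""
        self.unacked.pop(seq, None)

    def on_error(self, seq, now):
        """The receiver reported an error: retransmit the held cell."""
        payload, _ = self.unacked[seq]
        self.unacked[seq] = (payload, now)
        return seq, payload

    def expired(self, now):
        """Return held cells whose time-out has expired and restart their timers."""
        resend = []
        for seq, (payload, sent_at) in sorted(self.unacked.items()):
            if now - sent_at >= self.timeout_cycles:
                self.unacked[seq] = (payload, now)
                resend.append((seq, payload))
        return resend


card = SendingPortCard(timeout_cycles=4)
card.send("cell-A", now=0)
card.send("cell-B", now=0)
card.on_ack(0)                # cell-A is acknowledged and released
print(card.expired(now=4))    # [(1, 'cell-B')]
```

In this sketch the buffer occupancy at the sender is simply the size of the unacked dictionary, so every cycle an acknowledgement is delayed keeps one more cell resident in the sender's memory.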
A crossbar switch 1 according to the state of the art is shown in FIG. 1. The crossbar switch 1 comprises an asymmetric crossbar fabric 2 with two sending port cards 3, 4 and one receiving port card 5 connected via said crossbar fabric 2. The crossbar switch 1 further comprises a control unit (arbiter) 6 connected with the sending port cards 3, 4 via discrete control channel links 7. The control unit 6 clocks the sending ports connected with the sending port cards 3, 4 and the receiving ports connected with the receiving port card 5, and further controls which port cards 3, 4, 5 exchange data cells with each other. Links 9 to and from the crossbar fabric 2 are called data channel links, while links 7 between the control unit 6 and the port cards 3, 4, 5 are called control channel links. Within the crossbar switch 1, the sending port cards 3 and 4 can send data cells to the receiving port card 5 across the crossbar fabric 2. The receiving port card 5 can thus receive a data cell from the sending port card 3 as well as from the sending port card 4 in the same cycle. The receiving port card 5 has only one transmitter, connected via a transmitter link 8 with the crossbar fabric 2. If the receiving port card 5 wants to acknowledge data cells from the sending port card 3 as well as from the sending port card 4, it transmits acknowledgements to the sending port cards 3 and 4 in succession in subsequent cycles. These acknowledgements also occupy bandwidth within the crossbar fabric 2, which reduces the bandwidth available for data cells. It is also customary to cumulatively acknowledge the receipt of data cells from the sending port cards 3, 4. In this case, valuable buffer space is needed at the sending port cards 3, 4 to hold unacknowledged data cells until the cumulative acknowledgement arrives.
For interconnection networks that have many-to-one traffic patterns, the receiving port card 5 has to wait for a data cell on which to piggyback the cumulative acknowledgement. This can further aggravate buffering needs at the sending ports. Crossbar fabric 2 of FIG. 1 can be optical, with arbiter 6 providing electrical or optical control. Similarly, crossbar fabric 2 of FIG. 1 can be electrical, with arbiter 6 providing electrical control.
With crossbar switches according to the state of the art, acknowledgements are transmitted in succession in the case of asymmetric crossbar fabrics with a co-location or clustering factor C: up to C successive transmissions are needed to reach a given sending port card, assuming that C sending port cards transmit data cells within the same cycle to a clustered or co-located port on the same physical receiving port-card. Increased buffer space at the sending port cards is also required. In addition, cumulative acknowledgements piggybacked on data cells require additional buffer space at the sender, as the sender must wait for the cumulative acknowledgement piggybacked on a data cell from the receiver to the sender.
Dally et al., “The Reliable Router: A Reliable and High-Performance Communication Substrate for Parallel Computers”, First International Workshop on Parallel Computing Routing and Communications, Seattle, Wash., May 1994, describes a reliability mechanism using multiple data packet copies and a unique token. Two copies of each data packet and a token are transferred from switch to switch. This removes the need for source buffering at the input compute node and removes acknowledgements from the data channel, but comes at the expense of higher data channel bandwidth, due to the multiple data copies, and of logic to remove duplicates at the destination node. This work is relevant for mesh-based networks and is inefficient for arbitrary-topology packet networks.
The Galles article entitled “Spider: A High-Speed Network Interconnect,” IEEE Micro, Vol. 17, No. 1, pp. 34-39, January/February 1997, describes an interconnection network using an electrical switch to switch data cells/packets. Intra-switch packet communication reliability relies on a Cyclic Redundancy Check (CRC) that is checked on packet arrival and again at packet egress. Packets that have failed the CRC at egress are stamped with a code, yet are still sent out on the link, consuming precious bandwidth, as no intra-switch ACKs are supported. The switch uses link-level retransmissions to ensure reliability for inter-switch communication. A sender port uses a go-back-n retransmission policy, with ACKs being carried on the data channel links between switches. The switch uses link-to-link retransmission rather than end-to-end retransmission (from source compute node to destination compute node) to allow packet header modifications en route to the destination compute node. Such modifications en route to the destination are critical for superior traffic management that is cognizant of current network loading conditions, adaptive routing and packet aging.
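For readers unfamiliar with the go-back-n policy mentioned above, the following is a generic textbook-style sketch and not the implementation of the switch described in the cited article. The function name go_back_n, the window parameter, and the simplification that a packet can be lost only on its first transmission are all assumptions made for illustration.

```python
def go_back_n(num_packets, window, drop_first_try):
    """Return the order in which sequence numbers are put on the link.

    drop_first_try is a hypothetical set of sequence numbers that the link
    loses the first time they are transmitted.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    log = []
    base = 0                       # oldest unacknowledged sequence number
    dropped = set(drop_first_try)
    while base < num_packets:
        expected = base            # next in-order packet the receiver accepts
        for seq in range(base, min(base + window, num_packets)):
            log.append(seq)
            if seq in dropped:
                dropped.discard(seq)   # lost on this transmission
            elif seq == expected:
                expected += 1          # accepted in order
            # otherwise the receiver discards the out-of-order packet
        base = expected            # sender goes back to the first missing packet
    return log


print(go_back_n(5, 3, {1}))  # [0, 1, 2, 1, 2, 3, 4]
```

The printed order shows the cost of the policy: packet 2 arrived intact the first time but is sent again, because the receiver accepts packets only in order.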
In summary, two main drawbacks arise within a crossbar switch according to the state of the art. First, one receiving port card can receive C data cells from multiple sending port cards during one cycle, but because it is connected with only one clustered port of the crossbar fabric, it needs C cycles to acknowledge the C data cells. A main shortcoming of this is that a sending port card may have to wait for up to C transmissions from the receiving port card before it can release the transmitted cell. Note that each cell can be transmitted only after arbitration successfully completes. A second drawback is that if acknowledgements from a receiving port card to a sending port card are carried on the data channel, this either reduces the bandwidth available for data cell transmission, if the data channel bandwidth is fixed, or requires increased data channel bandwidth to keep the bandwidth for data cell transmission constant. Moreover, if data cells are cumulatively acknowledged (i.e. data cells are not acknowledged on a per-cell basis but are grouped over a range of sequence numbers), then increased buffer space is required at the sender while it waits for the cumulative acknowledgement.
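The first drawback can be made concrete with a small sketch. It assumes, for illustration only, that all C data cells arrive in cycle 0, that the single transmitter of the receiving port card sends exactly one acknowledgement per cycle, and that acknowledgements are issued in the order the senders are listed; the function name ack_release_cycles is hypothetical.

```python
from collections import deque


def ack_release_cycles(senders):
    """Map each sender to the cycle in which its acknowledgement is sent.

    Assumes all cells arrived in cycle 0 and that the receiving port card's
    single transmitter can send one acknowledgement per cycle.
    """
    pending = deque(senders)       # acknowledgements waiting for the transmitter
    release = {}
    cycle = 0
    while pending:
        cycle += 1
        release[pending.popleft()] = cycle
    return release


# Sending port cards 3 and 4 of FIG. 1 (C = 2)
print(ack_release_cycles([3, 4]))  # {3: 1, 4: 2}
```

Under these assumptions the last of C senders is acknowledged only in cycle C, and it must hold its data cell in buffer for that long.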
It is thus an object of the invention to provide another method to operate a crossbar switch. It is a further object of the invention to provide a method to operate a crossbar switch comprising a crossbar fabric with N sending and M receiving ports, with port cards housing each sending and receiving port, which method provides increased bandwidth for transmitting data cells via said crossbar fabric and which method allows data cells received within the same cycle from different sending ports to be acknowledged, in order to reduce buffer space at the sending port cards. It is a further object of the invention to provide a crossbar switch with an increased bandwidth for data cell transmission.