The present invention relates to a method of operating a buffered crossbar switch.
Crossbar switches connect a plurality of input/output ports one to another. Packets arriving at one of the input ports are routed to specified output ports. The most common routing operations (also referred to as connections in the following description) within a crossbar switch comprise routing an incoming packet to just one output port (unicast) and routing an incoming packet to all output ports (broadcast). A third routing operation, called multicast, comprises routing an incoming packet to multiple, specified output ports.
State-of-the-art crossbar switches can handle multiple concurrent connections. These crossbar switches require crossbar buffers for temporarily storing blocked packets, i.e. packets that cannot be routed to the specified output port immediately. For example, such blocking occurs if multiple connections require a specific output port at the same time.
There are various design approaches for crossbar buffer structures aimed at non-blocking crossbar switches. One approach is the so-called input queuing, a variant of which is providing crossbar buffers at each input port. Another approach is output queuing, wherein packets destined for the same output port are stored in a crossbar buffer dedicated to said output port.
The following description refers to a buffered crossbar switch architecture comprising a dedicated crossbar buffer for each connection of an input port to an output port. This architecture can be interpreted as a combination of said input/output queuing techniques and serves well for explaining general crossbar operation with respect to the present invention.
FIG. 1 shows a section of a buffered crossbar switch based on the above-mentioned architecture, comprising four input ports i1, i2, i3, i4 and four output ports o1, o2, o3, o4.
Each input port is connected to a corresponding input crossbar 7, whereas each output port is connected to a corresponding output crossbar 8.
An intersection of an input crossbar 7 with an output crossbar 8 is called a crosspoint and is characterized by a dedicated crossbar buffer 5, 6 consisting of a buffer control 5 and a buffer memory 6.
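The crosspoint arrangement described above may be illustrated by the following minimal Python sketch, in which every crosspoint holds its own queue. The class name `BufferedCrossbar`, the methods `write` and `read`, and the fixed-priority selection among inputs (lowest input index first) are assumptions introduced purely for illustration; the description above does not specify an arbitration scheme.

```python
from collections import deque


class BufferedCrossbar:
    """Toy model: one dedicated buffer per (input port, output port) crosspoint."""

    def __init__(self, num_inputs=4, num_outputs=4):
        # buffers[i][o] models the buffer memory at the crosspoint of
        # input crossbar i and output crossbar o.
        self.buffers = [[deque() for _ in range(num_outputs)]
                        for _ in range(num_inputs)]

    def write(self, inp, outputs, packet):
        """Store the packet in the crosspoint buffer of each requested output."""
        for out in outputs:
            self.buffers[inp][out].append(packet)

    def read(self, out):
        """Return the next packet for an output port, or None if all are empty.

        Assumption: inputs are served in fixed priority order.
        """
        for row in self.buffers:
            if row[out]:
                return row[out].popleft()
        return None
```

In this sketch, two input ports addressing the same output port at the same time do not block each other, because each packet is held in its own crosspoint buffer until the output port reads it.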
Buffered crossbar switches are frequently used in multi-processor computer systems with distributed memory architecture, e.g. for linking processors with resources such as cache memory and other subsystems, and in internet switch networks and similar high-performance communication networks.
The bit rate demands of these applications cannot be satisfied with a single buffered crossbar switch in many cases. Prior art has attempted to solve this problem by providing scalable buffered crossbar switches that can advantageously be combined to form an expanded crossbar switch. This mode of operation is referred to as expansion mode.
An expanded crossbar switch is characterised by a port width that is a multiple of the port width of a single scalable buffered crossbar switch. For example, an expanded crossbar switch consisting of four single scalable buffered crossbar switches, each having a port width of 8 bits, can handle 32-bit data at each port, thus quadrupling throughput as compared to a single scalable buffered crossbar switch.
Internal operation of an expanded crossbar switch is such that incoming packets are divided into smaller portions each of which is processed by one scalable buffered crossbar switch.
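The division of incoming data among the individual switches may be sketched as follows for the example of four switches with 8-bit ports. The function names `slice_word` and `merge_slices`, and the assignment of the least significant bits to switch 0, are assumptions made for illustration only; the actual assignment of portions to switches is not specified here.

```python
def slice_word(word, num_switches=4, port_width=8):
    """Split one word of the expanded port into one portion per switch.

    Assumption: switch 0 receives the least significant port_width bits.
    """
    mask = (1 << port_width) - 1
    return [(word >> (i * port_width)) & mask for i in range(num_switches)]


def merge_slices(slices, port_width=8):
    """Reassemble the word from the per-switch portions at the output side."""
    word = 0
    for i, portion in enumerate(slices):
        word |= portion << (i * port_width)
    return word


portions = slice_word(0x11223344)
# portions == [0x44, 0x33, 0x22, 0x11]; merge_slices(portions) == 0x11223344
```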
To ensure proper synchronization of scalable buffered crossbar switches in expansion mode, one of them is determined to be the master switch. Said master switch is connected to the remaining switches, the slave switches, via an external bus system.
There are various constraints for the amount of data to be transmitted over said external bus system in state of the art expanded crossbar switches, the most important being the high bit rate.
Nevertheless, since only said master switch receives the header portion of an incoming packet containing routing information, it is necessary to provide said slave switches with said routing information so that incoming packets can correctly be assigned to the corresponding output port(s)/buffer(s) in the slave switches, too.
The prior art approach to reduce the amount of data to be transmitted to said slave switches via said external bus system consists in reducing said routing information that is passed over from said master switch to said slave switches.
Switches that do not provide multicast operations can accomplish said reduction by binary coding of the respective output port/buffer in the case of unicasts and by providing just one additional status bit for indicating broadcasts.
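A compact encoding of this kind may be sketched as follows. The field layout (broadcast status bit placed above the binary port number) and the function names are assumptions chosen for illustration; only the principle of a binary-coded port number plus one status bit is taken from the description above.

```python
def routing_field_width(num_ports):
    """Bits needed for a binary port number plus one broadcast status bit."""
    addr_bits = max(1, (num_ports - 1).bit_length())
    return addr_bits + 1


def encode_routing(num_ports, dest=0, broadcast=False):
    """Encode a unicast destination or a broadcast into the compact field."""
    addr_bits = routing_field_width(num_ports) - 1
    if broadcast:
        return 1 << addr_bits  # status bit set, address bits unused
    if not 0 <= dest < num_ports:
        raise ValueError("destination port out of range")
    return dest


def decode_routing(num_ports, code):
    """Return the list of output ports addressed by the compact field."""
    addr_bits = routing_field_width(num_ports) - 1
    if (code >> addr_bits) & 1:
        return list(range(num_ports))
    return [code & ((1 << addr_bits) - 1)]
```

Under these assumptions, a switch with 32 output ports needs a 6-bit field (5 address bits plus the status bit), but the field cannot express an arbitrary subset of output ports.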
However, for indicating multiple, specified output ports as required with multicasts, a bit-mask type representation of the output ports is necessary. For example, a bit-mask for a switch having 32 output ports comprises 32 bits. If a certain port is selected for output, the corresponding bit has to be set, otherwise, it has to be cleared. Hence, with multicasts, a wide variety of bit patterns is possible within the bit-mask.
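The bit-mask representation may be illustrated by the following sketch. The zero-based port numbering, the mapping of port p to bit p, and the function names are assumptions made for illustration.

```python
def ports_to_mask(ports, num_ports=32):
    """Set one bit per selected output port; all other bits stay cleared."""
    mask = 0
    for p in ports:
        if not 0 <= p < num_ports:
            raise ValueError("output port out of range")
        mask |= 1 << p
    return mask


def mask_to_ports(mask, num_ports=32):
    """Recover the list of selected output ports from the bit-mask."""
    return [p for p in range(num_ports) if (mask >> p) & 1]


multicast = ports_to_mask([0, 3, 31])   # 0x80000009
unicast = ports_to_mask([5])            # a single bit set
broadcast = ports_to_mask(range(32))    # all 32 bits set
```

With 32 output ports, the mask can take any of 2**32 values, which is why it cannot be shortened in the same way as a binary-coded unicast address.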
Since said bit-mask is inevitable for multicast operations, it is also used for specifying unicast and broadcast operations in switches providing multicasts.
Thus, in order to reduce the amount of bit-mask data that has to be transmitted from the master to the slaves, for unicast operations an incoming packet is routed in a so-called auxiliary broadcast to each crossbar buffer connected to the respective input port by an input crossbar, i.e. after said auxiliary broadcast, said incoming packet is available at each output port. However, only one of these crossbar buffers is connected to the output port designated as the destination for said unicast.
Compared to a unicast operation which routes an incoming packet directly to the crossbar buffer connected to the desired output port, said auxiliary broadcast operation saves the portion of address information needed to select one crossbar buffer out of the plurality of crossbar buffers connected to the respective input crossbar. For instance, a single status bit is sufficient for indicating said auxiliary broadcast.
The disadvantage of this method is the high power dissipation in a crossbar switch due to the high number of write operations necessary for the auxiliary broadcast.
For a unicast operation in a crossbar switch with N output ports, said state of the art method employing auxiliary broadcasts requires writing data to N crossbar buffers whereas only writing to one crossbar buffer would be necessary for completing said unicast operation.
Apart from that, the crossbar buffers are not used efficiently, because for each auxiliary broadcast N crossbar buffers are loaded with packet data although only one of them is intended to send said packet data to the corresponding output port.
With each auxiliary broadcast, N−1 crossbar buffers are thus filled that could otherwise be used by subsequent connections and thereby contribute to overall performance.
However, this only holds true for unicasts. If an incoming data packet is to be broadcast to all of the N output ports of a switch, N crossbar buffers must be used anyway.
Since practice has shown that overall packet traffic mainly consists of unicasts, the poor resource usage due to said auxiliary broadcast remains an important issue.
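The cost of the auxiliary broadcast discussed above may be summarised by the following simple counting model. The function names and the restriction to unicast and broadcast packets are assumptions made for illustration; the model counts crossbar buffer writes only and does not represent power dissipation itself.

```python
def buffer_writes(num_ports, broadcast, auxiliary):
    """Crossbar buffer writes caused by one packet on one input port.

    broadcast: the packet is a true broadcast to all output ports.
    auxiliary: unicasts are carried out as auxiliary broadcasts.
    """
    if broadcast or auxiliary:
        return num_ports
    return 1  # direct unicast: only the destination's buffer is written


def needlessly_filled_buffers(num_ports, broadcast, auxiliary):
    """Buffers holding a copy that is never intended for their output port."""
    if broadcast or not auxiliary:
        return 0
    return num_ports - 1


# Unicast on a switch with N = 32 output ports:
aux_writes = buffer_writes(32, broadcast=False, auxiliary=True)      # N writes
direct_writes = buffer_writes(32, broadcast=False, auxiliary=False)  # 1 write
```

For a true broadcast, both variants write N buffers and none of them is filled needlessly, which corresponds to the observation that the drawback applies to unicasts only.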