Market demands are driving both an increase in bandwidth aggregation required for switch devices and a decrease in the multicast latency of such devices. Furthermore, in some implementations, such as High Frequency Trading (HFT) applications, switch devices may be required to handle multicast frames with minimal latency.
Switch devices may generally handle incoming frames using a store and forward switching technique or a cut-through switching technique. In the store and forward switching technique, an entire frame is received and stored by the switch before the frame is forwarded to its next destination, or destinations in the instance of a multicast frame. Since the switch receives and stores the entire frame prior to forwarding the frame, the switch can process the entire contents of the frame, e.g., the switch can verify the integrity of the frame. Thus, the store and forward switching technique may increase the reliability of the transmitted frames; however, the store and forward switching technique may also incur latency since the switch must wait until the entire frame is received before starting to forward the frame. In the cut-through switching technique, a switch may start forwarding a frame prior to receiving the entire frame. Since the switch starts to forward the frame before the entire frame is received, the switch may not be able to verify the integrity of the frame. Thus, the cut-through switching technique may result in the propagation of invalid frames; however, the cut-through switching technique may incur minimal latency relative to the store and forward switching technique since the switch can start forwarding the frame before the entire frame is received.
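The trade-off between the two techniques can be illustrated with a minimal sketch. The function names, the 64-byte header threshold, and the use of `zlib.crc32` as a stand-in for a frame check sequence are assumptions made for illustration only; they do not describe any particular switch implementation or the exact Ethernet FCS encoding.

```python
import zlib

def with_fcs(payload: bytes) -> bytes:
    # Append a 4-byte CRC-32 as an illustrative frame check sequence.
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def fcs_ok(frame: bytes) -> bool:
    # Recompute the CRC over the payload and compare with the trailer.
    payload, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == fcs

def store_and_forward(frame: bytes, link_gbps: float):
    # The whole frame must arrive first (1 Gbps == 1 bit per ns),
    # so the integrity check can run before any forwarding decision.
    wait_ns = len(frame) * 8 / link_gbps
    return fcs_ok(frame), wait_ns

def cut_through(frame: bytes, link_gbps: float, header_bytes: int = 64):
    # Forwarding starts once the (assumed) header bytes have arrived;
    # the trailer has not been seen yet, so no integrity check is possible.
    wait_ns = min(header_bytes, len(frame)) * 8 / link_gbps
    return True, wait_ns

good = with_fcs(bytes(1496))                  # 1500-byte frame
bad = bytes([good[0] ^ 0x01]) + good[1:]      # same frame with one bit flipped

print(store_and_forward(bad, 10))  # corrupted frame is dropped after a full-frame wait
print(cut_through(bad, 10))        # corrupted frame is forwarded after a header-only wait
```

In this sketch, the store and forward path waits for all 1500 bytes and rejects the corrupted frame, while the cut-through path begins forwarding after only the assumed header length and propagates the corrupted frame.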
Accordingly, the store and forward switching technique may be suitable for some multicast traffic, such as multicast traffic for which data integrity is prioritized over latency, while the cut-through switching technique may be suitable for other multicast traffic, such as multicast traffic for which latency is prioritized over data integrity.
Some switch designs, such as Output Queued (OQ) switches, are often implemented using a shared memory architecture, where all incoming frames are written into, and read out of, a single shared memory. Thus, an OQ switch with N input (or ingress) ports and N output (or egress) ports that utilizes shared memory must read N packets from the shared memory, and write N packets into the shared memory, in one packet arrival time. As such, memory throughput limitations may, in some instances, effectively limit the amount of bandwidth aggregation that can be achieved by an OQ switch. Thus, in some instances OQ switches may not be scalable to accommodate very high bandwidth aggregation.
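The memory throughput constraint on a shared memory OQ switch can be estimated with simple arithmetic. The following sketch assumes N ports that all run at the same line rate and a 64-byte minimum packet size; the function names and example figures are hypothetical and chosen only to illustrate the scaling.

```python
def shared_memory_bandwidth_gbps(num_ports: int, port_gbps: float) -> float:
    # N writes plus N reads per packet arrival time means the memory
    # must sustain twice the aggregate port bandwidth.
    return 2 * num_ports * port_gbps

def access_budget_ns(num_ports: int, port_gbps: float, packet_bytes: int = 64) -> float:
    # Time for one packet to arrive on a port (1 Gbps == 1 bit per ns),
    # divided among the 2N memory accesses that must fit into it.
    arrival_ns = packet_bytes * 8 / port_gbps
    return arrival_ns / (2 * num_ports)

# Example: 32 ports at 10 Gbps each.
print(shared_memory_bandwidth_gbps(32, 10))  # required memory throughput in Gbps
print(access_budget_ns(32, 10))              # time available per memory access in ns
```

Under these assumptions, doubling the number of ports halves the time available for each memory access, which is one way to see why a single shared memory may become the limiting factor as bandwidth aggregation grows.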
Other switch designs, such as Combined Input and Output Queued (CIOQ) switches, utilize a separate buffer for each input and output port, along with a memory access arbitration module to arbitrate reads from, and writes to, each of the individual buffers. Since CIOQ switches utilize multiple independent buffers for each input and output port, instead of a single shared buffer for all of the input and output ports, a CIOQ switch may not be as limited by memory throughput limitations as an OQ switch, and therefore may be able to provide high bandwidth allocation in some instances. However, since a multicast frame is queued twice in a CIOQ switch, e.g., in a queue corresponding to the input port and in a queue corresponding to the output port, a CIOQ switch may incur considerable queue latency in some instances. Furthermore, in order for a CIOQ switch to implement the cut-through switching technique, the memory access arbitration module would need to be bypassed, which may, in some instances, result in oversubscription at the output port, e.g., multiple multicast frames may arrive at the same output port simultaneously.
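The oversubscription condition described above can be sketched as follows. The example assumes a single time slot in which each input port presents one multicast frame with a set of destination output ports and no arbitration is performed; the `arrivals` mapping and the function name are hypothetical.

```python
from collections import Counter

def oversubscribed_ports(arrivals: dict) -> dict:
    # arrivals maps an input port to the set of output ports targeted by
    # the multicast frame arriving on that input in the same time slot.
    load = Counter(out for outs in arrivals.values() for out in outs)
    # An output port is oversubscribed when more than one frame targets it.
    return {port: count for port, count in load.items() if count > 1}

# Inputs 0 and 1 both replicate a frame to output 3 in the same slot.
arrivals = {0: {2, 3}, 1: {3, 4}, 2: {5}}
print(oversubscribed_ports(arrivals))
```

Here output port 3 is targeted by two frames at once, which is the situation an arbitration module would ordinarily prevent by serializing access to that port.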