A. Field of the Invention
The invention relates generally to backplane interconnect switching architectures, and more particularly to an arbitrated crosspoint switching architecture for use within a network switch.
B. Description of the Related Art
Conventional modular backplane interconnect architectures can be classified into three separate groups: 1) shared memory, 2) crosspoint buffered, and 3) arbitrated crosspoint.
In the shared memory architecture, a central switch device implements a relatively large pool of memory (also referred to as a buffer or a memory buffer). Any incoming variable size packet or cell is stored in this buffer until it is read from the buffer and forwarded out of the switch device towards its destination. The buffer is shared by all input and output ports of the central switch device, and it allows simultaneous reads and writes from all ports. In the worst case, where variable size packets arrive simultaneously on all input ports while the memory buffer already holds variable size packets for all output ports, the memory buffer must provide a bandwidth equivalent to that of 2N system ports to support full traffic throughput on all ports, where N is the number of ports of the switch device (N input ports plus N output ports equals 2N ports in total). Memory bandwidth therefore limits the switch capacity per switch device in the shared memory architecture.
The shared memory switch element is typically combined with additional buffering, or queuing, on both ingress and egress line cards, since the amount of buffering that can be implemented on the central switch chip cannot meet the overall buffer requirements of the system. The queues located on the ingress line cards are typically called virtual output queues (VoQs), and they eliminate head-of-line blocking effects. The amount of buffering required on the egress line cards is influenced by the amount of speedup capacity available in the central switch devices, provided, e.g., by redundant switch cards. The memory limitations of the central switch device typically introduce some extra internal control overhead to the overall system.
The shared memory architecture has several advantages. It can switch both variable size packets and cells in their native format, thereby avoiding the need to segment a variable size packet into cells and reassemble it afterwards. It is also easy to ensure that no bandwidth is wasted regardless of the packet or cell size, which minimizes backplane over-speed requirements. In addition, Quality of Service (QoS) capabilities and low latency cut-through switching can be provided relatively easily.
The second type of conventional modular backplane interconnect architecture, the crosspoint buffer architecture, is very similar to the shared memory architecture. However, instead of a single shared memory, the switch device of the crosspoint buffer architecture implements a matrix of buffers with one buffer per input/output combination. This reduces the bandwidth requirements per individual buffer to the equivalent of only two (2) system ports (one input port and one output port), as compared to 2N system ports for the shared memory architecture, which means that memory bandwidth for the crosspoint buffer architecture is less of a constraint as compared to the shared memory architecture.
However, a drawback of the crosspoint buffer architecture is that the number of individual crosspoint buffers is proportional to N², where N is the number of ports (e.g., N input ports and N output ports) in the switch. Since it is difficult to statistically share memory between the individual crosspoint buffers, the total memory requirements of the crosspoint buffer architecture exceed those of the shared memory architecture, and the amount of memory and the number of memory building blocks per switch device therefore limit the switch capacity per switch device.
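The tradeoff between the two buffered architectures can be illustrated with a short calculation. The following is only a sketch of the relationships stated above (one buffer at 2N port rates versus N² buffers at two port rates each); the function names and the example port count and port rate are hypothetical and are not taken from any particular device.

```python
def shared_memory_requirements(n_ports, port_rate_gbps):
    """Shared memory: one buffer that must absorb N writes and N reads at once."""
    return {
        "buffers": 1,
        "bandwidth_per_buffer_gbps": 2 * n_ports * port_rate_gbps,
    }


def crosspoint_buffer_requirements(n_ports, port_rate_gbps):
    """Crosspoint buffered: one buffer per input/output pair, each serving
    exactly one input port and one output port."""
    return {
        "buffers": n_ports ** 2,
        "bandwidth_per_buffer_gbps": 2 * port_rate_gbps,
    }


if __name__ == "__main__":
    # Hypothetical example: 16 ports at 10 Gbps each.
    print(shared_memory_requirements(16, 10))
    print(crosspoint_buffer_requirements(16, 10))
```

With these assumed figures, the shared memory device needs a single buffer sustaining 320 Gbps, whereas the crosspoint buffered device needs 256 buffers sustaining 20 Gbps each.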
Practical implementations also include hybrid shared and crosspoint buffered architectures. In these hybrid architectures, a buffer is shared among a number of crosspoint nodes to achieve the optimal tradeoff between memory bandwidth and memory size requirements from a die size and performance perspective.
For the shared and crosspoint buffered architectures, capacity is scaled using multiple switch devices in parallel. This can be done using byte slicing or by using a flow control scheme between the switch devices to avoid cell or packet re-ordering problems at the output ports. In general, both of these schemes are difficult to scale due to timing constraints and due to the complexity of flow control algorithms.
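As a rough illustration of byte slicing, the sketch below stripes the bytes of one cell across several parallel switch devices and reassembles them at the output. It assumes a simple round-robin byte interleave; actual implementations may slice differently, and the function names are hypothetical. The sketch also suggests why timing is a constraint: all slices of a cell must arrive before the cell can be reassembled.

```python
def byte_slice(cell, n_devices):
    """Split one cell into n_devices slices; byte k goes to device k % n_devices."""
    return [cell[i::n_devices] for i in range(n_devices)]


def reassemble(slices):
    """Rebuild the original cell from the slices of all parallel devices."""
    n = len(slices)
    out = bytearray(sum(len(s) for s in slices))
    for i, s in enumerate(slices):
        out[i::n] = s  # put slice i back at byte positions i, i+n, i+2n, ...
    return bytes(out)


if __name__ == "__main__":
    cell = b"abcdefgh"
    slices = byte_slice(cell, 4)
    print(slices)
    print(reassemble(slices))
```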
The crosspoint buffer architecture has similar advantages to those discussed above with respect to the shared memory architecture.
The third type of conventional modular backplane architecture, the arbitrated crosspoint architecture, is based on a crosspoint switch device that provides connectivity from any input to any output, but that does not provide any buffering for traffic as it passes through the crosspoint switch device. The buffering is located on the ingress and egress line cards. The queues located on the ingress line cards are virtual output queues (VoQs), which eliminate head-of-line blocking effects. The amount of buffering required on the egress line cards is influenced by the amount of speedup capacity available to the central switch device, provided, e.g., by redundant switch cards.
The crosspoint portion of the switch device is managed by a scheduler function, which differentiates arbitrated crosspoint architectures from other techniques. The scheduler can either be implemented in a stand-alone device or be integrated into the crosspoint device itself. The latter approach improves the redundancy features and eliminates the need for dedicated communication links between the scheduler and crosspoint switch devices. The scheduler function performs the arbitration process of assigning input-to-output pairs in the crosspoint device. The arbitration decision is updated on a regular time-slotted basis, where each timeslot corresponds to the transmission period of a single fixed-size cell unit of data. The performance, capacity, and QoS support of the arbitrated crosspoint architecture depend on the type of scheduling algorithm employed.
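The effect of virtual output queues on head-of-line blocking can be sketched as follows. The example compares a single ingress FIFO with one queue per output; the function names are hypothetical, and cells are represented only by their destination output number.

```python
from collections import deque


def fifo_eligible(fifo, free_outputs):
    """Single FIFO: only the head cell may be sent, and only if its output is free."""
    if fifo and fifo[0] in free_outputs:
        return [fifo[0]]
    return []


def voq_eligible(voqs, free_outputs):
    """VoQs: every free output with a non-empty queue is a candidate for service."""
    return sorted(out for out, q in voqs.items() if q and out in free_outputs)


if __name__ == "__main__":
    free_outputs = {1, 2}               # output 0 is currently busy
    fifo = deque([0, 1, 2])             # head cell is destined to busy output 0
    voqs = {0: deque([0]), 1: deque([1]), 2: deque([2])}
    print(fifo_eligible(fifo, free_outputs))
    print(voq_eligible(voqs, free_outputs))
```

In this example the single FIFO yields no eligible cell, because the head cell waits for busy output 0 and blocks the cells behind it, whereas the VoQs expose the cells for outputs 1 and 2 to the scheduler.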
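A per-timeslot arbitration decision can be sketched as the computation of a conflict-free matching between inputs and outputs. The following is a simplified iterative request-grant-accept scheme with round-robin pointers, offered only as an illustration; it is not a faithful implementation of any specific scheduler, and the names `arbitrate`, `grant_ptr`, and `accept_ptr` are hypothetical.

```python
def arbitrate(requests, grant_ptr, accept_ptr, iterations=2):
    """Return a conflict-free {input: output} matching for one timeslot.

    requests[i] is the set of outputs for which input i has queued cells.
    grant_ptr[o] and accept_ptr[i] are round-robin pointers, updated in place.
    """
    n = len(requests)
    match = {}               # input -> output chosen for this timeslot
    taken_outputs = set()
    for it in range(iterations):
        # Grant: each unmatched output picks the requesting unmatched input
        # closest to its round-robin pointer.
        grants = {}          # input -> outputs that granted it
        for out in range(n):
            if out in taken_outputs:
                continue
            candidates = [i for i in range(n)
                          if i not in match and out in requests[i]]
            if candidates:
                chosen = min(candidates, key=lambda i: (i - grant_ptr[out]) % n)
                grants.setdefault(chosen, []).append(out)
        if not grants:
            break
        # Accept: each input accepts the granting output closest to its pointer.
        for inp, outs in grants.items():
            out = min(outs, key=lambda o: (o - accept_ptr[inp]) % n)
            match[inp] = out
            taken_outputs.add(out)
            if it == 0:
                # Assumption of this sketch: pointers advance only for
                # matches made in the first iteration.
                grant_ptr[out] = (inp + 1) % n
                accept_ptr[inp] = (out + 1) % n
    return match


if __name__ == "__main__":
    requests = [{0, 1}, {0}, {2}]
    grant_ptr, accept_ptr = [0, 0, 0], [0, 0, 0]
    print(arbitrate(requests, grant_ptr, accept_ptr))  # first timeslot
    print(arbitrate(requests, grant_ptr, accept_ptr))  # second timeslot
```

Because each input and each output appears at most once in the returned matching, the result can be applied directly as a crosspoint configuration for that timeslot. The round-robin pointers rotate priority between timeslots so that no requesting input is served indefinitely ahead of the others.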
The scheduler function can either be performed by a single scheduler unit (centralized) or by multiple schedulers operating in parallel (distributed). The centralized scheduler configures all crosspoints in the system with the result of its crosspoint arbitration decision. A distributed scheduler typically only configures a single crosspoint device with the result of its crosspoint arbitration decision. A centralized scheduler approach typically offers better control of basic QoS features but can be difficult to scale in capacity and number of ports. A distributed scheduler approach can offer improved scalability since the scheduling decisions have reduced timing constraints relative to the centralized scheduler approach, but it introduces synchronization issues among the parallel schedulers.
Compared with the shared memory architecture and the crosspoint buffered architecture, the arbitrated crosspoint architecture scales better since the crosspoint device does not use an integrated memory, and can relatively easily be scaled in capacity by adding crosspoints in parallel.
The configuration of the crosspoint in the arbitrated crosspoint architecture is locked to the arbitration process, such that the arbitration process and the crosspoint configuration are both performed on a per timeslot basis. This introduces a cell-tax when the switched data units do not occupy the full switching bandwidth of a timeslot across a given crosspoint. The majority of scheduler algorithms are a replication or a derivation of the widely adopted iSLIP algorithm developed by Nick McKeown of Stanford University, which uses an iterative approach to find a good match between inputs and outputs in a timeslot-based arbitration process. See, for example, U.S. Pat. No. 6,515,991, of which Nick McKeown is the listed inventor.
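The cell-tax can be quantified with a short calculation. The sketch below assumes that a packet is segmented into fixed-size cells, that the last cell is padded to full size, and that per-cell header overhead is ignored; the function name and the example sizes are hypothetical.

```python
import math


def cell_tax(packet_bytes, cell_payload_bytes):
    """Fraction of the switched timeslot capacity wasted on padding when a
    packet is carried in fixed-size cells (per-cell headers ignored)."""
    cells = math.ceil(packet_bytes / cell_payload_bytes)
    carried = cells * cell_payload_bytes
    return (carried - packet_bytes) / carried


if __name__ == "__main__":
    for size in (64, 65, 1500):
        print(size, cell_tax(size, 64))
```

Under these assumptions, a 64-byte packet in 64-byte cells incurs no tax, while a 65-byte packet requires two timeslots and wastes 63 of the 128 bytes switched, or roughly 49 percent.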
As described above, each of the conventional backplane interconnect architectures has disadvantages. Accordingly, it is desirable to improve the backplane interconnect architecture and to reduce or eliminate the disadvantages of conventional backplane interconnect architectures.