1. Field of the Invention
The present invention relates to routing connections in a telecommunications network, and, more particularly, to scheduling and replication of packets for multicast connections.
2. Description of the Related Art
Broadband access technologies, such as cable modem, passive optical network (PON), and DSL, enable service providers to distribute multimedia content over data networks. Some of the applications offered by service providers, such as broadband video and interactive gaming, require multicast distribution of content from a source (the service provider, generally connected to a backbone network) to multiple destinations (end users, generally connected through an access network). For these applications, several hundred end users may be served, and so the content must be delivered at low cost to the network. Cost may be measured in a number of ways, such as node delay/congestion, processing/memory requirements, or complexity.
Because access packet networks support many broadband technologies, the multicast distribution of content requires efficient support of multicast connections in switches and routers that receive the content and transfer the content to the access packet network. For example, a router in a network may receive broadcast video from a satellite, and then multicast this content to a number of users connected to the network. These switches and routers are sometimes referred to as edge switches and edge routers. For the following description, the term “switch” is used, but the description applies equally to routers.
In general, an edge switch includes a set of line cards that are interconnected through a switching fabric. Line cards both receive data and transmit data, and support the line format, or the physical and lower layers, of the transmission medium (e.g., OC-1 or OC-3 optical links transferring data in accordance with SONET or ATM standards). Some line cards may be connected to other packet networks, satellite links, or similar types of communications networks that provide the content, and other line cards may be connected to the access network to provide connectivity to end users through point-to-point connections. If content is received at the line card, this interface is referred to as an ingress port of the edge switch. If content is provided from the line card, this interface is referred to as an egress port of the edge switch.
For a multicast connection to two or more end users, a multicast session comprises a stream of packets (or “packet flow”). The packet flow from the content provider is received at an ingress port of the edge switch. To generate a packet flow from the edge switch to each end user that is to receive the multicast content, the edge switch duplicates each packet of the packet flow. Each multicast packet is separately addressed for a corresponding egress port. The egress logic, termed a traffic manager, replicates the packet into multicast packets, queues each multicast packet, and delivers each multicast packet to a corresponding egress port. The number of egress ports to which packets of a multicast session are replicated is referred to as the fan-out of the session.
Replicating packets, either through pointer manipulation or actual copying, is an expensive process in terms of processing cycles and/or memory bandwidth used. The replication process must eventually copy each packet to a potentially large number of queues, and it usually must be completed at “wire-speed” (i.e., even for a continuous stream of minimum-size packets, the multicasting process should be completed without delay). Satisfying the wire-speed requirement translates into a memory speed-up requirement that is on the order of the number of egress ports.
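The speed-up requirement can be made concrete with a little arithmetic. The sketch below assumes a hypothetical 10 Gb/s port and 64-byte minimum packets (ignoring framing overhead such as preamble and inter-frame gap), and counts one queue insertion per egress copy; the function name and parameters are illustrative only.

```python
def replication_budget(line_rate_gbps: float, min_packet_bytes: int, fan_out: int):
    """Return (packet time in ns, time available per queue insertion in ns, speed-up).

    Assumes each arriving packet needs one queue insertion per egress copy and that
    all insertions must finish before the next back-to-back minimum-size packet arrives.
    """
    packet_time_ns = min_packet_bytes * 8 / line_rate_gbps  # bits / (Gb/s) gives ns
    ns_per_insertion = packet_time_ns / fan_out
    speed_up = fan_out  # relative to a unicast packet, which needs a single insertion
    return packet_time_ns, ns_per_insertion, speed_up


packet_ns, per_copy_ns, speed_up = replication_budget(10.0, 64, 32)
print(f"{packet_ns:.1f} ns per packet, {per_copy_ns:.2f} ns per copy, {speed_up}x speed-up")
# prints: 51.2 ns per packet, 1.60 ns per copy, 32x speed-up
```

Under these assumptions, a fan-out of 32 leaves only about 1.6 ns per queue insertion, which illustrates why the required speed-up grows with the number of egress ports.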
To satisfy the wire-speed requirement, the traffic manager should have enough memory bandwidth available to write and read all packets to and from the memory. In addition, the traffic manager should be able to efficiently process the data structures that are employed to maintain the queues. Since each egress port may support different levels of quality-of-service (QoS) provisioning, and since each switch typically includes a large number of egress ports, the total number of queues that must be maintained by the data structure and that must be supported by the traffic manager is very large. Consequently, queues are implemented as virtual queues using linked-lists, and, if variable-sized packets are stored in terms of linked-lists of constant size buffers, the queues may be implemented using linked-lists of buffers. Maintaining linked-lists also places a high demand on the memory bandwidth. Finally, to satisfy the wire-speed requirement, the traffic manager should be able to process arbitrarily long streams of minimum-sized packets.
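A linked-list virtual queue built from constant-size buffers can be sketched as follows. The sketch assumes a hypothetical 64-byte buffer size and a single shared buffer memory with a next-pointer (link) array; the class and method names are illustrative and are not taken from any particular traffic manager. Each queue is only a head and a tail pointer, so many per-port, per-QoS queues can share one memory, but every enqueue and dequeue reads or writes the link array, which is the memory-bandwidth cost noted above.

```python
BUFFER_SIZE = 64  # hypothetical constant buffer size in bytes


class BufferMemory:
    """Shared packet memory of fixed-size buffers plus a link (next-pointer) array."""

    def __init__(self, num_buffers: int) -> None:
        self.data = [b""] * num_buffers
        self.eop = [False] * num_buffers                 # marks a packet's last buffer
        self.next = list(range(1, num_buffers)) + [-1]   # initially chains the free list
        self.free_head = 0 if num_buffers > 0 else -1

    def alloc(self) -> int:
        idx = self.free_head
        if idx == -1:
            raise MemoryError("buffer memory exhausted")
        self.free_head = self.next[idx]
        self.next[idx] = -1
        return idx

    def release(self, idx: int) -> None:
        self.next[idx] = self.free_head                  # push back onto the free list
        self.free_head = idx


class VirtualQueue:
    """A FIFO that is just head/tail pointers into the shared buffer memory."""

    def __init__(self, mem: BufferMemory) -> None:
        self.mem = mem
        self.head = -1
        self.tail = -1

    def enqueue(self, packet: bytes) -> None:
        # Split a variable-size packet into a chain of constant-size buffers.
        chunks = [packet[i:i + BUFFER_SIZE] for i in range(0, len(packet), BUFFER_SIZE)] or [b""]
        for n, chunk in enumerate(chunks):
            idx = self.mem.alloc()
            self.mem.data[idx] = chunk
            self.mem.eop[idx] = n == len(chunks) - 1
            if self.tail == -1:
                self.head = idx
            else:
                self.mem.next[self.tail] = idx           # link memory write
            self.tail = idx

    def dequeue(self) -> bytes:
        if self.head == -1:
            raise IndexError("dequeue from empty queue")
        parts = []
        while True:
            idx = self.head
            parts.append(self.mem.data[idx])
            last = self.mem.eop[idx]
            self.head = self.mem.next[idx]               # link memory read
            self.mem.release(idx)
            if last:
                break
        if self.head == -1:
            self.tail = -1
        return b"".join(parts)
```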
Any single multicast-session packet might be added into several queues and/or read several times from the memory during the process of replicating the packet. Although the actual packet might be stored only once in the memory, several data structures may be updated. Several methods are employed in the prior art to implement the replication process for multicasting: maximum speed-up, staging buffers, and dedicated multicast queues.
The maximum speed-up method is commonly known as the replication-at-receiving (RAR) method. To replicate a packet in the RAR method, a packet is stored in the memory only once, and a descriptor of the packet is added to all the per-interface queues that the packet must be transmitted to. A per-interface queue is a separate queue associated with a particular egress port. The RAR method minimizes memory bandwidth for storing packets, but increases the control memory bandwidth in order to add a descriptor of each incoming packet to all outgoing queues. When fan-out is large, the RAR method becomes impractical. A given implementation of the RAR method has a memory speed-up that is determined by a worst-case processing scenario. Consequently, most of the available memory bandwidth remains unused most of the time. If less than the total worst-case memory speed-up is used, certain traffic patterns may occur that result in packet dropping when back-to-back packets must be transmitted to a large number of egress ports at a rate greater than the available speed-up. Packet dropping occurs because the second packet arrives before the first packet has been inserted into all the egress port queues.
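The RAR bookkeeping can be sketched as follows, assuming a hypothetical traffic manager with one FIFO per egress port, packets held in a dictionary keyed by a packet identifier, and a reference count that frees the stored packet after its last copy is sent. All names are illustrative. The `control_writes` counter shows the trade-off described above: the packet body is written once, but every arrival costs one descriptor write per egress port in its fan-out.

```python
from __future__ import annotations

from collections import deque


class RarTrafficManager:
    """Replication-at-receiving sketch: one stored copy, one descriptor per egress queue."""

    def __init__(self, num_ports: int) -> None:
        self.packets: dict[int, bytes] = {}   # packet memory: each packet written once
        self.refcount: dict[int, int] = {}    # copies still waiting to be transmitted
        self.port_queues = [deque() for _ in range(num_ports)]
        self.next_id = 0
        self.control_writes = 0               # descriptor enqueues performed so far

    def receive(self, packet: bytes, egress_ports: list[int]) -> int:
        pid = self.next_id
        self.next_id += 1
        self.packets[pid] = packet
        self.refcount[pid] = len(egress_ports)
        for port in egress_ports:             # fan-out control-memory writes per arrival
            self.port_queues[port].append(pid)
            self.control_writes += 1
        return pid

    def transmit(self, port: int) -> bytes | None:
        if not self.port_queues[port]:
            return None
        pid = self.port_queues[port].popleft()
        packet = self.packets[pid]
        self.refcount[pid] -= 1
        if self.refcount[pid] == 0:           # last copy sent: free the stored packet
            del self.packets[pid]
            del self.refcount[pid]
        return packet
```

For example, receiving one packet for ports 0, 2, and 5 stores a single copy but performs three descriptor writes; the stored copy is released only after all three ports have transmitted it.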
When staging buffers are employed, multicast packets are separated from unicast traffic and placed in a separate queue, or staging buffer, on arrival. The traffic manager replicates packets to per-interface queues using some level of speed-up, but not the full memory speed-up dictated by the worst-case scenario. While this provides some protection from packet dropping, especially for the back-to-back packet-arrival case, other traffic patterns may cause unstable operation if the rate of multicast processing is less than (the arrival rate of multicast packets) times (the fan-out). Accurate design of the staging buffers might not be possible unless traffic pattern characteristics are well known (e.g., unless burst lengths of connections applied to the switch are known a priori).
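The stability condition can be illustrated with a simple discrete-time model. The sketch assumes hypothetical slotted time, a fixed fan-out per multicast packet, and a replicator that writes at most a fixed number of copies per slot; it tracks the replication work (copies not yet written to per-interface queues) left in the staging buffer after each slot. The function name and parameters are illustrative.

```python
from __future__ import annotations


def staging_backlog(arrivals: list[int], fan_out: int, replications_per_slot: int) -> list[int]:
    """Pending copies in the staging buffer after each time slot.

    arrivals[t] is the number of multicast packets arriving in slot t; each needs
    fan_out copies, and at most replications_per_slot copies are written per slot.
    """
    backlog = 0
    history = []
    for n in arrivals:
        backlog += n * fan_out
        backlog -= min(backlog, replications_per_slot)
        history.append(backlog)
    return history


print(staging_backlog([1] * 5, fan_out=4, replications_per_slot=3))  # steady overload
print(staging_backlog([3, 0, 0, 0, 0, 0], fan_out=4, replications_per_slot=3))  # one burst
```

With one packet per slot at fan-out 4 and three copies per slot, the backlog grows by one copy every slot without bound (1, 2, 3, 4, 5). A single burst of three packets is absorbed (9, 6, 3, 0, 0, 0), but the staging space it needs depends on the burst length, which is why sizing the buffer requires knowing burst characteristics in advance.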
An alternative approach to the RAR method and staging buffers employs dedicated multicast queues and is known as the replication-at-send (RAS) method. Multicast packets are queued independently in one or more dedicated multicast queues, and packets destined for different egress ports might be buffered together in the same queue. Each multicast queue is visited periodically, the first packet of the multicast queue is read, and the packet is replicated across all egress ports on the fly. Consequently, multicast packets are not queued on a per-interface queue basis.
For the RAS method, each egress port typically consumes (transmits) packets at a rate that is much less than the rate at which the traffic manager processes packets, leading to head-of-line blocking. Head-of-line blocking may occur, for example, when a burst of packets for a first multicast session to one group of egress ports arrives before a burst of packets of a second multicast session to a different group of egress ports. If the egress ports corresponding to the first multicast session are congested, it may take several service/processing periods before the packets of the first multicast session are replicated and sent, even though the egress ports for the second multicast session may be fairly uncongested. In addition, since the traffic manager processes the multicast queues at a rate faster than the egress ports consume packets, the traffic manager may pick the packets of the first session in sequence and then wait to copy these packets to the already-congested egress ports. Thus, the packets of the second multicast session might be delayed or even dropped/blocked (when, e.g., a delay threshold is exceeded) while the traffic manager processes the burst of packets for the first multicast session.
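Head-of-line blocking in a shared RAS queue can be reproduced with a small slotted simulation. The sketch assumes a hypothetical model in which the traffic manager examines only the head of one multicast FIFO per slot, replicates it to all of its destination ports at once, and can do so only when every one of those ports is free; each copy then occupies its port for `tx_slots` slots. All names are illustrative.

```python
from __future__ import annotations

from collections import deque


def ras_send_times(queue: list[tuple[str, tuple[int, ...]]],
                   port_free_at: dict[int, int],
                   tx_slots: int = 1,
                   max_slots: int = 100) -> dict[str, int]:
    """Return the slot at which each packet in one shared multicast FIFO is replicated."""
    fifo = deque(queue)
    free_at = dict(port_free_at)      # slot at which each egress port can accept a copy
    sent: dict[str, int] = {}
    for t in range(max_slots):
        if not fifo:
            break
        name, ports = fifo[0]
        if all(free_at.get(p, 0) <= t for p in ports):
            fifo.popleft()
            sent[name] = t
            for p in ports:
                free_at[p] = t + tx_slots
        # Otherwise the head packet waits, blocking every packet queued behind it.
    return sent


# Session A targets congested ports 0 and 1; session B targets idle ports 2 and 3.
queue = [("A1", (0, 1)), ("A2", (0, 1)), ("B1", (2, 3))]
print(ras_send_times(queue, port_free_at={0: 5, 1: 5}))
```

In this scenario ports 2 and 3 are idle from slot 0, yet packet B1 is not replicated until slot 7 because it sits behind the session-A burst, whereas per-interface queues would have let it leave immediately.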
As would be apparent to one skilled in the art, the methods of the prior art provide varying levels of system performance, but do not provide deterministic operation (e.g., bounded operation in terms of latency or stability). In addition, systems designed according to the methods of the prior art may depend upon various assumptions as to the characteristics of arrival traffic, which may lead to poor system performance when the assumptions are in error or if traffic-pattern characteristics change with time. The methods of the prior art also might not adequately provide for intelligent buffer management, or allow for QoS guarantees.