Broadband access technologies, such as cable modems and digital subscriber lines (DSLs), enable service providers to distribute multimedia content over data networks. Some of the applications offered by service providers, such as broadband video and interactive gaming, require multicast distribution of content from a source (the service provider, generally connected to a backbone network) to multiple destinations (end users, generally connected through an access network). For these applications, several hundred end users may be served, and so the content must be delivered at low cost to the network. Cost may be measured in a number of ways, such as node delay/congestion, processing/memory requirements, or complexity.
Because access packet networks support many broadband technologies, the multicast distribution of content requires efficient support of multicast connections in switches and routers that receive the content and transfer the content to the access packet network. For example, a router in a network may receive broadcast video from a satellite, and then multicast this content to a number of users connected to the network. These switches and routers are sometimes referred to as edge switches and edge routers. For the following description, the term “router” is used, but the description applies equally to switches.
In a computer network, data is transmitted between users as formatted blocks of information called “packets.” For a multicast connection to two or more end users, a multicast session comprises a stream of packets (or “packet flow”). The packet flow from the content provider is received at an ingress port of the edge router. To generate a packet flow from the edge router to each end user that is to receive the multicast content, the edge router duplicates each packet of the packet flow. Each multicast packet is separately addressed for a corresponding egress port. A packet processor replicates the packet into multicast packets, queues each multicast packet, and delivers each multicast packet to a corresponding egress port.
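The replicate, queue, and deliver steps described above can be sketched in a few lines. This is only an illustrative model, not the router's actual implementation: the `Packet` fields, the `replicate` function, and the per-port `queues` dictionary are all hypothetical names chosen for the example.

```python
from collections import deque
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    flow_id: int            # identifies the multicast session (hypothetical field)
    seq: int                # position of the packet within its flow
    payload: bytes
    egress_port: int = -1   # -1 means "not yet addressed to an egress port"


def replicate(packet, egress_ports, queues):
    """Create one separately addressed copy per egress port and queue it."""
    for port in egress_ports:
        # replace() builds a new Packet with the same fields except egress_port.
        copy = replace(packet, egress_port=port)
        queues.setdefault(port, deque()).append(copy)


# Two packets of one flow arrive at the ingress port and are multicast
# to three egress ports (the port numbers are arbitrary).
queues = {}
members = [3, 5, 8]
for seq in range(2):
    replicate(Packet(flow_id=1, seq=seq, payload=b"frame"), members, queues)
```

Each egress queue ends up holding its own addressed copy of every packet, in arrival order. A single loop like this preserves order trivially, but it also performs all of the replication work serially.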
Replicating packets, either through pointer manipulation or actual copying, is an expensive process in terms of processing cycles and/or memory bandwidth used. Typically, this process was performed in network processors by a single thread in a packet processing engine (PPE). Packet processing engines analyze packet headers to determine the next hop for each packet, and configure the packet headers to send the packet there. Multiple threads in a PPE were not historically used for processing a single packet because a) the threads operate concurrently and there may not be an efficient means of passing packets between the threads; and b) even when there is a packet communication path, packet replication has specific ordering requirements that force multiple threads to perform the replication in a serial manner (which defeats the purpose of having multiple threads do the processing). Because there is an application requirement that packets within the same flow should exit a device in the same order, there needs to be a scheme to maintain packet order while achieving true parallel processing of the packets.
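To make the ordering requirement concrete, the following sketch shows one generic way to restore per-flow order when work items finish out of order: a reorder buffer keyed by sequence number. This is an assumption made for illustration, not the scheme this description goes on to present, and the `ReorderBuffer` class and its `complete` method are hypothetical names.

```python
import heapq


class ReorderBuffer:
    """Releases completed work items strictly in sequence order."""

    def __init__(self):
        self._next_seq = 0   # sequence number that must exit next
        self._pending = []   # min-heap of (seq, item) that finished early

    def complete(self, seq, item):
        """Record that `seq` has finished; return every item now releasable."""
        heapq.heappush(self._pending, (seq, item))
        released = []
        # Drain the heap for as long as its smallest entry is the next one due.
        while self._pending and self._pending[0][0] == self._next_seq:
            released.append(heapq.heappop(self._pending)[1])
            self._next_seq += 1
        return released


# Simulate parallel workers that finish packets 2, 0, 1, 3 in that order.
buf = ReorderBuffer()
out = []
for seq in [2, 0, 1, 3]:
    out.extend(buf.complete(seq, f"pkt{seq}"))
```

In this simulation packet 2 is held until packets 0 and 1 have completed, so `out` lists the packets in their original flow order even though the work finished out of order. The cost is the buffering and bookkeeping, which is the kind of overhead a practical scheme has to keep small.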