A computer network is a geographically distributed collection of interconnected subnetworks for transporting data between nodes, such as computers. A local area network (LAN) is an example of such a subnetwork; a plurality of LANs may be further interconnected by an intermediate network node, such as a router or switch, to extend the effective “size” of the computer network and increase the number of communicating nodes. The nodes typically communicate by exchanging discrete packets of data according to predefined protocols. The data packets transferred among the nodes may include fixed-sized data cells and/or variable-sized data frames. In this context, a protocol consists of a set of rules defining how the nodes interact with each other.
Broadly stated, an intermediate network node is configured to exchange data packets between nodes connected to a wide range of communication links and subnetworks. To that end, the intermediate node implements a set of network services for the communicating nodes. The set of services may include route processing, path determination and path switching functions. The route processing function determines the type of routing needed for a received packet, whereas the path switching function allows the intermediate node to accept a packet on a first interface and forward it on a second interface. The path determination, or forwarding decision, function selects the most appropriate interface for forwarding a packet.
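By way of illustration only, the path determination function can be sketched as a table lookup. The sketch below assumes IP-style destination addresses and a longest-prefix-match rule, neither of which is specified above; the table contents, the interface names and the function name `select_interface` are hypothetical.

```python
import ipaddress

# Hypothetical forwarding table: (destination prefix, outgoing interface).
# The prefixes and interface names are assumptions made for this sketch.
FORWARDING_TABLE = [
    (ipaddress.ip_network("0.0.0.0/0"), "if0"),    # default route
    (ipaddress.ip_network("10.0.0.0/8"), "if1"),
    (ipaddress.ip_network("10.1.0.0/16"), "if2"),
]


def select_interface(destination, table=FORWARDING_TABLE):
    """Return the interface of the most specific prefix containing destination.

    Returns None when no prefix in the table contains the address.
    """
    address = ipaddress.ip_address(destination)
    # Keep every entry whose prefix contains the destination address.
    matches = [(net.prefixlen, iface) for net, iface in table if address in net]
    if not matches:
        return None
    # "Most appropriate" is taken here to mean the longest matching prefix.
    return max(matches, key=lambda match: match[0])[1]
```

Under these assumptions, a packet destined for `10.1.2.3` would be forwarded on `if2`, while a packet whose destination matches no specific prefix would fall through to the default interface `if0`.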
To perform a set of network services, an intermediate network node includes a processing engine. The processing engine may be a single processor programmed, in hardware and/or software, to implement route processing, path determination and path switching functions for packets received by the intermediate node. However, depending on the complexity of the network services provided, the processing engine may be implemented in a number of different architectures, including, but not limited to, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs) and multiprocessor configurations.
In some multiprocessor implementations, the functions of the processing engine are distributed among a plurality of processors and coprocessors. As used herein, a coprocessor is a special-purpose processing unit that assists other processing units, such as general-purpose processors, in performing certain types of operations. For example, general-purpose processors in an intermediate node may be configured to perform route processing and path switching functions, whereas an associated coprocessor is configured to perform path determinations. In this case, the general-purpose processors “off-load” path determination functions to the coprocessor, which may be optimized to handle such operations in a fast and efficient manner.
Thus, in a multiprocessor architecture for a processing engine, a plurality of general-purpose processors may rely on a single coprocessor that is optimized to perform a subset of network services, such as path determinations. However, a problem arises when the multiple processors simultaneously request the services of the coprocessor. This problem is exacerbated if the coprocessor is configured to operate on requests serially, i.e., one at a time. In this case, the processors may transfer requests to the coprocessor faster than the coprocessor can process them, resulting in undesirable and unexpected latencies.
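The growth of this latency can be illustrated with a simplified model. The sketch below assumes that every request takes the same fixed service time and that the coprocessor serves requests strictly in arrival order; the function name `serial_latencies` and the numeric values are hypothetical and are not taken from any particular implementation.

```python
def serial_latencies(arrival_times, service_time):
    """Latency (completion time minus arrival time) of each request.

    Models a coprocessor that serves one request at a time, in arrival
    order, with an assumed fixed service time per request.
    """
    latencies = []
    free_at = 0.0  # time at which the coprocessor next becomes idle
    for arrival in sorted(arrival_times):
        # A request cannot start before it arrives or before the
        # coprocessor has finished the previous request.
        start = max(arrival, free_at)
        free_at = start + service_time
        latencies.append(free_at - arrival)
    return latencies


# Four processors issue requests at the same instant (assumed values).
simultaneous = serial_latencies([0, 0, 0, 0], service_time=2.0)

# The same four requests spaced further apart than the service time.
spaced = serial_latencies([0, 5, 10, 15], service_time=2.0)
```

In this model, the simultaneous requests complete after 2, 4, 6 and 8 time units, so the last requester waits four times as long as the first, whereas each spaced request completes after 2 time units.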
For example, assume multiple processors send requests to a coprocessor to perform a subset of network services. Each processor assembles a request in its local memory and sends the request as a sequence of packets to the coprocessor. A processor “assembles” a request by transferring individual portions of the request to its local memory. System limitations, such as bus bandwidth, software protocols, memory latencies, etc., may prevent the processor from transferring the assembled request to the coprocessor as a single transmission. Therefore, each processor in turn typically segments a request and sends it as a series of individual packets. In response, the coprocessor (i) receives the packets of each request, (ii) reassembles those packets into the request and (iii) enqueues the request in a buffer, such as a first-in, first-out (FIFO) queue. Thereafter, the coprocessor processes the requests one at a time. Clearly, there is latency associated with such serial processing of requests, despite the optimized configuration of the coprocessor to efficiently handle certain operations. The present invention is directed to reducing this latency and to allowing multiple processors to assemble requests simultaneously in a random order.
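The conventional segment, reassemble and enqueue sequence described above can be sketched as follows. This is a minimal model rather than an actual coprocessor interface: the packet layout (request identifier, packet index, packet count, data), the class name `Coprocessor` and the function name `segment` are all assumptions introduced for illustration.

```python
from collections import deque


def segment(request_id, payload, max_packet_size):
    """Split a request payload (bytes) into packets.

    Each packet is a tuple (request_id, index, total, chunk); this header
    layout is hypothetical.
    """
    chunks = [payload[i:i + max_packet_size]
              for i in range(0, len(payload), max_packet_size)] or [b""]
    total = len(chunks)
    return [(request_id, index, total, chunk)
            for index, chunk in enumerate(chunks)]


class Coprocessor:
    """Reassembles packets into requests and serves them one at a time."""

    def __init__(self):
        self.partial = {}    # request_id -> {packet index: chunk}
        self.fifo = deque()  # fully reassembled requests awaiting service

    def receive(self, packet):
        # (i) receive a packet and (ii) add it to its partial request.
        request_id, index, total, chunk = packet
        parts = self.partial.setdefault(request_id, {})
        parts[index] = chunk
        if len(parts) == total:
            # (iii) enqueue the request once all of its packets are present.
            payload = b"".join(parts[i] for i in range(total))
            del self.partial[request_id]
            self.fifo.append((request_id, payload))

    def process_next(self):
        """Remove and return the oldest complete request, or None if empty."""
        if not self.fifo:
            return None
        return self.fifo.popleft()


# Two processors send their segmented requests in turn (assumed payloads).
coprocessor = Coprocessor()
for packet in segment("A", b"abcdefgh", 3) + segment("B", b"12345", 3):
    coprocessor.receive(packet)
first = coprocessor.process_next()
second = coprocessor.process_next()
```

In this sketch, request "A" is reassembled and enqueued before request "B", so "B" is not serviced until "A" has been removed from the FIFO queue, which is the source of the serial latency noted above.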