Network devices, such as switches and routers, are designed to forward network traffic, in the form of packets, at high line rates. One of the most important considerations for handling network traffic is packet throughput. To achieve high throughput, special-purpose processors known as network processors have been developed to efficiently process very large numbers of packets per second. In order to process a packet, the network processor (and/or network equipment employing the network processor) needs to extract data from the packet header indicating the destination of the packet, class of service, etc., store the payload data in memory, perform packet classification and queuing operations, determine the next hop for the packet, select an appropriate network port via which to forward the packet, etc. These operations are generally referred to as “packet processing” operations.
Modern network processors (also commonly referred to as network processor units (NPUs)) perform packet processing using multiple multi-threaded processing elements (e.g., processing cores, referred to as microengines or compute engines in network processors manufactured by Intel® Corporation, Santa Clara, Calif.), wherein each thread performs a specific task or set of tasks in a pipelined architecture. During packet processing, numerous accesses are performed to move data between various shared resources coupled to and/or provided by a network processor. For example, network processors commonly store packet metadata and the like in static random access memory (SRAM) stores, while storing packets (or packet payload data) in dynamic random access memory (DRAM)-based stores. In addition, a network processor may be coupled to cryptographic processors, hash units, general-purpose processors, and expansion buses, such as the PCI (peripheral component interconnect) and PCI Express buses.
Network processors are often configured to perform processing in a collaborative manner, such as via a pipelined processing scheme. Typically, different threads perform different portions of the same task or related tasks, with the output of one thread being employed as an input to the next thread. The threads are specifically tailored for a particular task or set of tasks, such as packet forwarding, packet classification, etc. This type of scheme enables packet-processing operations to be carried out at line rates for most packets; such operations are also referred to as “fast path” operations. However, some packets present problems that require additional processing. Under one approach, packet processing for these packets is performed using “slow path” operations executed by a general-purpose processor or the like, wherein a redirection event causes packet processing to switch from the multi-threaded processing elements to the general-purpose processor. The general-purpose processor typically provides a larger instruction set than the multi-threaded processing elements, supporting execution of more flexible and complex tasks that are designed to handle such “problem” packets.
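The fast-path/slow-path split described above can be illustrated with a minimal sketch. It is not the scheme of any particular NPU: the stage names (`classify`, `forward`), the `Redirect` exception standing in for a redirection event, and the two conditions that trigger it (header options, an expired TTL) are assumptions chosen only for illustration. The stages also run sequentially here, whereas the threads of a real network processor operate concurrently.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    dest: str
    ttl: int
    has_options: bool = False          # hypothetical "problem packet" marker
    trace: list = field(default_factory=list)


class Redirect(Exception):
    """Stands in for a redirection event raised by a fast-path stage."""


def classify(pkt: Packet) -> Packet:
    # Assumption: packets carrying header options are too complex for the fast path.
    if pkt.has_options:
        raise Redirect("header options")
    pkt.trace.append("classify")
    return pkt


def forward(pkt: Packet) -> Packet:
    # Assumption: an expiring TTL needs handling beyond plain forwarding.
    if pkt.ttl <= 1:
        raise Redirect("TTL expired")
    pkt.ttl -= 1
    pkt.trace.append("forward")
    return pkt


# Each stage consumes the output of the previous one, as in a pipeline.
FAST_PATH = (classify, forward)


def slow_path(pkt: Packet, reason: str) -> Packet:
    # Placeholder for the more flexible general-purpose processor code.
    pkt.trace.append(f"slow:{reason}")
    return pkt


def process(pkt: Packet) -> Packet:
    try:
        for stage in FAST_PATH:
            pkt = stage(pkt)
    except Redirect as event:
        pkt = slow_path(pkt, str(event))
    return pkt
```

In this sketch an ordinary packet passes through both stages, while a packet that triggers `Redirect` leaves the pipeline at the stage that raised it and is completed by `slow_path`.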
In many instances, the architectures employed by the multi-threaded processing elements and the general-purpose processor are significantly different. For instance, the microengines on many Intel® NPUs employ RISC (reduced instruction set computer) architectures, while the general-purpose processor employs a CISC (complex instruction set computer) architecture. Furthermore, the operating speeds of the different types of processing elements are usually different.
Hand-offs from fast-path to slow-path processing require communication between the microengines and the general-purpose processor. Furthermore, both the microengines and the general-purpose processor need to access packet data stored in memory, which may be accessed via a memory or system bus running at yet another frequency. As such, it is necessary to have some type of clocking scheme that enables processing elements and buses running at different clock frequencies to communicate with each other. Heretofore, this has been done by having the clock frequency of one type of processing element (e.g., the microengines) be an integer multiple of the clock frequency of another type of processing element (e.g., the general-purpose processor) or bus.
While this integer-multiple scheme supports communication between the processing elements and system resources such as memory, it limits the design flexibility of the overall NPU architecture. For example, it may be advantageous to increase the frequency of one type of processing element while leaving the frequency of another type of processing element or bus unchanged, or otherwise to employ an architecture under which the ratio between the clock domains is a rational number rather than being limited to an integer.
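The difference between an integer and a rational clock ratio can be made concrete with a small sketch. The function names and the example frequencies below are hypothetical and are not taken from any actual NPU; the sketch further assumes that both clocks are derived from a common reference and start with aligned edges.

```python
from fractions import Fraction


def clock_ratio(fast_hz: int, slow_hz: int) -> Fraction:
    """Ratio of two clock frequencies, reduced to lowest terms."""
    return Fraction(fast_hz, slow_hz)


def is_integer_multiple(fast_hz: int, slow_hz: int) -> bool:
    """True when the faster clock is a whole-number multiple of the slower one."""
    return clock_ratio(fast_hz, slow_hz).denominator == 1


def realignment_period(fast_hz: int, slow_hz: int) -> tuple:
    """Cycles of each clock (fast, slow) between coinciding edges.

    Assumes the two clocks begin with aligned edges.
    """
    ratio = clock_ratio(fast_hz, slow_hz)
    return ratio.numerator, ratio.denominator


# Hypothetical frequencies: 1.2 GHz vs. 600 MHz is an integer ratio (2:1),
# whereas 1.4 GHz vs. 600 MHz reduces to the rational ratio 7:3.
integer_case = is_integer_multiple(1_200_000_000, 600_000_000)
rational_case = realignment_period(1_400_000_000, 600_000_000)
```

Under an integer ratio the edges coincide on every cycle of the slower clock. Under a rational ratio such as 7:3 they coincide only once every seven fast cycles and three slow cycles, which is why a scheme restricted to integer multiples cannot accommodate such a configuration directly.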