Network routers generally include a number of ports or interfaces on which packets are received and transmitted. Handling and directing packets from and to the various ports may be a computationally intensive task, depending on the protocols and features enabled on the router. The central processing unit (CPU) in a router must often handle a large workload under heavy traffic loads.
Instead of just increasing the power of a single CPU, it is possible to have multiple CPUs configured as Symmetric Multiprocessors (SMPs) which work together. SMP systems include a number of CPUs, all of which have access to a shared memory. Generally, each CPU in such systems has its own cache (called an L1 cache). Data in the shared memory can be accessed by all the processors; however, data in the L1 cache of a particular processor can only be accessed by that particular processor. The data in the cache must be kept coherent or consistent with respect to shared memory under the control of external logic present in the processor complex. These coherency operations are usually expensive (in terms of CPU processing cycles) and should be minimized where possible.
One advantage of SMP systems is that they may execute multiple threads in parallel. A thread (sometimes called an execution context or a lightweight process) is an execution unit of code that implements a flow of control within a programming application. In an SMP system, each thread runs independently from the others, and multiple threads can be executing at the same time. A scheduler assigns threads to the different processors based on considerations such as CPU availability and the thread's run status. Generally, the objective is to assign threads to the processors in such a way that all of the processors are kept equally busy, that is, load balanced.
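The load-balancing goal described above can be illustrated with a minimal sketch. It assumes a simple "least loaded CPU" policy, which is only one of many possible scheduling policies; the `Cpu` class, the `assign_least_loaded` function, the thread names, and the cost figures are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    """One processor in the SMP complex (hypothetical model)."""
    cpu_id: int
    load: int = 0  # total estimated cost of the threads assigned so far
    threads: list = field(default_factory=list)


def assign_least_loaded(cpus, thread_name, cost):
    """Place a runnable thread on the CPU with the smallest current load."""
    # Ties are broken in favor of the lowest CPU id.
    target = min(cpus, key=lambda c: (c.load, c.cpu_id))
    target.threads.append(thread_name)
    target.load += cost
    return target.cpu_id


cpus = [Cpu(0), Cpu(1)]
# Hypothetical threads, each with an estimated cost in arbitrary units.
work = [("t0", 4), ("t1", 2), ("t2", 2), ("t3", 3), ("t4", 1)]
placement = {name: assign_least_loaded(cpus, name, cost) for name, cost in work}
loads = [c.load for c in cpus]
```

Note that a policy of this kind considers only how busy each processor is. It takes no account of which processor may already hold data relevant to a thread in its L1 cache.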
A typical network router has multiple ports on which packets are received and transmitted, and a thread could involve the processing steps needed to transfer a packet between a particular set of ports. A thread designed to transfer packets between a particular set of ports must include steps that retrieve information from memory. For example, the thread may have to retrieve information about a set of ports from memory in order to perform a packet transfer between the particular set of ports.
The processor to which a thread is assigned would generally store data retrieved from memory in its L1 cache. If the scheduler assigned the task of switching the next packet traveling between the same pair of ports to a different processor, the data accumulated in the first processor's cache could not be accessed by the second processor, and the data would have to be retrieved again from the shared memory. A processor that does not have the needed information in its cache could not perform a transfer as quickly as a processor that has the relevant information in its cache.
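The cost described above can be made concrete with a small simulation. This is a sketch under simplifying assumptions: each processor's L1 cache is modeled as an unbounded set of port pairs, and eviction and coherency traffic are ignored. The function names, the port numbers, and the two assignment policies are hypothetical; the second policy is included only as a point of comparison and is not a method prescribed by the text above.

```python
def count_shared_memory_fetches(packets, num_cpus, pick_cpu):
    """Count how often port data must be fetched from shared memory.

    packets  -- sequence of (in_port, out_port) pairs, one per packet
    pick_cpu -- policy function (index, port_pair, num_cpus) -> CPU number
    """
    l1_caches = [set() for _ in range(num_cpus)]  # one private cache per CPU
    fetches = 0
    for index, port_pair in enumerate(packets):
        cpu = pick_cpu(index, port_pair, num_cpus)
        if port_pair not in l1_caches[cpu]:
            fetches += 1  # miss: the data must come from shared memory
            l1_caches[cpu].add(port_pair)  # now cached on this CPU only
    return fetches


def round_robin(index, port_pair, num_cpus):
    """Ignore the ports and simply rotate through the CPUs."""
    return index % num_cpus


def same_cpu_per_port_pair(index, port_pair, num_cpus):
    """Send every packet for a given port pair to the same CPU."""
    in_port, out_port = port_pair
    return (in_port + out_port) % num_cpus


# Eight packets traveling between two hypothetical port pairs.
packets = [(1, 2), (1, 2), (2, 4), (2, 4)] * 2
rotating = count_shared_memory_fetches(packets, 2, round_robin)
pinned = count_shared_memory_fetches(packets, 2, same_cpu_per_port_pair)
```

In this model, rotating the packets across both processors causes each processor to fetch the data for each port pair once, whereas keeping a port pair on one processor requires only one fetch per port pair.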