One of the most important considerations for handling network traffic is packet throughput (i.e., bandwidth). Network processors and the like are designed to efficiently process very large numbers of packets per second. To process a packet, the network processor (and/or switch equipment employing the network processor) must extract data from the packet header indicating the packet's destination, class of service, and so on; store the payload data in memory; and perform various overhead functions.
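As a minimal sketch of the header-extraction step, the following Python function assumes the packets are IPv4 (the text does not name a protocol) and treats the DSCP field as the class-of-service indicator. The helper name `parse_ipv4_header` and its returned dictionary are hypothetical; the header checksum is not verified and IP options are skipped rather than parsed.

```python
import ipaddress
import struct

# Fixed part of the IPv4 header (RFC 791), network byte order: 20 bytes.
_IPV4_HDR = struct.Struct("!BBHHHBBH4s4s")


def parse_ipv4_header(packet: bytes) -> dict:
    """Hypothetical helper: extract the destination and class of service from an
    IPv4 header and slice out the payload that would then be written to memory.
    The header checksum is not verified and IP options are skipped, not parsed."""
    if len(packet) < _IPV4_HDR.size:
        raise ValueError("packet shorter than the minimum IPv4 header")
    (ver_ihl, tos, total_len, _ident, _frag, ttl,
     proto, _csum, src, dst) = _IPV4_HDR.unpack_from(packet)
    if ver_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (ver_ihl & 0x0F) * 4  # IHL counts 32-bit words
    if header_len < _IPV4_HDR.size or header_len > total_len:
        raise ValueError("invalid header length")
    return {
        "destination": ipaddress.IPv4Address(dst),
        "source": ipaddress.IPv4Address(src),
        "dscp": tos >> 2,  # class of service: upper 6 bits of the TOS byte
        "protocol": proto,
        "ttl": ttl,
        "payload": packet[header_len:total_len],
    }


# Usage: a 20-byte header (no options, protocol 253 = experimental) plus 4 payload bytes.
hdr = _IPV4_HDR.pack(0x45, 46 << 2, 24, 0, 0, 64, 253, 0,
                     bytes([10, 0, 0, 1]), bytes([192, 168, 1, 7]))
info = parse_ipv4_header(hdr + b"DATA")
print(info["destination"], info["dscp"], info["payload"])  # 192.168.1.7 46 b'DATA'
```

Each field read here corresponds to one or more memory reads on a real network processor, which is why the number of header fields touched per packet matters for throughput.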
In general, the foregoing packet processing operations require multiple memory accesses. As a result, packet throughput is inherently tied to memory access latencies. Ideally, all memory accesses would use the fastest memory available. For example, modern on-chip (i.e., on the processor die) static random access memory (SRAM) provides access times of 10 nanoseconds or less. However, this type of memory is very expensive (in terms of chip real estate and chip yield), so the amount of on-chip SRAM is typically very small.
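The dependence of throughput on memory latency can be made concrete with simple arithmetic. The sketch below assumes a single processing thread whose memory accesses are strictly serialized (no overlap, no compute time counted), so it yields an upper bound rather than a prediction. The function name and the count of 4 accesses per packet are hypothetical; 10 ns is the on-chip SRAM figure cited above, and 50 ns is an illustrative stand-in for a slower memory.

```python
def max_packet_rate(accesses_per_packet: int, latency_ns: float) -> float:
    """Upper bound on packets per second for one thread that issues its memory
    accesses one after another (no overlap, no compute time counted)."""
    time_per_packet_ns = accesses_per_packet * latency_ns
    return 1e9 / time_per_packet_ns


# Hypothetical workload: 4 memory accesses per packet.
print(max_packet_rate(4, 10.0))  # 25000000.0  (on-chip SRAM, 10 ns)
print(max_packet_rate(4, 50.0))  # 5000000.0   (illustrative slower memory)
```

Under these assumptions, five times the latency gives one fifth of the packet rate, which is why the choice of memory for frequently accessed data matters so much.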
The next fastest type of memory is off-chip SRAM. Because this memory is off-chip, it is slower to access, so a dedicated memory bus is required to keep access times low. In some designs, a back-side bus (BSB) is employed for this purpose.
Typically, off-chip dynamic RAM (DRAM) is employed for most memory work. DRAM is slower than SRAM (due to physical differences in the design and operation of DRAM and SRAM cells), and it must be refreshed periodically to retain its contents, which consumes part of the available memory bandwidth and adds overhead. As with off-chip SRAM, DRAM also requires a special bus for access. In most of today's designs, a bus such as a front-side bus (FSB) is used to enable data transfers between banks of DRAM and a processor. Under a typical design, the FSB connects the processor to a memory control unit in a platform chipset (e.g., a memory controller hub (MCH)), and the chipset is connected to the memory store, such as DRAM, RDRAM, or DDR (double data rate) DRAM, via dedicated signals.
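To put the refresh overhead in rough numbers: in DDR-style DRAM, the controller issues a refresh command on average once per refresh interval (tREFI), and the rank is unavailable for the refresh cycle time (tRFC) after each one. The sketch below computes that busy fraction. The function name is hypothetical, and the 7800 ns and 350 ns values are illustrative assumptions; actual timings depend on device density and the memory standard.

```python
def refresh_overhead(t_rfc_ns: float, t_refi_ns: float) -> float:
    """Fraction of time a DRAM rank is busy refreshing, assuming one refresh
    command (lasting tRFC) is issued every tREFI on average."""
    return t_rfc_ns / t_refi_ns


# Illustrative timings only (hypothetical device).
overhead = refresh_overhead(t_rfc_ns=350.0, t_refi_ns=7800.0)
print(f"{overhead:.1%}")  # 4.5%
```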
In general, DRAM memory accesses produce significant processing latencies relative to other processing activities. To address this problem, various memory-caching schemes are employed. The basic concept is to cache recently accessed data (or other data selected by a predefined caching policy) in a smaller memory device that has faster access than the larger memory device in which the data is normally stored; this exploits temporal locality. Caches also typically fetch more data than was requested, namely data physically adjacent to the requested line, since that data is often needed soon afterward; this exploits spatial locality.
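The two forms of locality can be illustrated with a toy direct-mapped cache. This is a simplified model chosen for illustration (tags only, no data, no write policy), not the caching policy of any particular processor; the class name and the sizes are hypothetical. On a miss the whole line is filled, so later reads of neighboring bytes hit (spatial locality), and re-reading the same range hits because those lines are still resident (temporal locality).

```python
from typing import List, Optional


class DirectMappedCache:
    """Toy direct-mapped cache that tracks only tags, to count hits and misses."""

    def __init__(self, num_lines: int, line_size: int) -> None:
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags: List[Optional[int]] = [None] * num_lines  # one tag per slot
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        line_addr = address // self.line_size  # memory line containing this byte
        index = line_addr % self.num_lines     # cache slot the line maps to
        tag = line_addr // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: the whole line is brought in, so neighboring bytes will now hit.
        self.tags[index] = tag
        self.misses += 1
        return False


cache = DirectMappedCache(num_lines=64, line_size=64)
for _ in range(2):                  # second pass exercises temporal locality
    for addr in range(0, 1024, 4):  # sequential 4-byte reads exercise spatial locality
        cache.access(addr)
print(cache.hits, cache.misses)  # 496 16
```

Of the 512 reads, only the first touch of each of the 16 lines misses; everything else is served from the cache.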
For example, under a typical scheme, on-chip SRAM is used as a first-level cache (commonly referred to as primary or “L1” cache). This memory has extremely low latency. Off-chip SRAM is used as a second-level cache (commonly referred to as secondary or “L2” cache). In many designs, a processor package includes both a processor die with built-in L1 cache and a separate L2 cache (contained on a separate die).
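The benefit of such a two-level hierarchy is commonly summarized by the average memory access time (AMAT). The sketch below evaluates AMAT = t_L1 + m_L1 * (t_L2 + m_L2 * t_mem), where m_L1 is the L1 miss rate and m_L2 is the fraction of accesses reaching L2 that also miss there. The function name and all latencies and miss rates are hypothetical values chosen for illustration.

```python
def average_access_time(t_l1: float, miss_l1: float,
                        t_l2: float, miss_l2: float,
                        t_mem: float) -> float:
    """AMAT for an L1 -> L2 -> DRAM hierarchy; miss_l2 is the local L2 miss rate."""
    return t_l1 + miss_l1 * (t_l2 + miss_l2 * t_mem)


# Hypothetical latencies (ns) and miss rates, for illustration only.
with_l2 = average_access_time(t_l1=1.0, miss_l1=0.125,
                              t_l2=10.0, miss_l2=0.25, t_mem=100.0)
# Without an L2, every L1 miss goes straight to DRAM.
without_l2 = 1.0 + 0.125 * 100.0
print(with_l2, without_l2)  # 5.375 13.5
```

With these assumed figures, adding the L2 level cuts the average access time by more than half, even though only a quarter of L1 misses are absorbed before DRAM... rather, three quarters of L1 misses are absorbed by L2 and only a quarter reach DRAM.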
The foregoing cache schemes are common to general-purpose processors, such as those found in a personal computer or the like. In contrast, most network processors are connected directly to SRAM and DRAM, without any cache components in between. Some modern network processor designs include both dedicated processors for packet processing and one or more general-purpose processors. However, it is impractical to provide L1- and L2-type caches on network processors without significantly affecting die-size constraints.