State-of-the-art packet processors perform many operations on incoming packets at line rates (e.g., 10 Gb/s or higher). These operations include, but are not limited to, address lookup, packet classification, buffering, statistics maintenance, quality-of-service (QoS) scheduling and header editing. One category of packet processors includes packet switches, such as IP routers, ATM switches and Ethernet switches. Packet switches perform statistics maintenance for many reasons, including billing, firewalling, intrusion detection, network tracing and management, load balancing and traffic engineering.
In a typical scenario, a packet switch may receive and classify an incoming packet to assess what operations are to be performed on the packet. Such operations may include determining whether the packet should be accepted or dropped, or whether it should receive expedited processing. These determinations frequently result in updates to a large number of statistics, which may be represented as “counters”. In particular, it is conceivable that ten thousand or more statistics may need to be maintained in switching applications that use a routing table to maintain prefix use counts, or in a router that counts packets belonging to each of a plurality of connections. Moreover, even when a relatively small number of statistics are updated for each received packet, the update rates can still be extremely high. For example, updating four statistics per packet received at a 10 Gb/s rate can correspond to an update rate of 100 M updates/s.
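The update-rate arithmetic above can be sketched as follows. This is a minimal illustration only: the function names and the per-packet sizes are assumptions chosen for the example, and only the 10 Gb/s line rate and the four updates per packet come from the text. Under this simple model, the 100 M updates/s figure corresponds to roughly 50 bytes per packet on the wire.

```python
def packets_per_second(line_rate_bps: float, bytes_per_packet: float) -> float:
    """Packet rate for back-to-back packets of a given on-wire size."""
    return line_rate_bps / (bytes_per_packet * 8)


def updates_per_second(line_rate_bps: float, bytes_per_packet: float,
                       updates_per_packet: int) -> float:
    """Statistics update rate = packet rate x updates per packet."""
    return packets_per_second(line_rate_bps, bytes_per_packet) * updates_per_packet


LINE_RATE_BPS = 10e9      # 10 Gb/s, from the text
UPDATES_PER_PACKET = 4    # four statistics per packet, from the text

# Assumed on-wire packet sizes in bytes (illustrative, not from the source).
for size in (50, 84, 500):
    rate = updates_per_second(LINE_RATE_BPS, size, UPDATES_PER_PACKET)
    print(f"{size:4d} B/packet -> {rate / 1e6:6.1f} M updates/s")
```

The update rate scales inversely with packet size, so the worst case for a statistics memory is a stream of small packets.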
Unfortunately, because of layout area constraints, the updating of such a large number of statistics at line rates typically precludes the use of fast on-chip memory to maintain all statistics within a packet switch. To address this limitation, proposals have been made to include high capacity off-chip memory (e.g., DRAM) to maintain statistics, which are updated periodically using read-modify-write (RMW) operations that act under control of a counter management algorithm (CMA). One such proposal is disclosed in an article by D. Shah, S. Iyer, B. Prabhakar and N. McKeown entitled “Analysis of a Statistics Counter Architecture,” which can be found on the internet at: http://tiny-tera.stanford.edu/~nickm/papers/hoti2001.pdf. Other proposals for performing statistics updates in packet switches are disclosed in U.S. Pat. No. 6,460,010 entitled “Method and Apparatus for Statistical Compilation,” and in an article by S. Ramabhadran and G. Varghese entitled “Efficient Implementation of a Statistics Counter Architecture,” which can be found on the internet at: http://www.cse-ucsd.edu/~varghese/papers/srirampaper.pdf.
FIG. 1 illustrates a timeline for a conventional read-modify-write (RMW) update cycle within a DDR2 SDRAM operating at 200 MHz, which may be coupled to a conventional packet switch. As demonstrated by this timeline, the RMW update cycle requires seventeen (17) clock cycles, twelve of which are governed by internal DRAM characteristics that are outside user control. The cycle begins with three clock cycles for bank access and transfer of read data from an addressed row to sense amplifiers (tRCD) and three clock cycles for accessing an addressed segment of columns within the row (tCL). The read data can be transferred over an interface bus, then modified (e.g., by adding the read data to a statistics update provided from on-chip memory within a packet switch) and then transferred back over the interface bus in five clock cycles, which are the only cycles that are a function of bus interface speed. Thereafter, three clock cycles may be required to write the modified data (e.g., updated statistic) into the sense amplifiers (tWR) and three clock cycles may be required to transfer the write data from the sense amplifiers back to the addressed row within the SDRAM bank (tRP).
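The cycle budget described for FIG. 1 can be tallied with a short sketch. The cycle counts and the 200 MHz clock are taken from the text; the dictionary layout and variable names are illustrative, and the resulting update rate assumes back-to-back RMW cycles to a single bank with no overlap between them.

```python
CLOCK_HZ = 200e6  # DDR2 SDRAM clock, from the text

# Clock cycles per phase of one RMW update, as described for FIG. 1.
RMW_PHASES = {
    "tRCD (row to sense amplifiers)": 3,
    "tCL (column access)": 3,
    "bus transfer and modify": 5,
    "tWR (write into sense amplifiers)": 3,
    "tRP (write back to row)": 3,
}
INTERFACE_PHASE = "bus transfer and modify"  # only phase tied to bus speed

total_cycles = sum(RMW_PHASES.values())
internal_cycles = total_cycles - RMW_PHASES[INTERFACE_PHASE]
cycle_time_ns = 1e9 / CLOCK_HZ
rmw_time_ns = total_cycles * cycle_time_ns
# Assumption: one RMW at a time per bank, issued back to back.
max_updates_per_s = CLOCK_HZ / total_cycles

print(f"total cycles:    {total_cycles}")
print(f"internal cycles: {internal_cycles}")
print(f"RMW time:        {rmw_time_ns:.1f} ns")
print(f"max update rate: {max_updates_per_s / 1e6:.2f} M updates/s per bank")
```

Under these assumptions a single bank completes one update every 85 ns, which is far below the update rates implied by line-rate statistics maintenance.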
However, this timing allows only one cycle for data modification, which is barely sufficient for data re-timing at the interface, much less for performing additional updating operations that may be necessary for particular applications. The performance of additional updating operations will typically require additional clock cycles and thereby reduce the maximum rate at which a memory bank can be updated under control of a packet switch. Finally, even if the bus interface speed characteristics are improved, the timing associated with the bus interface may have relatively little impact on the overall timing of an update cycle. This is because the update rates may still be limited by the timing associated with internal SDRAM characteristics, which have generally remained constant with each new generation of device.
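The limited benefit of a faster bus interface can be illustrated numerically. This sketch rests on two stated assumptions: the twelve internal cycles are treated as a fixed 60 ns (twelve cycles at the 5 ns period of a 200 MHz clock), and the five interface cycles are assumed to shrink in proportion to the bus clock. The faster clock values are hypothetical and do not describe any particular device.

```python
INTERNAL_NS = 12 * 5.0  # internal DRAM time, assumed constant across generations
BUS_CYCLES = 5          # interface cycles per update, from the text


def update_time_ns(bus_clock_hz: float) -> float:
    """Time for one RMW update when only the bus interface speeds up."""
    return INTERNAL_NS + BUS_CYCLES * 1e9 / bus_clock_hz


baseline = update_time_ns(200e6)
for mhz in (200, 400, 800):  # 400 and 800 MHz are hypothetical bus clocks
    t = update_time_ns(mhz * 1e6)
    print(f"{mhz} MHz bus: {t:6.2f} ns per update, "
          f"{1e3 / t:5.2f} M updates/s, speed-up {baseline / t:.2f}x")
```

In this model a fourfold increase in bus clock improves the per-bank update rate by less than 30 percent, because the internal DRAM time dominates the cycle.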
Additional networks, such as 10 Gb/s networks, may require state information to be updated at packet rates approaching 15 Mpps. Such state information may include connection state; metering; statistics for billing, performance monitoring and traffic engineering; scheduling and congestion management for traffic shaping and congestion control; and aging for dynamic entry learning applications. Updating state information frequently implies that an old state is read from memory, an operation is performed on the old state, and the updated state is returned to memory. Unfortunately, to support 10 Gb/s data rates, the options typically available to a designer include on-chip SRAM, which is relatively expensive and typically provides only limited capacity; on-chip DRAM, which is typically supported by only a few ASIC vendors and may be limited in size; and RLDRAM/NetDRAM, which is relatively expensive and not widely sourced. Moreover, conventional batch type updating operations may not be useful for stateful updating because a requested state value stored in memory must be retrieved for processing each current packet.
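The read, operate and write-back sequence for stateful updating can be sketched as follows. This is a simplified model, not an implementation from any cited proposal: the dictionary standing in for state memory, the function names and the packet-limit operation are all hypothetical. It shows why the stored state must be fetched for each packet, since the per-packet decision depends on the value currently in memory.

```python
from typing import Callable, Dict, Tuple

State = int
# An update operation maps the old state to (new state, per-packet decision).
UpdateOp = Callable[[State], Tuple[State, bool]]


def stateful_update(memory: Dict[int, State], address: int, op: UpdateOp) -> bool:
    """Perform one stateful update and return the decision for this packet."""
    old_state = memory.get(address, 0)   # 1. read the old state from memory
    new_state, decision = op(old_state)  # 2. operate on the old state
    memory[address] = new_state          # 3. return the updated state to memory
    return decision


def make_packet_limiter(limit: int) -> UpdateOp:
    """Hypothetical operation: accept packets only while the count is below limit."""
    def op(count: State) -> Tuple[State, bool]:
        if count < limit:
            return count + 1, True
        return count, False
    return op


memory: Dict[int, State] = {}
limiter = make_packet_limiter(limit=3)
decisions = [stateful_update(memory, address=7, op=limiter) for _ in range(5)]
print(decisions)  # [True, True, True, False, False]
print(memory[7])  # 3
```

Deferring and batching these updates would change the outcome, because the fourth packet can only be rejected if the effect of the first three is already visible in memory.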