As is well known, data networks can enable the flow of data packets between network source(s) and destination(s). Applications of such networks can include packet forwarding, such as the generation of a “next hop” address as a data packet propagates through a network. Further, to provide additional services, increase performance, and manage growth, it can be desirable to acquire data regarding network use. As but one example, it can be desirable to measure network flow data.
Network flow measurement typically includes aggregating particular data values related to each network flow. This collected data can then be used for various applications, including but not limited to, billing, traffic management, and other types of network analysis techniques. A number of network flow measurement techniques are known. Three particular naming conventions are “NetFlow” provided by Cisco Systems, of San Jose, Calif., “J-Flow” provided by Juniper Networks, of Sunnyvale, Calif., and “sFlow” provided by Foundry Networks, Inc., of San Jose, Calif.
To better understand features and advantages of the disclosed embodiments, examples of conventional network data aggregation will now be described. FIG. 23 shows an example of a table containing data for network flows, as well as associated data for such flows. FIG. 23 is a diagram depicting a “Primary Table” for storing flow information, and represents one very particular approach in which five fields are utilized to define a network flow: a Source IP Address, a Destination IP Address, a Protocol Type, a Source Port, and a Destination Port. In such an arrangement, these fields can be collectively considered a “flow descriptor”. As is well understood, such fields can exist in a network packet header transmitted according to a given protocol (e.g., TCP/IP). The associated data of FIG. 23 shows two fields: Number of Packets and Bytes.
Of course, FIG. 23 shows but one example of one particular flow type. Packets transmitted according to different protocols would have different header information. For example, NetFlow noted above utilizes seven fields to define a flow.
Referring still to FIG. 23, in a conventional data aggregation approach, an incoming packet header can be parsed for the network flow descriptor fields (in this example, five fields). If these fields of the packet are unique (i.e., not yet included as an entry in the Primary Table), a new entry is created for the flow, with appropriate Associated Data. For example, a “Number of Packets” can be set to 1, and the “Bytes” can be the number of bytes in the received packet. If the flow descriptor fields of a received packet are not unique, the data associated with the flow can be updated according to the received packet (i.e., the number of packets and the number of bytes are incremented according to the new packet).
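The create-or-update behavior described above can be illustrated with a minimal software sketch. The names FlowKey, FlowStats, and update_primary_table, as well as the sample addresses and packet lengths, are hypothetical and are introduced only for illustration; they do not correspond to any vendor implementation.

```python
from dataclasses import dataclass
from typing import Dict, NamedTuple


class FlowKey(NamedTuple):
    """Five-field flow descriptor, as in the Primary Table example."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


@dataclass
class FlowStats:
    """Associated data kept for each flow."""
    packets: int = 0
    byte_count: int = 0


def update_primary_table(table: Dict[FlowKey, FlowStats],
                         key: FlowKey, packet_len: int) -> None:
    # A descriptor not yet in the table creates a new entry;
    # otherwise the existing entry's counters are incremented.
    stats = table.setdefault(key, FlowStats())
    stats.packets += 1
    stats.byte_count += packet_len


# Hypothetical traffic: two packets belonging to the same flow.
primary: Dict[FlowKey, FlowStats] = {}
key = FlowKey("10.0.0.1", "10.0.0.2", 6, 1234, 80)
update_primary_table(primary, key, 100)
update_primary_table(primary, key, 60)
```

In this sketch, the second packet finds the existing entry, so the table still holds a single flow with both counters updated.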
The amount of data produced by flow identification can be very large, due to the granularity with which flows are typically defined (number of fields for flow descriptor). That is, the larger the number of fields used to define flows, the more granularity in the flow data. High granularity data can create a data explosion in network statistics that can be difficult to manage and analyze.
Granularity of accumulated flow data is typically defined by network equipment provided by a vendor. In many cases, the granularity provided by a vendor can be higher than necessary for a given application. That is, a given application may need to collect statistics on a smaller number of fields (e.g., two fields) than the granularity provided by the network equipment. In such applications, collected network flow data can be aggregated based on the smaller number of fields.
Network flow data aggregation can address the large amounts of data presented by high granularity flow data. By aggregating data based on a smaller number of fields, concise flow data can be gathered that is easier to interpret and transfer over a network. For example, Cisco NetFlow (Version 8) supports eleven aggregation schemes for addressing data explosion by making the amount of flow data more tractable.
The number of flows in a network at a given instant can be a very large number that varies as newer flows are added and older flows are deleted. Thus, entries of a Primary Table, like that of FIG. 23, can be continuously aged (deleted, expired) at an average flow rate in order to free up space for new flows. Aging of Primary Table entries can be based on various criteria. A few rules for aging according to NetFlow include (1) expiring idle flows (flows that have been idle for a specified amount of time are expired and removed from the Primary Table); (2) expiring long lived flows (e.g., as a default, flows in existence for 30 minutes can be expired); (3) expiring flows as the Primary Table fills up (as the Primary Table fills up, a number of heuristics can be applied to aggressively age groups of flows simultaneously); (4) expiring flows based on flag detection (flows can be expired based on predetermined flags/indications within a packet, e.g., for TCP, the end of byte stream flag (FIN) or the reset flag (RST)).
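The four aging rules above can be sketched as a single decision function. This is only an illustrative model: the idle limit, the fill threshold, and the particular table-pressure heuristic (shortening the idle limit when the table is nearly full) are assumptions chosen for the example, while the 30-minute limit and the FIN/RST flags follow the description above.

```python
from dataclasses import dataclass
from typing import Optional

IDLE_TIMEOUT_S = 15.0         # hypothetical idle limit
ACTIVE_TIMEOUT_S = 30 * 60.0  # 30-minute default noted above
HIGH_WATER = 0.9              # hypothetical fill level that triggers aggressive aging
TCP_FIN = 0x01                # TCP header flag bits
TCP_RST = 0x04


@dataclass
class FlowEntry:
    first_seen: float   # time the flow entry was created (seconds)
    last_seen: float    # time of the most recent packet (seconds)
    tcp_flags: int = 0  # OR of the TCP flags observed on the flow


def expiry_reason(entry: FlowEntry, now: float, table_fill: float) -> Optional[str]:
    """Return the rule that expires the entry, or None if it should be kept."""
    if entry.tcp_flags & (TCP_FIN | TCP_RST):
        return "tcp-flag"
    if now - entry.first_seen >= ACTIVE_TIMEOUT_S:
        return "long-lived"
    idle = now - entry.last_seen
    if idle >= IDLE_TIMEOUT_S:
        return "idle"
    # Assumed heuristic: under table pressure, use a shorter idle limit.
    if table_fill >= HIGH_WATER and idle >= IDLE_TIMEOUT_S / 4:
        return "table-pressure"
    return None
```

A periodic sweep could call expiry_reason for each Primary Table entry and remove those for which a rule name is returned.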
In conventional arrangements, there can be two possible outcomes for a Primary Table entry that is aged. These operations are shown in FIG. 24. FIG. 24 is a diagram that shows Primary Table 2400 of FIG. 23, which includes flow-defining entries 2402-1 to 2402-4. In FIG. 24, it is assumed that the first entry 2402-1 is aged (expired from the Primary Table).
A first outcome can occur if the network equipment does not support data aggregation. In this case, the associated data for the aged flow can be sent to a flow collector device 2404.
However, if data aggregation is supported, associated data for the expired entry can be added to one or more Secondary Tables (or aggregation tables). The example of FIG. 24 shows two aggregation schemes, resulting in two Secondary Tables 2406-0 and 2406-1. In particular, Secondary Table #1 2406-0 aggregates flow data based on Source IP Address and Destination IP Address, while Secondary Table #2 2406-1 aggregates flow data based on Source IP Address and Source Port.
In such an arrangement, as a Primary Table entry is expired, the Source IP Address and Destination IP Address are checked against corresponding fields in Secondary Table #1 2406-0. If the Source IP Address/Destination IP Address combination is unique, a new entry is added to Secondary Table #1 2406-0. If the combination already exists, the associated data for the matching entry of Secondary Table #1 2406-0 can be updated. In the same general fashion, the Source IP Address and Source Port can be checked against corresponding fields in Secondary Table #2 2406-1, and either a new entry added or the existing entry updated.
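The expire-and-aggregate operation described above can be modeled in software as follows. This is a sketch only: the table names, the tuple layout of the flow descriptor, and the sample flows are assumptions made for illustration.

```python
# Field positions within a five-field flow descriptor tuple.
SRC_IP, DST_IP, PROTO, SRC_PORT, DST_PORT = range(5)

# Two aggregation schemes, mirroring Secondary Tables #1 and #2.
SCHEMES = {
    "secondary_1": (SRC_IP, DST_IP),
    "secondary_2": (SRC_IP, SRC_PORT),
}


def expire_entry(primary, secondaries, flow_key):
    """Remove flow_key from the primary table and fold its data
    into each secondary (aggregation) table."""
    packets, byte_count = primary.pop(flow_key)
    for name, fields in SCHEMES.items():
        agg_key = tuple(flow_key[i] for i in fields)
        # A unique combination creates a new entry; otherwise update it.
        totals = secondaries[name].setdefault(agg_key, [0, 0])
        totals[0] += packets
        totals[1] += byte_count


# Hypothetical flows that differ only in Source Port.
primary = {
    ("10.0.0.1", "10.0.0.2", 6, 1234, 80): (3, 300),
    ("10.0.0.1", "10.0.0.2", 6, 1235, 80): (2, 120),
}
secondaries = {name: {} for name in SCHEMES}
for flow in list(primary):
    expire_entry(primary, secondaries, flow)
```

Here both flows collapse into one entry of the first secondary table (same Source IP Address and Destination IP Address), but remain two entries in the second (different Source Ports).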
A number of conventional approaches to aggregating data are known.
One approach for aggregating data utilizes “hashing”. One example of a hashing approach is shown in FIG. 25, and designated by the general reference character 2500. In a hashing arrangement, a Primary Table can be stored in a random access memory (RAM) 2502. When an entry is expired, the fields of the entry corresponding to an aggregation scheme can be applied to a hash function to arrive at an address at which to store aggregation data. More particularly, network flow data can be stored in the RAM, and hashing can be used to differentiate between primary table and aggregate table entries. In the particular example shown, a hashing function can be executed by hashing logic 2504 formed in an application specific integrated circuit (ASIC) 2506.
A drawback to conventional hashing approaches can be the amount of memory space needed to accommodate the different data tables. That is, a single entry is needed for each value of the different tables.
Another drawback to conventional hashing approaches can be “collisions”. As is well known, hashing functions can map a larger address space (i.e., that represented by all possible key combinations) into a smaller address space (i.e., that of the RAM). However, hashing functions are rarely perfect and thus give rise to collisions in which two keys hash to the same location. In such a case, the colliding key must be re-checked with each colliding entry to complete a search. Due to the variations in key values, collisions can be an inevitable problem with hashing solutions, and are anticipated to become an even more critical problem as data aggregation is performed at faster line rates (bit transmission speeds).
The above collision problem also gives rise to non-deterministic search times. That is, while a non-colliding search may take one memory access, a colliding search may make multiple memory accesses. It is believed such varying search times will also be more difficult to handle as line rates increase.
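The variation in search time can be seen in a small software model. The following is a sketch under stated assumptions: it uses a deliberately weak modulo hash and linear probing, neither of which is taken from the arrangement of FIG. 25, and the table size and key values are chosen only to force collisions.

```python
TABLE_SIZE = 8  # deliberately small so collisions are easy to see


def toy_hash(key: int) -> int:
    # Illustrative only; an actual device would use a stronger function.
    return key % TABLE_SIZE


def insert(table, key, value):
    """Insert with linear probing; return the number of memory accesses used."""
    slot = toy_hash(key)
    for accesses in range(1, TABLE_SIZE + 1):
        if table[slot] is None or table[slot][0] == key:
            table[slot] = (key, value)
            return accesses
        slot = (slot + 1) % TABLE_SIZE
    raise RuntimeError("table full")


def lookup(table, key):
    """Return (value, accesses); value is None when the key is absent."""
    slot = toy_hash(key)
    for accesses in range(1, TABLE_SIZE + 1):
        entry = table[slot]
        if entry is None:
            return None, accesses
        if entry[0] == key:
            return entry[1], accesses
        slot = (slot + 1) % TABLE_SIZE
    return None, TABLE_SIZE


table = [None] * TABLE_SIZE
for k in (3, 11, 19):  # all three keys hash to slot 3
    insert(table, k, f"flow-{k}")

first = lookup(table, 3)   # non-colliding search
last = lookup(table, 19)   # search that must step past two colliding entries
```

In this model the first key is found in one access while the third requires three, which is the kind of data-dependent search time noted above.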
Another drawback to hashing solutions can be the data dependence inherent in hashing functions. In particular, a hashing function will give a different collision profile depending upon the input data set. That is, a hashing function operating on a randomly generated data set may have a different collision profile than one operating on actual network data. Because of this, for best performance, a hash function polynomial is selected based on a known entry format and expected distribution. However, such approaches remain imperfect.
Related to the above hashing function mapping problems is “funneling”. Funneling is a flaw in some hashing functions that can arise when input values vary by only a small amount (i.e., differ by only a few bits). In such cases, variability based on input bit values can be lost. This can lead to entries that vary by only a few bits hashing to the same value, thus giving rise to a large number of collisions.
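An extreme, artificial case of funneling can be sketched as follows. The function funneling_hash is hypothetical and intentionally flawed: it discards the low four bits of the key entirely, so that keys differing only in those bits cannot be distinguished by the hash. Practical hash functions with funnels are typically less obvious than this.

```python
def funneling_hash(key: int, buckets: int = 16) -> int:
    # Artificial flaw: the low four input bits never reach the output.
    return (key >> 4) % buckets


# Sixteen hypothetical keys that differ only in their low four bits.
keys = [0x1230 + i for i in range(16)]
slots = {funneling_hash(k) for k in keys}
```

All sixteen keys land in the same bucket, so every one after the first would result in a collision.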
A second conventional approach is shown in FIG. 26 and designated by the general reference character 2600. A conventional arrangement 2600 can include a content addressable memory (CAM) portion 2602 and a dynamic RAM (DRAM) portion 2604. In the conventional example of FIG. 26, key portions of both a Primary Table (containing network flow entries) and Secondary Tables (containing aggregate data entries) can be stored in one or more CAM sections 2602. Associated data for each CAM entry can be stored in DRAM portion 2604. In a search operation, a matching CAM entry will generate an index value for accessing the DRAM entry containing the associated data.
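The relationship between a CAM match index and the associated data in DRAM can be modeled in software as below. This is a behavioral sketch only; the class CamRamTable and its methods are hypothetical, and the sequential list search merely stands in for the parallel compare performed by CAM hardware.

```python
class CamRamTable:
    """Software model: keys in a CAM-like array, associated data
    in a parallel RAM-like array addressed by the match index."""

    def __init__(self):
        self.cam = []  # key portion of each entry
        self.ram = []  # associated data: [packets, byte_count]

    def search(self, key):
        # A real CAM compares all entries in parallel;
        # list.index is a sequential stand-in for that compare.
        try:
            return self.cam.index(key)
        except ValueError:
            return None

    def update(self, key, packets, byte_count):
        """Add or update an entry; return the index of the entry."""
        index = self.search(key)
        if index is None:
            self.cam.append(key)
            self.ram.append([packets, byte_count])
            return len(self.cam) - 1
        self.ram[index][0] += packets
        self.ram[index][1] += byte_count
        return index


# Hypothetical usage: two updates to the same key share one index.
cam_table = CamRamTable()
pair = ("10.0.0.1", "10.0.0.2")
i = cam_table.update(pair, 1, 64)
j = cam_table.update(pair, 1, 40)
```

Each key occupies one CAM location and one corresponding data location, which is why the CAM and DRAM entry counts grow together in this arrangement.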
A conventional approach like that of FIG. 26 can offer considerable advantages in terms of speed and use over hashing approaches. Utilization of CAMs can result in search times that are deterministic and very fast, due to the high speed at which CAM devices can compare key values.
However, a conventional approach like that of FIG. 26 is not without drawbacks. While CAMs provide advantageously fast search speeds, the cost per bit for a CAM device can be considerably higher than that of a DRAM or static RAM (SRAM). As would be understood from FIG. 26, if the number of secondary tables is high, the cost would increase correspondingly.
To better understand various features of the disclosed embodiments, a comparison with respect to the memory requirements of the conventional approaches noted above will be discussed in more detail. Due to the absence of predictability in the behavior of certain networks, and to ensure optimal performance, each secondary table would generally have to be at least equal in size to the primary table. This is because each entry of a primary table could potentially aggregate to a unique entry in a secondary table. Accordingly, if a networking device has a primary table of size “N” entries and has “m” aggregation schemes, the memory space needed would be N*(m+1). It is noted that in a hashing arrangement like that of FIG. 25, while a RAM could include N*(m+1) entries for storing data table values, in order to minimize collisions, a memory space allocated for the entries is typically at least 2× or 4× the number of anticipated data values. Thus, an actual implementation could require an overall storage space size of 4*N*(m+1), with “m+1” representing the primary table and “m” aggregation schemes.
In the case of a CAM/RAM approach like that of FIG. 26, a system would need both N*(m+1) CAM entries, as well as N*(m+1) DRAM entries.
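The entry counts given above can be worked through numerically. In the following sketch, the function names and the sample values of N and m are assumptions chosen only to illustrate the arithmetic.

```python
def hash_ram_entries(n: int, m: int, overprovision: int = 4) -> int:
    """RAM entries for a hashing approach: overprovision * N * (m + 1)."""
    return overprovision * n * (m + 1)


def cam_ram_entries(n: int, m: int) -> dict:
    """CAM and DRAM entries for a CAM/RAM approach: N * (m + 1) of each."""
    total = n * (m + 1)
    return {"cam": total, "dram": total}


# Hypothetical sizing: 1000 primary entries and 2 aggregation schemes.
hash_total = hash_ram_entries(1000, 2)
cam_total = cam_ram_entries(1000, 2)
```

With these sample values, the hashing approach would call for 12000 RAM entries at 4× overprovisioning, while the CAM/RAM approach would call for 3000 CAM entries and 3000 DRAM entries.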
In light of the above, it would be desirable to arrive at some way of aggregating network flow data that does not require as much CAM memory as conventional approaches like those described above. However, at the same time, such a solution should not suffer from the non-deterministic behavior that can arise from approaches utilizing hashing functions.
In the case of packet forwarding, forwarding functions could previously be based on a single criterion: the destination address of a data packet. However, more sophisticated forwarding approaches are presently needed and/or anticipated. That is, it is desirable to forward data packets based on multiple criteria.
As but one very particular example, it is desirable to accommodate varying levels of service for data packets based on one or more identifying features, e.g., different “quality of service” (QOS) or “type of service” (TOS). Conventional approaches to providing such service levels can require very large lookup databases, increasing component size and hence system cost. In particular, a conventional approach can include multiple entries (e.g., CAM storage locations) for each destination/service combination.
In light of the above, it would be desirable to arrive at some way of reducing the number of system components needed in forwarding operations based on multiple criteria.
Still further, all of the above illustrates how there is a general need to arrive at some way of reducing the storage space needed for associative memory (e.g., CAM) applications, particularly those embodying multiple data sets.