Databases are ubiquitous in packet forwarding devices (e.g., routers). They are typically implemented as well-balanced trees such as WAVL trees, which provide acceptable performance. For example, the lookup, insertion and deletion times of a WAVL tree are O(log(n)), where n is the number of data nodes in the WAVL tree. However, packet forwarding devices tend to handle increasing amounts of traffic. As the databases in packet forwarding devices increase in size, the growing number of data nodes degrades the performance of any existing scheme and eventually affects the performance of the packet forwarding device.
As an example, when a First Sign of Life (FSOL) is detected for a new session on a broadband network gateway (BNG) router, the new session is searched for in a Session Attribute Database (SADB), for example, to avoid creating duplicates or to manage dual stack (IPv4/IPv6) sessions. After the search on the SADB is performed, the other activities required to properly start up the new session are executed. FIG. 1A is a graph illustrating measured session start time cost on an example BNG router. In FIG. 1A, the measured start time 102 and the average start time 104 are shown. The graph illustrates the session start execution time (in nanoseconds) versus the number of started sessions (0<n<16 k) on the BNG router. The average start execution time includes two components: (1) a fixed cost 101 related to standard start operations and (2) a variable cost (i.e., c(n)) 103 related to the number of started sessions. The variable cost 103 is related to searching for the new session on the SADB (e.g., a WAVL tree) to retrieve session data if a session is already started. FIG. 1B is a graph illustrating a linear interpolation 106 and a logarithmic interpolation 108 of the measured session start cost of FIG. 1A. FIG. 1C is a graph illustrating the linear interpolation 106 and the logarithmic interpolation 108 of the measured session start cost extended to a BNG router with 0<n<256 k started sessions. As shown in FIG. 1C, it is estimated that 46% of the session start execution time is related to the variable cost 103 according to the logarithmic interpolation 108 (342% for the linear interpolation 106) for 256 k sessions. Because the variable cost 103 is related to access/search operations on the WAVL tree, the logarithmic interpolation 108 is likely the better estimate.
Bloom Filters (BFs) have been used in database applications in related art. A BF is an array of m bits, each of which is initially set to zero. An element x of the set is represented in the BF by applying K distinct hash functions h1( ) . . . hK( ) to x and setting to 1 the bits at positions h1(x) . . . hK(x) in the array. Because of its randomized, hashing-based mechanism, the price to pay for a BF's small memory footprint is a certain (and known) rate of false positives. For example, when responding to membership queries on a BF (e.g., Is element x in set S?), a positive response may be a false positive, that is, the BF may return a positive response even though the queried element is not in the set. However, it is not possible to receive a false negative. In other words, if a membership query on a BF returns a negative response, the queried element does not belong to the set represented by the BF. In most applications, the advantages of using a BF outweigh the risks of receiving false positives. Additionally, false positives are controllable and occur at a known rate. Specifically, when
$m = \frac{n \times K}{\ln(2)}$, where m is the number of bits in the BF, n is the number of elements in the set and K is the number of hash functions, the probability of receiving a false positive is $f = 2^{-K}$.
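The sizing rule above can be checked numerically. The following is a minimal sketch, not part of the described system: the function names are hypothetical, the value K = 8 is an arbitrary assumption, and n = 256,000 is taken from the 256 k session example. The general approximation f ≈ (1 − e^(−Kn/m))^K is the standard textbook estimate and is included only to show that it reduces to 2^(−K) when m = n × K / ln(2).

```python
import math


def bloom_bits(n, k):
    """Bits m for n elements and k hash functions, m = n * k / ln(2), rounded up."""
    return math.ceil(n * k / math.log(2))


def false_positive_rate(k):
    """False positive probability f = 2^-k when m is sized as above."""
    return 2.0 ** -k


def general_false_positive_rate(m, n, k):
    """Standard approximation (1 - e^(-k*n/m))^k for an arbitrary m."""
    return (1.0 - math.exp(-k * n / m)) ** k


# Assumed example values: 256 k sessions, 8 hash functions (K is an assumption).
n, k = 256_000, 8
m = bloom_bits(n, k)
f = false_positive_rate(k)
print(f"m = {m} bits (about {m / 8 / 1024:.0f} KiB), f = {f}")
print(f"general approximation: {general_false_positive_rate(m, n, k):.8f}")
```

Under these assumptions the filter needs roughly 11.5 bits per element, and adding one more hash function halves the false positive rate at the cost of n / ln(2) additional bits.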
Referring now to FIGS. 2A-2B, processes of constructing a BF and performing a lookup using the BF are shown. FIG. 2A shows the process of constructing a BF representing the set of elements {x, y}. Initially, the BF is empty and all of the bits are set to 0. Then, the hash functions (e.g., h1( ), h2( ), h3( )) are computed on element x, and the bits pointed to by the arrows are set to 1 based on the results of the hash functions. The hash functions (e.g., h1( ), h2( ), h3( )) are then computed on element y, and the bits pointed to by the arrows are set to 1 based on the results of the hash functions. The resulting BF representing the set of elements {x, y} is shown.
In FIG. 2B, the process of performing a lookup using the BF is shown. For example, when searching a BF representing the set of elements {x,y} for element x, the hash functions are computed on element x. A determination is made that all bits pointed to by the results of the hash functions (bits 0, 3 and 6, for example) are set to 1. This indicates that element x is a member of the set of elements {x,y}. On the other hand, when searching a BF representing the set of elements {x,y} for element t, the hash functions are computed on element t. The hash functions return bits 1, 4 and 5, for example. Because bits 4 and 5 are set to 0, t is not a member of the set of elements {x,y}. Additionally, when searching a BF representing the set of elements {x,y} for element z, the hash functions return bits 0, 3 and 7, for example. Because bits 0, 3 and 7 are all set to 1 but z is not a member of the set of elements {x,y}, a false positive is returned.
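The construction and lookup processes of FIGS. 2A-2B can be sketched as follows. This is an illustrative sketch only: the class and method names are hypothetical, and deriving the K hash functions by salting SHA-256 with the function index is an assumption made for the example, not the hashing scheme of the figures. The bit positions it produces will therefore differ from those shown in FIG. 2B.

```python
import hashlib


class BloomFilter:
    """Array of m bits with k salted hash functions (hash scheme is assumed)."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m  # initially empty: every bit is 0

    def _positions(self, item):
        # Hypothetical h1..hk: SHA-256 of "<index>:<item>", reduced modulo m.
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big"
            ) % self.m
            for i in range(self.k)
        ]

    def add(self, item):
        # Construction step: set the bit at every hashed position to 1.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # Lookup step: any 0 bit means "definitely absent";
        # all 1 bits means "present, or a false positive".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=64, k=3)
for element in ("x", "y"):
    bf.add(element)
print(bf.might_contain("x"), bf.might_contain("y"))
```

A negative answer from `might_contain` is always correct, whereas a positive answer must be confirmed against the underlying database if false positives cannot be tolerated.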
BFs, however, fail to support deletion of items from a data set. Thus, Counting Bloom Filters (CBFs) have been developed to provide a way to implement a delete operation on a BF without recreating the BF. In a CBF, the array positions (or bins, buckets, slots, etc.) are extended from a single bit to a multi-bit counter. In fact, regular BFs can be considered CBFs with a bucket size of one bit. The insert operation is extended to increment the value of each of the respective buckets, and the lookup operation checks that each of the required buckets is non-zero. The delete operation decrements the value of each of the respective buckets. The size of the counters is typically set to 3 or 4 bits. Hence, CBFs typically use 3 to 4 times more space than regular BFs.
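The insert, lookup and delete operations of a CBF can be sketched as follows. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, the salted SHA-256 hashing is an assumption, and the counters are plain Python integers rather than packed 3-bit or 4-bit fields, so counter overflow is not modeled.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter whose bins are counters, so elements can be deleted."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.counters = [0] * m  # one counter per bin (unbounded in this sketch)

    def _positions(self, item):
        # Hypothetical h1..hk: SHA-256 of "<index>:<item>", reduced modulo m.
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big"
            ) % self.m
            for i in range(self.k)
        ]

    def add(self, item):
        # Insert: increment the counter of every hashed bin.
        for pos in self._positions(item):
            self.counters[pos] += 1

    def might_contain(self, item):
        # Lookup: every hashed bin must be non-zero.
        return all(self.counters[pos] > 0 for pos in self._positions(item))

    def remove(self, item):
        # Delete: decrement every hashed bin. Refuse if the lookup is negative,
        # which keeps counters from going below zero. Removing an element that
        # was never inserted but tests positive would still corrupt the filter.
        if not self.might_contain(item):
            return False
        for pos in self._positions(item):
            self.counters[pos] -= 1
        return True


cbf = CountingBloomFilter(m=64, k=3)
cbf.add("x")
cbf.add("y")
cbf.remove("x")
print(cbf.might_contain("y"))
```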
CBFs have been used within a database system called a Shared Fast Hash Table (SFHT), which is discussed in detail in Song et al., Fast Hash Table Lookup Using Extended Bloom Filter: An Aid to Network Processing, SIGCOMM '05, Aug. 21-26 (2005). In a SFHT, the CBF is extended in order to also maintain a pointer to a list of elements contained in each CBF bin. Referring now to FIG. 3, an example Shared Fast Hash Table representing elements {x, y, z, w} is shown. The SFHT is a CBF 300 having an array of bins 301 and corresponding pointers 303. The pointers 303 are capable of pointing to a list of elements contained in each of the bins 301, and each of the bins 301 is a counter. In FIG. 3, element x 302, element y 304, element z 306 and element w 308 are inserted into the CBF as discussed above (e.g., compute the hash functions for the element, increment the counter, etc.). Additionally, a pointer 303 is maintained to a list of elements contained by each of the bins 301. As shown in FIG. 3, the elements are not completely shared because element w 308 is duplicated. When a search is executed, the hash functions are computed on the queried element and, assuming all of the corresponding bins are non-zero, the search is performed on the list associated with the bin having the lowest counter value.
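The search rule described above (check that all hashed bins are non-zero, then scan the list of the bin with the lowest counter) can be sketched as follows. This is a deliberately simplified sketch and not the SFHT of Song et al.: every bin keeps its own copy of each element in its list instead of sharing list nodes, the class and method names are hypothetical, and the salted SHA-256 hashing is an assumption.

```python
import hashlib


class SimpleFastHashTable:
    """Counting Bloom filter whose bins also hold a list of (key, value) pairs.

    Simplification: elements are replicated into every hashed bin's list
    rather than shared between lists as in the SFHT.
    """

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.counters = [0] * m
        self.lists = [[] for _ in range(m)]

    def _bins(self, key):
        # Distinct bins selected by the k hypothetical salted hash functions.
        digests = (
            hashlib.sha256(f"{i}:{key}".encode()).digest() for i in range(self.k)
        )
        return sorted({int.from_bytes(d[:8], "big") % self.m for d in digests})

    def insert(self, key, value):
        # Increment each bin's counter and record the element in its list.
        for b in self._bins(key):
            self.counters[b] += 1
            self.lists[b].append((key, value))

    def lookup(self, key):
        bins = self._bins(key)
        # A zero counter proves the key was never inserted.
        if any(self.counters[b] == 0 for b in bins):
            return None
        # Otherwise scan only the list of the bin with the lowest counter.
        least_loaded = min(bins, key=lambda b: self.counters[b])
        for stored_key, value in self.lists[least_loaded]:
            if stored_key == key:
                return value
        return None  # the filter answered positive, but the key is absent


table = SimpleFastHashTable(m=32, k=3)
for name in ("x", "y", "z", "w"):
    table.insert(name, f"session data for {name}")
print(table.lookup("x"))
```

Because the scan compares stored keys, a false positive from the counters costs one short list traversal but never returns wrong data.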
However, as the number of hash functions and the number of elements in the set of elements increase, the number of collisions increases, which causes performance of a search using the CBF to degrade with respect to a search of the standard database tree (i.e., a WAVL tree).