There are numerous conventional data structures utilized in maintaining and searching data for various applications. For example, a binary search tree (BST), which is also referred to as an ordered or sorted binary tree, is a node-based binary tree data structure. In a binary search tree, the left subtree of a node contains only nodes with values (keys) less than the node's value (key); the right subtree of a node contains only nodes with values greater than the node's value; and both the left and right subtrees are also binary search trees. Generally, the information represented by each node is a record rather than a single data element. However, for sequencing purposes, nodes are compared according to their values (keys) rather than any part of their associated records. Another example is a red-black tree, which is a type of self-balancing binary search tree used to implement associative arrays. The red-black tree is also referred to as a symmetric binary B-tree. It can perform search, insert, and delete operations in O(log n) time, where n is the total number of elements in the tree. In other words, a red-black tree is a binary search tree that inserts and deletes in such a way that the tree is always reasonably balanced. Yet another example is an AVL tree, which is a self-balancing binary search tree. In an AVL tree, the heights of the two child subtrees of any node differ by at most one. Lookup, insertion, and deletion of a node of the AVL tree take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations. For an AVL tree, the balance factor of a node is the height of its left subtree minus the height of its right subtree (or vice versa), and a node with a balance factor of 1, 0, or −1 is considered balanced. A node with any other balance factor is considered unbalanced and requires the tree to be rebalanced.
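The ordering rule of a binary search tree and the AVL balance factor described above can be illustrated with a minimal Python sketch. The names used here (Node, insert, height, balance_factor, is_avl_balanced, build) are hypothetical and chosen only for illustration; the sketch performs no rotations and merely reports whether a tree satisfies the AVL condition. Height is counted in levels, with an empty subtree assumed to have height 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key):
    """Plain (non-balancing) BST insert: smaller keys go left, larger keys go right."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # a duplicate key leaves the tree unchanged


def height(node):
    """Height in levels; an empty subtree is assumed to have height 0."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def balance_factor(node):
    """Height of the left subtree minus height of the right subtree."""
    return height(node.left) - height(node.right)


def is_avl_balanced(node):
    """True if every node has a balance factor of -1, 0, or 1."""
    if node is None:
        return True
    return (
        abs(balance_factor(node)) <= 1
        and is_avl_balanced(node.left)
        and is_avl_balanced(node.right)
    )


def build(keys):
    """Insert keys one by one into an initially empty tree."""
    root = None
    for k in keys:
        root = insert(root, k)
    return root
```

Under these assumptions, inserting the keys 2, 1, 3 yields a root with balance factor 0, whereas inserting 1, 2, 3 in sorted order yields a right-leaning chain whose root has balance factor −2, which is the situation in which an AVL tree would perform a rotation.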
One of the drawbacks of the conventional data structures is that they employ an automatic tree balancing algorithm that is triggered whenever one branch of a node in the tree is more than one level deeper than the other branch of the node. This can cause many unnecessary rebalancing operations even though the data structure meets certain performance requirements prior to the rebalancing operation. Such unnecessary rebalancing operations can have an adverse impact on the performance of the system. Another drawback of the conventional data structures is that, during the rebalancing operations, the data structures are unable to service data access requests until the rearrangement of nodes in the search trees has been completed. This temporary pause during the rebalancing operations can also adversely affect the performance of the system.
With the above conventional data structures, parts of the data may be stored in an on-chip cache memory and other parts of the data may be stored in an external memory. Typically, the time required to access data from the external memory is significantly longer than the time required to access data from the on-chip cache memory. As data are added or removed, their corresponding nodes are added to or removed from the data structure. When a node is removed from a data structure, its sub-branches may also be removed from the data structure. A Bloom Filter is a common technique used to determine, at least in part, whether a data item is still in the data structure or not.
In general, a Bloom Filter is a space-efficient probabilistic data structure that is used to test whether a data item (an element) is a member of the data structure (a set). The Bloom Filter algorithm allows for false positives, but does not allow for false negatives. In other words, a query of the Bloom Filter can return either "possibly in the data structure (set)", which may be wrong because of the possibility of a false positive, or "definitely not in the data structure (set)". In the case of a false positive, the data item is considered to be either in the on-chip cache or in the external memory. Only after the search is it determined that the data item does not exist in the data structure, and the time and computing resources spent on the search are lost. With the Bloom Filter, data can be added to the data structure, but not removed from the data structure. One of the drawbacks of the Bloom Filter is that as more and more data are added to the data structure, the probability of a false positive increases, which can adversely affect the performance of the system.
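The behavior described above can be illustrated with a minimal Python sketch of a Bloom Filter. The class name, the method names (add, might_contain), the default sizes, and the use of SHA-256 with an index prefix to derive several bit positions are all assumptions made for illustration and are not taken from any particular implementation. One byte is used per bit for readability rather than space efficiency.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity only

    def _positions(self, item):
        # Derive num_hashes bit positions by hashing the item with an index prefix.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Adding only ever sets bits; there is no remove operation.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely not in the set";
        # True means "possibly in the set" and may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))
```

In this sketch, an item that has been added always yields True, so there are no false negatives. As more items are added, more bits are set, and a query for an item that was never added becomes increasingly likely to find all of its positions already set, which is the false positive case described above. Because different items may share bit positions, clearing the bits of one item could silently remove others, which is why this basic form offers no removal.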
Therefore, there is a need for a system and method that address the drawbacks of the conventional data structures and approaches.