A search method is a method that accepts an argument a and tries to find a record whose key is a. The method may return the entire record or, more commonly, a pointer to that record. A search for a particular argument in a table may be unsuccessful; that is, there may be no record in the table with that argument as its key. In such a case, the method may return a special "nil record" or a nil pointer. Very often, if a search is unsuccessful, it is desirable to add a new record with the argument as its key. A method that does this is called a search and insertion method. A successful search is often called a retrieval.
Sequential Search Method
The simplest form of a search is the sequential search. This search is applicable to a table that is organized either as an array or as a linked list. A sequential search method examines each key in turn. Upon finding a key that matches the search argument, it returns that key's index (which acts as a pointer to its record). If no match is found, 0 is returned.
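The sequential search described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name sequential_search and the use of a Python list for the table are assumptions. To match the convention in the text, positions are counted from 1, so that 0 can signal an unsuccessful search.

```python
def sequential_search(keys, argument):
    """Return the 1-based position of argument in keys, or 0 if it is absent."""
    # Examine each key in turn, counting positions from 1.
    for position, key in enumerate(keys, start=1):
        if key == argument:
            return position  # successful search (a retrieval)
    return 0  # no key matched the search argument


table = [42, 7, 19, 3]
found = sequential_search(table, 19)    # position of key 19
missing = sequential_search(table, 100) # key not in the table
```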
Efficiency of Sequential Searching
If we assume no insertions or deletions, so that we are searching through a table of constant size n, then the number of comparisons depends on where the record with the argument key appears in the table. If the record is the first one in the table, only one comparison is performed; if the record is the last one in the table, n comparisons are necessary. If it is equally likely for an argument to appear at any given table position, a successful search will take (on the average) (n+1)/2 comparisons, and an unsuccessful search will take n comparisons. In any case, the number of comparisons is O(n), that is, proportional to n, the size of the table or list.
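The comparison counts above can be checked with a small experiment. The sketch below is illustrative only; the helper names comparisons_to_find and average_successful_comparisons are hypothetical, and the table is assumed to hold the distinct keys 0 through n-1 with each key equally likely to be the search argument.

```python
def comparisons_to_find(keys, argument):
    """Count the key comparisons a sequential search makes for argument."""
    count = 0
    for key in keys:
        count += 1
        if key == argument:
            break
    return count


def average_successful_comparisons(n):
    """Average comparisons over all n successful searches of an n-key table."""
    keys = list(range(n))  # assumed table: distinct keys 0 .. n-1
    total = sum(comparisons_to_find(keys, k) for k in keys)
    return total / n


n = 9
average = average_successful_comparisons(n)                 # equals (n + 1) / 2
unsuccessful = comparisons_to_find(list(range(n)), -1)      # equals n
```

The total for the successful searches is 1 + 2 + ... + n = n(n+1)/2, which divided by n gives the (n+1)/2 figure quoted in the text.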
Searching an Ordered Table
If the table is stored in ascending or descending order of the record keys, there are several techniques that can be used to improve the efficiency of searching. This is especially true if the table is of fixed size. One advantage in searching a sorted file over searching an unsorted file is in the case where the argument key is absent from the file. In the case of an unsorted file, n comparisons are needed to detect this fact. In the case of a sorted file, assuming that the argument keys are uniformly distributed over the range of keys in the file, only n/2 comparisons (on the average) are needed. This is because we know that a given key is missing from a file which is sorted in ascending order of keys as soon as we encounter a key in the file which is greater than the argument.
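The early exit that makes an unsuccessful search cheaper on a sorted table can be sketched as follows. The function name ordered_sequential_search is an assumption, the table is assumed to be in ascending order, and positions are counted from 1 with 0 meaning "not found", as in the sequential search described earlier.

```python
def ordered_sequential_search(sorted_keys, argument):
    """Sequential search of an ascending table; returns a 1-based position or 0."""
    for position, key in enumerate(sorted_keys, start=1):
        if key == argument:
            return position
        if key > argument:
            # Every remaining key is larger still, so the argument is absent.
            return 0
    return 0  # argument is greater than every key in the table


table = [3, 7, 19, 42]
result = ordered_sequential_search(table, 10)  # stops at key 19, not at the end
```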
The Binary Search
The most efficient method of searching a sequential list without the use of auxiliary indices or lists is a binary search. Consider an array of elements in which objects have been placed in ascending order. If the array contains only one element, the problem is trivial. Otherwise, compare the item being searched for with the item at the middle of the array (or list). If they are equal, the search has been completed successfully. If the middle element is greater than the item being searched for, the search process is repeated in the first half of the array (since if the item appears anywhere, it must appear in the first half); otherwise, the process is repeated in the second half. Each comparison therefore cuts the number of elements yet to be searched in half. For large arrays, this method is superior to the sequential search, in which each comparison reduces the number of elements yet to be searched by only one. Thus the maximum number of key comparisons that will be made is approximately log2(n), the logarithm of n to the base 2.
Unfortunately, the binary search method can only be used if the list is stored as an array. This is because it makes use of the fact that the indices of array elements are consecutive integers. For this reason, in the past, the binary search has been found to be useless in situations where there are many insertions or deletions, so that an array structure is inappropriate.
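A minimal sketch of the binary search described above follows. The function name binary_search is an assumption, the array is assumed to be in ascending order, and the return value follows the same convention as the sequential search (a 1-based position, or 0 if the argument is absent).

```python
def binary_search(sorted_keys, argument):
    """Binary search of an ascending array; returns a 1-based position or 0."""
    low, high = 0, len(sorted_keys) - 1
    while low <= high:
        middle = (low + high) // 2  # relies on consecutive integer indices
        if sorted_keys[middle] == argument:
            return middle + 1
        if sorted_keys[middle] > argument:
            high = middle - 1  # continue in the first half
        else:
            low = middle + 1   # continue in the second half
    return 0  # the range is empty, so the argument is absent


table = [3, 7, 19, 42, 57, 64]
position = binary_search(table, 42)
```

The computation of middle is what ties the method to arrays: a linked list offers no direct way to reach its middle element.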
Additional Methods
Additional methods for searching and managing lists of elements include 3-2 tree searching methods, B-tree of order m searching methods, Balanced Binary Tree searching methods, and various other methods that are typically found in college-level data structures textbooks, such as A. Tenenbaum & M. Augenstein, Data Structures Using Pascal (Prentice-Hall 1981).
A 3-2 tree is one in which each node has two or three sons and contains either one or two keys. If a node has two sons, it contains one key. All keys in its left subtree are less than that key and all keys in its right subtree are greater than that key. If a node has three sons, it contains two keys. All keys in its left subtree are less than its left key, which is less than all keys in its middle subtree. All keys in its middle subtree are less than its right key, which is less than all keys in its right subtree.
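The key ordering just described determines how a search descends a 3-2 tree. The following is a sketch under stated assumptions: the Node class, its keys and sons fields, and the function search_32_tree are hypothetical names, a leaf is represented by an empty sons list, and None signals an unsuccessful search.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                  # one or two keys, ascending
    sons: list = field(default_factory=list)    # empty for a leaf


def search_32_tree(node, argument):
    """Return the node holding argument, or None if it is not in the tree."""
    while node is not None:
        branch = len(node.keys)  # default: descend into the rightmost son
        for i, key in enumerate(node.keys):
            if argument == key:
                return node
            if argument < key:
                branch = i  # argument lies in the subtree left of this key
                break
        if not node.sons:
            return None  # reached a leaf without finding the argument
        node = node.sons[branch]
    return None


root = Node([10, 20], [Node([5]), Node([15]), Node([25])])
hit = search_32_tree(root, 15)
miss = search_32_tree(root, 12)
```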
A B-tree of order m is a generalization of the 3-2 tree. Such a tree is defined as a general tree that satisfies the following properties:
1. Each node contains at most m-1 keys.
2. Each node except for the root contains at least (m div 2)-1 keys.
3. The root has at least two sons, unless it is a leaf.
4. All leaves are on the same level.
5. A nonleaf node with n keys has n+1 sons.
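The five structural properties can be expressed as a small checker. This is only a sketch: the Node class and the function is_btree are hypothetical names, a leaf is assumed to have an empty sons list, property 2 is coded exactly as stated in the text (with m div 2 as integer division), and the ordering of keys within and between nodes is not verified.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    sons: list = field(default_factory=list)  # empty for a leaf


def is_btree(root, m):
    """Check the five structural properties of a B-tree of order m."""
    leaf_levels = set()

    def check(node, level, is_root):
        n = len(node.keys)
        if n > m - 1:                          # property 1
            return False
        if not is_root and n < (m // 2) - 1:   # property 2, as stated in the text
            return False
        if not node.sons:
            leaf_levels.add(level)             # recorded for property 4
            return True
        if is_root and len(node.sons) < 2:     # property 3
            return False
        if len(node.sons) != n + 1:            # property 5
            return False
        return all(check(son, level + 1, False) for son in node.sons)

    # Property 4: every leaf must have been seen on a single level.
    return check(root, 0, True) and len(leaf_levels) == 1


tree = Node([10, 20], [Node([5]), Node([15]), Node([25])])
ok = is_btree(tree, 3)
```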
A Balanced Binary Tree is a binary tree in which the heights of the two subtrees of every node never differ by more than one. The balance of a node in a binary tree is defined as the height of its left subtree minus the height of its right subtree. Node deletion is not covered within the Balanced Binary Tree method, and must be done using other techniques. A detailed treatment of Balanced Binary Trees may be found in any of several college-level textbooks on data structures, as mentioned previously.
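The definitions of balance and of a Balanced Binary Tree given above can be sketched in a few lines. The names BinaryNode, height, balance and is_balanced are assumptions, as is the convention that an empty subtree has height 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BinaryNode:
    key: int
    left: Optional["BinaryNode"] = None
    right: Optional["BinaryNode"] = None


def height(node):
    """Height of a subtree; an empty subtree is assumed to have height 0."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def balance(node):
    """Height of the left subtree minus the height of the right subtree."""
    return height(node.left) - height(node.right)


def is_balanced(node):
    """True if the subtree heights of every node differ by at most one."""
    if node is None:
        return True
    return (abs(balance(node)) <= 1
            and is_balanced(node.left)
            and is_balanced(node.right))


balanced = BinaryNode(2, BinaryNode(1), BinaryNode(3))
chain = BinaryNode(1, None, BinaryNode(2, None, BinaryNode(3)))
```

Recomputing heights on every call keeps the sketch short; a practical implementation would more likely store a height or balance value in each node.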
The overhead of a Balanced Binary Tree is non-deterministic, for two reasons. First, the tree is not actually perfectly balanced, which makes search time greater than the theoretical O(log2(n)). Second, the effort required for re-balancing is significant and difficult to quantify. Nodes are inserted as leaves at the bottom of the tree, and operations known as rotations may then be applied in order to restore some semblance of balance. These rotations are difficult to understand and appear to be quite compute-intensive. While the number of rotations is bounded for node insertion, node deletion may result in many rotations that propagate throughout the tree. It has been suggested that the processing necessary for re-balancing due to node deletions be collected into a background task that is invoked when necessary. This introduces an asynchronous characteristic into the re-balancing effort that could conceivably be quite troublesome.
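To give a sense of what a rotation involves, the sketch below shows a single left rotation, one of the standard rotations used on binary search trees. The text does not say which rotations are applied or when, so this is an illustration only; the names BinaryNode and rotate_left are assumptions, and the logic that decides when to rotate is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BinaryNode:
    key: int
    left: Optional["BinaryNode"] = None
    right: Optional["BinaryNode"] = None


def rotate_left(node):
    """Rotate the subtree rooted at node to the left and return its new root.

    The node's right son becomes the subtree root, and the node becomes that
    son's left son. The left-to-right ordering of the keys is preserved.
    """
    pivot = node.right
    node.right = pivot.left  # pivot's left subtree moves under the old root
    pivot.left = node
    return pivot


# A right-leaning chain 1 -> 2 -> 3 becomes a tree rooted at 2.
chain = BinaryNode(1, None, BinaryNode(2, None, BinaryNode(3)))
new_root = rotate_left(chain)
```

The caller must also re-attach the returned root to the rotated subtree's former parent, which is part of why a rotation triggered low in the tree can affect nodes above it.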
Since the maintenance activities described above can effectively block new searches of the tree, some of the benefit of the fast lookup is lost. For this reason, the search time of a Balanced Binary Tree can more accurately be described as O(log2(n)) + delta, where delta is the additional overhead that results from the two factors described above, i.e., tree imbalance and re-balance processing.
The Balanced Binary Tree techniques, while academically elegant, may be viewed as being more complex and may incur much more compute overhead for tree node deletions. As an example, cached controllers are expected to dynamically delete as well as add cache index entries in their cache data structures. The extra overhead needed for deletions (a non-deterministic amount of processing) does not fare well in a storage controller with real-time constraints.