The subject matter herein relates to computer algorithms for in-place searching in arrays. Arrays are among the oldest and most important data structures, and are used by a majority of computer programs. Arrays effectively exploit the addressing machinery of computers. In most modern computers (and in many external storage devices), the memory is a one-dimensional array of words, and the array's indices form the addresses of the words. Processors, especially vector processors, are often optimized for array operations, so having data in the form of arrays typically speeds up the processing significantly.
Arrays are useful mostly because the element indices can be easily computed at run time. This feature allows, for example, a single iterative statement to process arbitrarily many elements of an array. Arrays can be sorted or unsorted. Most commonly, data is stored in an unsorted array. As discussed above, one of the main advantages of arrays is that they allow fast access to elements by index, and this access is independent of the contents of the array. For example, imagine that a list of people in the office is created by putting the name of each person on one line and assigning consecutive numbers to these lines. It will be very quick and easy to find the name that is associated with number 27. However, it will be difficult to find a person with the last name Smith, because every line may have to be examined. To make the second operation faster, it would be desirable to sort the people in the list alphabetically.
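The difference between the two lookups in the office-list example can be sketched in C++ as follows. This is a minimal illustration only; the function names `name_at`, `find_unsorted`, and `find_sorted` are hypothetical and do not come from any library, and a return value of -1 is an assumed convention for "not found".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Lookup by index: one address computation, independent of the contents.
const std::string& name_at(const std::vector<std::string>& names,
                           std::size_t line) {
    assert(line < names.size());
    return names[line];
}

// Lookup by value in an unsorted list: in the worst case every entry
// is inspected. Returns the index of the match, or -1 if absent.
std::ptrdiff_t find_unsorted(const std::vector<std::string>& names,
                             const std::string& target) {
    auto it = std::find(names.begin(), names.end(), target);
    return it == names.end() ? -1 : it - names.begin();
}

// Lookup by value in an alphabetically sorted list: binary search
// inspects only about log2(N) entries. Returns the index, or -1 if absent.
std::ptrdiff_t find_sorted(const std::vector<std::string>& sorted_names,
                           const std::string& target) {
    auto it = std::lower_bound(sorted_names.begin(), sorted_names.end(), target);
    if (it == sorted_names.end() || *it != target) return -1;
    return it - sorted_names.begin();
}
```

The sorted variant assumes the caller has already sorted the list; sorting has its own cost, which is worthwhile only if many lookups by name follow.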
In computer science, a selection algorithm is an algorithm for finding the n:th smallest number in a list (such a number is called the n:th order statistic). This includes the cases of finding the minimum, maximum, and median elements. Selection is a sub-problem of more complex problems like the nearest neighbor problem and shortest path problems. Selection can always be reduced to sorting, by sorting the list and then extracting the desired element, although this does more work than is necessary. Another general application is to show the “Top-n elements” of an array without sorting the entire array. This can be done by finding the n:th number, selecting the numbers that precede the n:th number in order, and sorting only these n numbers. This can speed things up significantly if the array contains thousands of elements and only the top-10, say, should be displayed.
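The “Top-n” procedure can be sketched with the standard C++ algorithms `std::nth_element` and `std::sort`. The function name `top_n_smallest` is hypothetical, and the sketch assumes integer elements and that “top” means the n smallest values, consistent with the n:th smallest number defined above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the n smallest values in ascending order without sorting
// the whole array. Works on a copy so the caller's data is untouched.
std::vector<int> top_n_smallest(std::vector<int> values, std::size_t n) {
    assert(n <= values.size());
    if (n == 0) return {};
    auto nth = values.begin() + static_cast<std::ptrdiff_t>(n - 1);
    // Step 1: put the n:th smallest value at position n-1; every element
    // before it is now less than or equal to it, in unspecified order.
    std::nth_element(values.begin(), nth, values.end());
    // Step 2: sort only the first n elements.
    std::sort(values.begin(), nth + 1);
    values.resize(n);
    return values;
}
```

For example, calling `top_n_smallest({9, 1, 8, 2, 7, 3, 6, 4, 5}, 3)` yields 1, 2, 3. The standard library also offers `std::partial_sort`, which produces the same result in a single call.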
As is well known to computer programmers, the O( ) notation is frequently used in the analysis of algorithms to describe an algorithm's usage of computational resources. The worst case or average case running time or memory usage of an algorithm is often expressed as a function of the length of its input, using the O( ) notation. For example, O(N) indicates that the number of operations increases with N at a rate that is less than or equal to some linear function of N, where N is the size of the array, while O(N^2) indicates that the number of operations increases with N at a rate that is less than or equal to some quadratic function of N, etc. This allows algorithm designers to predict the behavior of their algorithms and to determine which one among multiple algorithms to use in a way that is independent of specific computer architectures or clock rates.
Current algorithms for searching an unsorted array, such as the partition-based general selection algorithm “nth_element” in the STL (Standard Template Library), which is a standard library for C++ developers, have an O(N) average performance and an O(N^2) worst case performance. It is also well known that such algorithms are subject to attack: an attacker can supply specially fabricated arrays to increase the load on a server in Denial-of-Service (DoS) attacks. There are also O(N) worst case algorithms, such as the one discussed in “The Art of Computer Programming” by Donald Knuth (Addison-Wesley Professional; 2nd edition, Oct. 15, 1998), in which a median is found by looking for medians of groups of five elements (5-plets) and then looking for medians of medians. Unfortunately, this algorithm's implementation requires either O(N) external storage or significant data copying. In addition, the complexity of this algorithm involves a large scale factor. Thus, it is preferable to use this algorithm only in the case of very large N.
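The gap between average and worst case behavior of partition-based selection can be illustrated with a deliberately naive sketch. The function `naive_select` below is hypothetical and is not the STL's implementation of `nth_element`; it assumes integer elements, a 0-based rank k, and a pivot that is always the last element of the current segment, which makes the adversarial input easy to construct.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Naive in-place partition-based selection (quickselect).
// Returns the element of rank k (0-based) and adds the number of
// element comparisons performed to `comparisons`.
int naive_select(std::vector<int>& a, std::size_t k, std::size_t& comparisons) {
    assert(k < a.size());
    std::size_t lo = 0, hi = a.size() - 1;
    while (lo < hi) {
        const int pivot = a[hi];          // fixed pivot choice: last element
        std::size_t store = lo;
        for (std::size_t i = lo; i < hi; ++i) {
            ++comparisons;
            if (a[i] < pivot) {
                std::swap(a[i], a[store]);
                ++store;
            }
        }
        std::swap(a[store], a[hi]);       // pivot lands at its final rank
        if (k == store) return a[k];
        if (k < store) hi = store - 1;    // continue in the left segment
        else           lo = store + 1;    // continue in the right segment
    }
    return a[lo];
}
```

With this pivot rule, asking for the minimum (k = 0) of an already sorted array of N elements shrinks the segment by only one element per pass, so the sketch performs N(N-1)/2 comparisons, for example 4950 for N = 100. On randomly ordered input the segment shrinks geometrically on average and the total stays linear in N. An attacker who knows the pivot rule can therefore fabricate inputs that force the quadratic case.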
One limitation of the above algorithms is that they are designed to look for specific quantiles only, such as the median. They could be applied to a generic n:th element search by repeatedly subdividing the segment by 2; this, however, would increase the scale factor of the algorithm complexity by another factor of 2.