Through the Internet and other networks, users have gained access to large amounts of information distributed over a large number of computers. To access this information, users typically use a browser to reach a search system. The search system responds to a user query by returning one or more sources of information available over the Internet or other network.
Online services build search systems designed to handle the expected user load. These systems are typically built with capacity that exceeds the expected legitimate load in order to provide a buffer. However, the higher the capacity, the greater the cost and complexity of the system, so it is highly desirable not to provide more capacity than will realistically be necessary.
Illegitimate load, which may be defined as load on a search system that does not serve the business purposes of the system, can be very expensive to support and can also cause system performance problems. A particular type of illegitimate load is termed a Denial of Service (DOS) attack. A DOS attack is artificially generated load explicitly developed to disrupt service to legitimate users.
Illegitimate load may raise costs by forcing a search system to repeatedly access more expensive, more time-consuming resources. For example, search systems typically store results for popular queries in a cache that can be accessed easily and quickly. An attacker may generate randomized multi-word queries for which the search system will not have cached results, thus forcing the search system to access an index system for each query. The index system may include at least one small index and at least one large index. Accessing the large index typically requires the most computation and causes the search system to incur the greatest expense. A skilled attacker may therefore formulate queries that force the search system to access the large index.
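The cost asymmetry described above can be illustrated with a minimal sketch. It assumes a three-tier lookup (cache, then small index, then large index) with made-up relative costs, and it assumes the small index can answer a query only when every query term appears in it; the names `TieredSearch`, `random_query`, and the cost constants are hypothetical and not taken from any real system.

```python
import random
import string

# Hypothetical relative costs for consulting each tier (illustrative units only).
CACHE_COST = 1
SMALL_INDEX_COST = 10
LARGE_INDEX_COST = 1000


class TieredSearch:
    """Toy model of a search system that falls through cache -> small index -> large index."""

    def __init__(self, cached_queries, small_index_terms):
        # Cache keyed by the exact query string, as for popular queries.
        self.cache = {q: f"cached results for {q!r}" for q in cached_queries}
        self.small_index_terms = set(small_index_terms)
        self.total_cost = 0

    def search(self, query):
        """Return the name of the tier that answered, adding each consulted tier's cost."""
        self.total_cost += CACHE_COST
        if query in self.cache:
            return "cache"
        # Assumption: the small index answers only if it contains every term.
        self.total_cost += SMALL_INDEX_COST
        if all(term in self.small_index_terms for term in query.split()):
            return "small_index"
        self.total_cost += LARGE_INDEX_COST
        return "large_index"


def random_query(rng, n_words=4, word_len=8):
    """Build a randomized multi-word query that is very unlikely to be cached."""
    return " ".join(
        "".join(rng.choice(string.ascii_lowercase) for _ in range(word_len))
        for _ in range(n_words)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    engine = TieredSearch(["weather", "news"], ["weather", "news", "sports"])
    for _ in range(100):
        engine.search("weather")  # popular query: served from the cache
    print("cost of 100 popular queries:", engine.total_cost)
    engine.total_cost = 0
    for _ in range(100):
        engine.search(random_query(rng))  # randomized query: falls through to the large index
    print("cost of 100 randomized queries:", engine.total_cost)
```

Under these assumed costs, each randomized query costs about a thousand times as much as a cached one, which is the imbalance an attacker exploits.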
In some search systems, a substantial portion of all search requests may occupy more than five hundred machines for a period of time. Summed across machines, a single request that is well crafted to require heavy computation may consume ten to one hundred computer-minutes.
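As a rough illustration of how these figures relate, assume (this is an assumption, not something stated above) that a single request occupies N = 500 machines uniformly for t seconds. The total compute C in computer-minutes, and the durations that would correspond to the quoted range, are then:

```latex
C = \frac{N \cdot t}{60}, \qquad N = 500
C = 10 \;\Rightarrow\; t = \frac{10 \cdot 60}{500} = 1.2\ \text{s}, \qquad
C = 100 \;\Rightarrow\; t = \frac{100 \cdot 60}{500} = 12\ \text{s}
```

Under this assumption, a request that holds five hundred machines for only one to twelve seconds already accounts for ten to one hundred computer-minutes.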
Generally, an extraordinarily high number of search queries may indicate an attempt to disrupt the system through a DOS attack. DOS attacks may fall into two categories. A malicious attack may occur when a user attempts to bring down or reduce the capacity of a site for a malicious purpose, such as a financial one. A non-malicious attack may occur when a user dominates the search system for non-malicious reasons. For instance, a researcher may use multiple computers to submit queries at a very high rate to research a topic through a search system.
It is also possible that an abnormally high number of search queries is not a synthetic event such as a DOS attack, but rather the result of a natural event that causes abnormally high traffic. Accordingly, a solution is needed that determines whether a number of queries is abnormal and further determines whether a high-traffic event is natural or artificial.