The present invention is generally directed to optimizing the performance of searches. In particular, the present invention can be implemented to cause one or more searches to be delayed so that multiple searches can load segments of an index and search within them together.
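The idea of searching loaded segments together can be sketched roughly as follows. This is only an illustration under stated assumptions, not the claimed implementation: `SegmentedIndex`, `load_segment`, and `search_together` are hypothetical names, the index is modeled as in-memory lists of strings, and the delay that gathers the pending searches is not modeled (the pending search terms are simply passed in as a list).

```python
from typing import Dict, List


class SegmentedIndex:
    """Hypothetical stand-in for an index stored as separate segments."""

    def __init__(self, segments: List[List[str]]) -> None:
        self._segments = segments
        self.loads = 0  # number of simulated segment loads

    def __len__(self) -> int:
        return len(self._segments)

    def load_segment(self, i: int) -> List[str]:
        """Simulate the expensive step of loading one segment."""
        self.loads += 1
        return self._segments[i]


def search_together(index: SegmentedIndex, terms: List[str]) -> Dict[str, List[str]]:
    """Load each segment once and run every pending search against it."""
    results: Dict[str, List[str]] = {term: [] for term in terms}
    for i in range(len(index)):
        segment = index.load_segment(i)  # one load shared by all searches
        for term in results:  # iterating the dict skips duplicate terms
            results[term].extend(doc for doc in segment if term in doc)
    return results
```

In this sketch, two searches over a two-segment index cause two segment loads in total, whereas executing the same searches one at a time would cause four.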
In some computing environments, a server provides access to a searchable index. In such environments, clients may submit requests to the server to search the index for specified content. In response, the server performs the necessary operations to load segments of the index and then searches within the loaded segments for the specified content. Under ideal conditions, the server will be capable of executing these searches in an acceptable amount of time. However, in many cases, the server may receive more searches than it can handle, which may overload the server and cause its performance to suffer. For example, each time a search is executed, the server will be required to load each segment of the index, resulting in a large number of disk operations and a large amount of memory consumption. Further, if the index happens to be stored in network storage, these loads will occur over the network, which may cause the network to become congested. When this overloading occurs, a search may execute unacceptably slowly or may even fail.
To address these overload scenarios, many systems limit the number of concurrent requests. In such cases, if a client submits a request while the server is overloaded, the server may deny the request. Such denials extend the performance shortcomings to the client. Further, the denials can give the perception that the system is faulty or otherwise unsatisfactory.
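For illustration only, the request-limiting approach described above might look something like the following sketch. The class name `AdmissionLimiter`, the method `try_search`, and the use of a semaphore are assumptions made for this example; the text does not specify how any particular system enforces its limit.

```python
import threading
from typing import Any, Callable, Tuple


class AdmissionLimiter:
    """Hypothetical limiter that denies searches beyond a concurrency cap."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_search(self, run_search: Callable[[], Any]) -> Tuple[bool, Any]:
        """Return (True, result) if admitted, or (False, None) if denied."""
        # Non-blocking acquire: fail immediately when every slot is in use.
        if not self._slots.acquire(blocking=False):
            return (False, None)
        try:
            return (True, run_search())
        finally:
            self._slots.release()
```

A client whose request arrives while all slots are taken receives the denial immediately, which is the behavior that passes the server's overload on to the client.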