Internet resources often comprise interactive services, such as a search engine, a map service, online gaming and/or video-on-demand. Search engines receive queries from Internet users and return search results (e.g., a fixed number of top-ranked documents that match each query) within a pre-defined deadline. In a video-on-demand environment, users access various multimedia content, such as video clips (i.e., media streams). Because transmitting video clips consumes various computing resources, servers are able to deliver high-quality video streams within a requested deadline, and thereby maintain a certain level of user experience, only when lightly or moderately loaded.
Such interactive services consume a significant portion of computing resources, such as processor cycles, network bandwidth, Input/Output (I/O), storage capacity and/or the like. Accordingly, the interactive services require a system of servers for processing requests, such as indexing servers for responding to search engine queries. Interactive service providers desire short, predictable response times for requests while also reducing operational costs, and these two goals conflict. To reduce cost, it is desirable to handle a given load with fewer servers operating at high resource utilization rather than with many lightly loaded servers, because doing so saves hardware, energy and maintenance costs. To achieve short and predictable response times, however, the interactive services keep average computing resource utilization low. As servers become busy, queuing delays increase and requests miss their deadlines, which degrades service quality and results in poor user experience and revenue loss. Resource utilization is therefore kept low because the servers cannot support a good quality of service when overloaded or approaching overload.
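By way of illustration only, the relationship between utilization and queuing delay may be sketched with a simple single-server queuing model. The M/M/1 model used here (random arrivals, exponentially distributed service times) is an assumption made for this example and is not taken from the description above; the function name and the 10 ms service time are likewise hypothetical.

```python
def mean_response_time(service_time_ms: float, utilization: float) -> float:
    """Mean time a request spends in an M/M/1 system (waiting plus service).

    Assumption: a single server with Poisson arrivals and exponential
    service times, for which W = S / (1 - rho), where S is the mean
    service time and rho is the utilization.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    return service_time_ms / (1.0 - utilization)


if __name__ == "__main__":
    # Hypothetical 10 ms mean service time at increasing utilization levels.
    for rho in (0.3, 0.5, 0.7, 0.9, 0.95):
        print(f"utilization {rho:.2f}: {mean_response_time(10.0, rho):.1f} ms")
```

Under this assumed model, the mean response time grows without bound as utilization approaches one, which is consistent with the practice of keeping average utilization low in order to meet deadlines.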
Contemporary schedulers either provide a complete response to a request or reject the request when its deadline cannot be satisfied, e.g., when the system is overloaded with requests. In this scheme, no response is provided if the response cannot be completed by the deadline. As a result, when heavily loaded, the system is unable to deliver a high response quality and/or cannot consistently maintain a high resource utilization rate.
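The all-or-nothing scheme described above may be sketched as follows. This is a minimal illustration under stated assumptions and does not represent any particular scheduler: it assumes a single server that processes requests in arrival order, and the names `Request` and `all_or_nothing_schedule` and the fields `arrival`, `demand` and `deadline` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    arrival: float   # time at which the request arrives
    demand: float    # processing time needed for a complete response
    deadline: float  # absolute time by which the response is due


def all_or_nothing_schedule(
    requests: List[Request],
) -> Tuple[List[Request], List[Request]]:
    """Serve requests in arrival order on one server (assumed model).

    A request is processed only if its complete response can be
    finished by its deadline; otherwise it is rejected outright and
    receives no response at all.
    """
    clock = 0.0
    completed: List[Request] = []
    rejected: List[Request] = []
    for req in sorted(requests, key=lambda r: r.arrival):
        start = max(clock, req.arrival)
        if start + req.demand <= req.deadline:
            clock = start + req.demand  # server is busy until completion
            completed.append(req)
        else:
            rejected.append(req)        # no partial response is produced
    return completed, rejected


if __name__ == "__main__":
    # Hypothetical burst: four requests, each needing 50 time units
    # and each due 100 time units after it arrives.
    burst = [Request(arrival=t, demand=50.0, deadline=t + 100.0)
             for t in (0.0, 10.0, 20.0, 30.0)]
    done, dropped = all_or_nothing_schedule(burst)
    print(len(done), "completed;", len(dropped), "rejected")
```

In this sketch, a burst of closely spaced requests causes the later requests to be rejected even though the server could have devoted some processing time to them, which illustrates why response quality falls when such a system is heavily loaded.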