Features of the invention relate generally to communications bandwidth utilization management and, more particularly, to management of bandwidth utilized by servers responding within acceptable time limits.
Conventionally, the request processing policy on single-machine server systems is to allocate computing cycles evenly among request-handling processes, and for those processes to complete as soon as possible. In multiple-server cluster systems ("server farms"), a load-balancing component typically distributes requests among machines evenly (or perhaps proportionately to their individual capacities), and then each machine executes the previous single-machine policy. Both cases optimize average response time, but potentially at the expense of other objectives. It would be desirable for a request processing method to exist that allowed other objectives to be optimized.
In situations where system operations cost is related to bandwidth utilization, it may be more desirable to optimize bandwidth utilization while keeping average response time merely acceptable. An example of this is the provision of servers for Internet services. Providers of such services typically buy bandwidth according to the "95% rule": after discarding the top 5%, the highest remaining bandwidth utilization sample determines the billing rate. Thus, lowering peak bandwidth utilization can reduce operations costs. Since bandwidth costs can constitute some 40% of total operations costs, the savings can be substantial. Accordingly it would be advantageous for means to exist that allowed system operators to reduce bandwidth costs by lowering peak bandwidth utilization.
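The "95% rule" described above can be illustrated with a short Python sketch. The function name, the sample values, and the rounding convention for how many samples count as the top 5% are assumptions made for illustration; actual billing arrangements may differ.

```python
def billable_rate(samples):
    """Return the highest sample that remains after discarding the top 5%."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Assumed convention: discard floor(5% of the sample count) top samples.
    discard = len(ordered) * 5 // 100
    return ordered[len(ordered) - discard - 1]


# Hypothetical utilization samples: 19 quiet intervals and one short burst.
one_burst = [10] * 19 + [100]
# The same traffic with a second burst interval.
two_bursts = [10] * 18 + [100, 100]
```

With twenty samples, exactly one sample is discarded, so a single burst interval does not affect the billing rate while two burst intervals do. This is why spreading transmissions over time, rather than letting them coincide, can lower the billed figure.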
One conventional solution for reducing peak bandwidth utilization is to serialize the processing of requests. This technique does reduce peak bandwidth utilization; however, it does not provide a way to keep response time within acceptable limits. Conventionally, keeping response time within an acceptable response time ("ART") involves allocation of bandwidth across request processes. However, this can be self-defeating if the overhead for such coordination is too high (as can happen when using interprocess communication), making it impossible to achieve ARTs. It is therefore desirable that means exist to efficiently reduce bandwidth utilization while maintaining ARTs.
Furthermore, ARTs can depend on the context of individual requests. If a user is waiting for the result of the request, response times should be quite short. However, if the request is to preload a document for potential future use, the response time can be quite large.
It is therefore desirable that a method, apparatus, and computer program product exist that allow for efficient bandwidth utilization management while maintaining request response times within acceptable levels.
In order to achieve these and other objectives, one illustrative aspect of the present invention is a method for managing bandwidth utilization by a server in fulfilling requests for resources. An exemplary method includes receiving a request for a resource; delaying fulfillment of the request by a predetermined time period; and thereafter fulfilling the request for the resource. Delaying fulfillment of the request may include generating a delay value, the delay value being less than an acceptable response time; and waiting for a time interval at least as great as the delay value to elapse. In an additional feature, the delay value is an element of a sequence distributed substantially uniformly between zero and the acceptable response time, and the sequence can be a pseudo-random sequence.
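The delay-then-fulfill method described above might be sketched in Python as follows. This is a minimal illustration, not the claimed implementation: the function names, the `handler` callback, and the injectable `sleep` and `rng` parameters are assumptions introduced here so the sketch can be exercised without actually waiting.

```python
import random
import time


def generate_delay(art_seconds, rng=random):
    """Draw a pseudo-random delay spread uniformly between zero and the ART."""
    # rng.random() yields a value in [0.0, 1.0), so the result stays below the ART.
    return rng.random() * art_seconds


def fulfill_with_delay(request, art_seconds, handler, sleep=time.sleep, rng=random):
    """Receive a request, wait out a generated delay, then fulfill the request."""
    delay = generate_delay(art_seconds, rng)
    sleep(delay)             # wait for the delay interval to elapse
    return handler(request)  # thereafter fulfill the request
```

Because each request is delayed independently by a value drawn from the same uniform sequence, requests that arrive together tend to be answered at different moments within the acceptable response time, without any coordination between them.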
In yet another illustrative aspect, receiving a request for a resource may include receiving a request for a resource with a request dispatch process; determining a delay value, the delay value determined with the request dispatch process; and dispatching the request and the delay value to a request handling process for handling. The acceptable response time may be received with the request, and the request may include a path identifying the resource and the path may include the acceptable response time.
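One way to picture the division of work between the request dispatch process and the request handling process is the Python sketch below. It assumes, purely for illustration, that the two sides communicate through a `queue.Queue`; the function names and the `fulfill` callback are hypothetical.

```python
import queue
import random
import time


def dispatch(request, art_seconds, work_queue, rng=random):
    """Dispatch side: determine the delay value, then pass request and delay on."""
    delay = rng.random() * art_seconds  # uniform between zero and the ART
    work_queue.put((request, delay))
    return delay


def handle_next(work_queue, fulfill, sleep=time.sleep):
    """Handling side: wait out the delay received with the request, then fulfill it."""
    request, delay = work_queue.get()
    sleep(delay)
    return fulfill(request)
```

In this arrangement the delay value travels with the request, so a handler needs no information from other handlers in order to know how long to wait.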
An additional illustrative aspect involves programmed instructions configuring a computing apparatus for managing bandwidth utilization by a server in fulfilling requests for resources. The programmed instructions configure the computing apparatus to provide structures implementing particular functions. One illustrative computing apparatus is configured to include a request receiver configured for receiving a request for a resource; a response fulfillment delayer configured for delaying fulfillment of the request by a predetermined time period; and a request handler configured for fulfilling the request for the resource. The response fulfillment delayer may include a delay value generator configured for generating a delay value, the delay value less than an acceptable response time; and a timer configured for waiting for a time interval at least as great as the delay value to elapse. The delay value may be an element of a sequence distributed substantially uniformly between zero and the acceptable response time and the sequence may be a pseudo-random sequence.
In an additional aspect, the request receiver may include an acceptable response time receiver configured for receiving a request for a resource with a request dispatch process; a delay value determiner configured for determining a delay value with the request dispatch process; and a request dispatcher configured for providing the request and said delay value to a request handling process. The acceptable response time may be received with the request and the acceptable response time may be determined by a port through which the request arrives. Additionally, the request may comprise a path identifying the resource and the path may comprise the acceptable response time.
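Determining the acceptable response time from the port through which a request arrives could be as simple as a lookup table. The port numbers, the time values, and the default below are hypothetical and chosen only to illustrate the idea.

```python
# Hypothetical mapping from listening port to acceptable response time (seconds).
ART_BY_PORT = {
    8080: 2.0,   # assumed port for requests where a user is waiting
    8081: 60.0,  # assumed port for preload requests
}
DEFAULT_ART = 2.0  # assumed fallback for ports not listed above


def art_for_port(port):
    """Return the acceptable response time associated with the arrival port."""
    return ART_BY_PORT.get(port, DEFAULT_ART)
```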
A still further aspect illustrative of features of the invention is a computer program product comprising a computer-readable medium having computer readable instructions encoded thereon for server bandwidth utilization management. An illustrative computer program product includes computer program instructions configured to cause a computer to receive a request for a resource; computer program instructions configured to cause a computer to delay fulfillment of the request by a predetermined time period; and computer program instructions configured to cause a computer to fulfill the request for the resource after expiration of the predetermined time period. The instructions to delay fulfillment of the request may include computer program instructions configured to cause a computer to generate a delay value, the delay value being less than an acceptable response time; and computer program instructions configured to cause a computer to wait for a time interval at least as great as the delay value to elapse. The delay value may be an element of a sequence distributed substantially uniformly between zero and the acceptable response time, and the sequence may be a pseudo-random sequence.
As an additional aspect, computer program instructions configured to cause a computer to receive a request for a resource may include computer program instructions configured to cause a computer to receive a request for a resource with a request dispatch process; computer program instructions configured to cause a computer to determine a delay value, the delay value determined with the request dispatch process; and computer program instructions configured to cause a computer to dispatch the request and the delay value to a request handling process for handling. The acceptable response time may be received with the request, and the acceptable response time may be determined by a port through which the request arrives. Also, the request may include a path identifying the resource and the path may comprise the acceptable response time.
A still further aspect illustrative of features of the invention is a method for managing bandwidth utilization by a server in fulfilling requests for resources, including transmitting an identifier of a resource available on a server to a client, the identifier comprising an acceptable response time; receiving a request for the resource from the client; extracting the acceptable response time from the identifier; and transmitting a response to the client after the expiration of the acceptable response time.
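An identifier that carries its own acceptable response time might look like the Python sketch below. The `/art/<seconds>/...` path layout and both function names are assumptions made for illustration; the text does not specify how the acceptable response time is encoded in the identifier.

```python
def make_identifier(resource_path, art_seconds):
    """Build an identifier whose path embeds the ART (hypothetical layout)."""
    return f"/art/{art_seconds}{resource_path}"


def extract_art(identifier):
    """Split an identifier into its ART (whole seconds) and the resource path."""
    parts = identifier.split("/", 3)  # e.g. ['', 'art', '30', 'docs/report.html']
    if len(parts) < 4 or parts[1] != "art":
        raise ValueError("identifier does not carry an acceptable response time")
    return int(parts[2]), "/" + parts[3]
```

A server following this sketch would hand out identifiers built with `make_identifier`, and on receiving a request would call `extract_art` to recover both the timing constraint and the resource to serve.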