Current web applications are often deployed in a distributed manner in a cloud environment. They are often scalable in that the number of instances of a web application may increase as the workload increases. An instance of a web application may be shared between many users, or between many tenants that each have one or more users. In such a cloud environment, a single user, a tenant, or a task can easily cause a denial of service for other users and/or tenants, e.g., by simultaneously or near simultaneously consuming too many resources such as CPU time, network bandwidth, memory, and/or the like. Such occurrences of denial of service can be relatively frequent because a tenant's resource needs can fluctuate, the mix of work requests handled by the system can change from moment to moment, etc.
In order to avoid such situations, many systems implement resource use limitation techniques. For instance, in many current cloud-based systems, resource use limits are realized using a simple token bucket algorithm. A token bucket algorithm is based on the idea of a fixed capacity bucket into which tokens, usually representing a work package (e.g., units of CPU time, units of network bandwidth, units of memory, etc., used when performing a service), are added at a fixed rate. Each time a service is requested by a user, the request is checked for conformance to the defined bucket limits. The bucket is inspected to determine whether it contains a sufficient number of tokens to process the request. If the bucket contains a sufficient number of tokens, then the tokens are removed from the bucket and the request is served. If the bucket is empty or contains an insufficient number of tokens, then the request is processed only partially or possibly not at all. There are various ways to handle a request that does not receive the required number of tokens such as, for example, queuing the request until a sufficient number of tokens has accumulated, serving the request partially, dropping the request, etc.
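By way of a non-limiting illustration, the basic token bucket behavior described above may be sketched as follows. The class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions introduced only for this sketch; a request that does not conform is simply rejected here, although it could instead be queued or partially served.

```python
import time


class TokenBucket:
    """Fixed-capacity bucket that is refilled at a fixed rate (illustrative only)."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity        # maximum number of tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self._clock = clock             # injectable time source (assumption, eases testing)
        self.tokens = capacity          # the bucket starts full
        self._last = clock()

    def _refill(self):
        # Add the tokens accrued since the last check, never exceeding capacity.
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)

    def try_consume(self, needed):
        """Remove `needed` tokens and return True, or return False if too few remain."""
        self._refill()
        if self.tokens >= needed:
            self.tokens -= needed
            return True
        return False
```

In this sketch, a caller would invoke `try_consume` once per service request, with `needed` set to the number of work packages the request is expected to use.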
The token bucket algorithm is well known and commonly used in hardware routers and networking software. However, the token bucket algorithm may not be suitable for use in a distributed environment where multiple applications or nodes are serving requests in parallel. The solutions available on the market today use either a single bucket for all the servers (e.g., web application instances), a dedicated bucket for each server, or some form of hierarchical buckets. Yet none of these approaches satisfies the requirements of a distributed, multi-tenant aware cloud environment. FIG. 1 and FIG. 2 illustrate example token bucket implementations in some current cloud environments.
In FIG. 1, each server 102 and 103 is provided with its own token bucket 104 and 105, respectively, in a cloud environment 100. The Apache HTTP Server™ provides limited resource use limitation capabilities in which each server maintains its own independent tokens, as illustrated in FIG. 1. Each bucket 104 and 105, and hence the workload for each server 102 and 103, is managed independently in the scenario shown in FIG. 1.
The IBM WebSphere Telecom Web Services Server™ and Amazon Web Services™ provide rate limitation in a manner similar to that shown in the cloud environment 200 of FIG. 2. FIG. 2 is a high level illustration of an implementation where two servers (servers 202 and 203) each have their own token bucket (token buckets 204 and 205, respectively), each of which is supplied with tokens from a global token bucket 206. The global token bucket 206 enables resource limitation at the level of a container, which is the execution environment common to both servers 202 and 203, rather than at the individual server level.
The techniques illustrated in FIGS. 1 and 2 provide for resource limitation at the server level, but they unfortunately are not suitable for distributed multi-tenant cloud environments. Neither technique, for example, provides for limiting resources at a tenant or user level. In a multi-tenant cloud environment, an application instance is shared by plural tenants, often with each tenant requiring a dedicated share of the instance.
Thus, it will be appreciated that it would be desirable to improve on these techniques, e.g., to provide for dynamic resource use limitations in a cloud computing environment.
An example embodiment includes a method for limiting usage of resources in a distributed computing environment. The method includes receiving, in connection with a first application process of a plurality of application processes executing in the distributed computing environment, a service request from a user. A resource strategy is generated in connection with the first application process. The resource strategy is based on the received service request, and specifies at least one resource shared by the plurality of application processes and an amount of the at least one resource for use by the first application process to subsequently perform a service requested in the service request. The method also includes determining in connection with a resource controller process different from the first application process whether the generated resource strategy is feasible, and either (a) performing the service, when the determining determines that the resource strategy is feasible, or (b) revising the resource strategy and re-submitting the revised resource strategy to the resource controller process when the determining determines that the resource strategy is not feasible.
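By way of a non-limiting illustration, the flow of generating a resource strategy, having a separate resource controller check its feasibility, and either performing the service or revising and re-submitting may be sketched as follows. All names (`ResourceStrategy`, `ResourceController`, `handle_request`), the `max_attempts` bound, and the simple "amount does not exceed what is available" feasibility test are assumptions made for this sketch only; the revision policy is supplied by the caller.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ResourceStrategy:
    user: str
    resource: str  # a resource shared by the application processes
    amount: int    # amount the application process intends to use


class ResourceController:
    """Stands in for the resource controller process (hypothetical)."""

    def __init__(self, available):
        self.available = dict(available)  # resource name -> units currently available

    def is_feasible(self, strategy):
        return strategy.amount <= self.available.get(strategy.resource, 0)


def handle_request(controller, strategy, revise, perform, max_attempts=3):
    """Submit the strategy; perform the service if feasible, else revise and re-submit."""
    for _ in range(max_attempts):
        if controller.is_feasible(strategy):
            return perform(strategy)
        strategy = revise(strategy)
    return None  # no feasible strategy found within the attempt limit


# Usage: a request for 100 units is revised by halving until it fits within 40 units.
controller = ResourceController({"cpu_ms": 40})
initial = ResourceStrategy(user="alice", resource="cpu_ms", amount=100)
halve = lambda s: replace(s, amount=s.amount // 2)
served_amount = handle_request(controller, initial, halve, lambda s: s.amount)
```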
According to some example embodiments, the method may further include ensuring revision of the resource strategy and re-submitting by one of the plurality of application processes the revised resource strategy to the resource controller process, and subsequent to a determination by the resource controller process that the revised resource strategy is feasible, performing the service in accordance with the revised resource control strategy.
According to some example embodiments, the method may further include configuring a hierarchy of token buckets, the hierarchy having at least three levels and a total number of tokens in the token buckets corresponding to a maximum capacity of the at least one resource, and distributing the tokens in accordance with a predetermined allocation of the at least one resource to a plurality of users. The determining may include determining that the resource strategy is not feasible based on a number of tokens in a token bucket corresponding to the user.
According to some example embodiments, the method may further include configuring a hierarchy of token buckets, with the hierarchy having at least three levels and a total number of tokens in the token buckets corresponding to a maximum capacity of the at least one resource, and distributing the tokens in accordance with a predetermined allocation of the at least one resource to a plurality of users. The hierarchy comprises a global level token bucket at the highest level, a plurality of tenant level token buckets at an intermediate level with each tenant of the distributed computing system having a corresponding tenant level token bucket, and a plurality of user level token buckets at the lowest level with each of the plurality of users having a corresponding user level token bucket. The method may further include performing the service and consuming a number of said tokens corresponding to the amount of the at least one resource from the user level token bucket corresponding to the user.
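By way of a non-limiting illustration, a three-level hierarchy of this kind may be sketched as follows. The names `Bucket`, `build_hierarchy`, `total_tokens`, and `consume`, as well as the shape of the `allocation` argument, are assumptions made for this sketch. Tokens not handed down to a lower level remain in the parent bucket, so the total across all buckets equals the configured capacity.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    tokens: int
    children: dict = field(default_factory=dict)


def build_hierarchy(capacity, allocation):
    """Build global -> tenant -> user buckets.

    allocation maps tenant -> (tenant_tokens, {user: user_tokens}).
    """
    root = Bucket(tokens=capacity)  # global level token bucket
    for tenant, (tenant_tokens, users) in allocation.items():
        if tenant_tokens > root.tokens:
            raise ValueError(f"allocation for tenant {tenant!r} exceeds global capacity")
        root.tokens -= tenant_tokens
        tenant_bucket = Bucket(tokens=tenant_tokens)  # tenant level token bucket
        for user, user_tokens in users.items():
            if user_tokens > tenant_bucket.tokens:
                raise ValueError(f"allocation for user {user!r} exceeds tenant share")
            tenant_bucket.tokens -= user_tokens
            tenant_bucket.children[user] = Bucket(tokens=user_tokens)  # user level
        root.children[tenant] = tenant_bucket
    return root


def total_tokens(bucket):
    """Sum the tokens held at every level of the hierarchy."""
    return bucket.tokens + sum(total_tokens(c) for c in bucket.children.values())


def consume(root, tenant, user, amount):
    """Consume tokens from the user level bucket; return False if too few remain."""
    user_bucket = root.children[tenant].children[user]
    if user_bucket.tokens < amount:
        return False
    user_bucket.tokens -= amount
    return True


root = build_hierarchy(100, {
    "acme": (60, {"alice": 40, "bob": 20}),
    "globex": (30, {"carol": 10}),
})
```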
According to some example embodiments, the performing of the service may include locking the user level token bucket corresponding to the user at others of the plurality of application processes before accessing the at least one resource, using the at least one resource, and synchronizing the user level token bucket corresponding to the user at others of the plurality of application processes to update a status of the user level token bucket corresponding to the user after the use. The updated status includes reducing a number of tokens in the user level token bucket corresponding to the user by a number of the consumed tokens.
The determining may include determining that a number of tokens in the user level token bucket corresponding to the user equals or exceeds a number of tokens corresponding to said amount of the at least one resource for use by the first application process.
The determining may further include determining that the user level token bucket corresponding to the user is not locked by another of the plurality of application processes.
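By way of a non-limiting illustration, the lock, use, and synchronize sequence, together with the token count and lock checks just described, may be sketched as a single-process simulation. The names `BucketReplica` and `perform_with_lock` are assumptions made for this sketch; an actual distributed deployment would need a real distributed locking and messaging mechanism, which is not shown, and the check-then-lock step here is not atomic.

```python
class BucketReplica:
    """One application process's local view of a user level token bucket."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.locked_by = None  # id of the process holding the lock, if any


def perform_with_lock(owner, replicas, amount, use_resource):
    """replicas maps process id -> BucketReplica for the same user."""
    local = replicas[owner]
    others = [r for pid, r in replicas.items() if pid != owner]

    # Feasibility checks: enough tokens, and not locked by another process.
    if local.locked_by is not None or local.tokens < amount:
        return False

    # Lock the bucket at the other application processes.
    for replica in others:
        replica.locked_by = owner
    try:
        use_resource(amount)
        # Synchronize: reduce the token count everywhere by the consumed amount.
        local.tokens -= amount
        for replica in others:
            replica.tokens = local.tokens
    finally:
        # Release the lock even if using the resource raised an error.
        for replica in others:
            replica.locked_by = None
    return True
```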
The distributing may include distributing the tokens in accordance with a predetermined allocation of the at least one resource to said each tenant and said plurality of users.
According to some example embodiments, the determining may include determining by the resource controller process different from the first application process that the generated resource strategy is not feasible. The method may further include: annotating the generated resource strategy to include information regarding an amount available of the at least one resource; returning, by the resource controller process to the first application process, the annotated resource strategy; revising the generated resource strategy based on the annotated resource strategy; and re-submitting the revised resource strategy to the resource controller process.
The revising may include specifying a reduced amount of the at least one resource, with the reduced amount being determined based on an estimated minimum amount of the at least one resource required for the service.
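By way of a non-limiting illustration, the annotate, return, revise, and re-submit exchange may be sketched as follows. The names `ResourceStrategy`, `check`, and `revise` are assumptions made for this sketch, as is the particular revision policy shown (request whatever is available, provided that it is at least the estimated minimum); other policies based on the estimated minimum are equally possible.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ResourceStrategy:
    resource: str
    amount: int
    available: Optional[int] = None  # annotation filled in by the controller


def check(strategy, available):
    """Controller side: return (feasible, strategy), annotating it when infeasible."""
    if strategy.amount <= available:
        return True, strategy
    return False, replace(strategy, available=available)


def revise(annotated, estimated_minimum):
    """Application side: reduce the requested amount using the annotation.

    Returns None when even the estimated minimum cannot be satisfied.
    """
    if annotated.available is None or annotated.available < estimated_minimum:
        return None
    return replace(annotated, amount=annotated.available, available=None)


# Usage: 512 units are requested, 300 are available, and at least 128 are needed.
feasible, annotated = check(ResourceStrategy("memory_mb", 512), available=300)
revised = revise(annotated, estimated_minimum=128)
```

In this sketch the revised strategy would then be re-submitted through `check`, mirroring the re-submission to the resource controller process.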
According to some embodiments, the plurality of application processes consists of instances of a same application.
An example embodiment includes a system for limiting usage of resources in a distributed computing environment, the system comprising a plurality of processing systems communicatively connected by a network, each comprising at least one processor. The plurality of processing systems is configured to at least: receive, by a first application process of a plurality of application processes, a service request from a user; generate, by the first application process, a resource strategy based on the received service request, the resource strategy specifying at least one resource shared by the plurality of application processes and an amount of the at least one resource for use by the first application process to subsequently perform a service requested in the service request; determine, by a resource controller process different from the first application process, whether the generated resource strategy is feasible; and perform one of (a) the service when the determining determines that the resource strategy is feasible, and (b) revision of the resource strategy and re-submission of the revised resource strategy to the resource controller process when the determining determines that the resource strategy is not feasible.
The distributed computing environment may include a multi-tenant cloud computing environment.
The plurality of processing systems of the example system may be further configured to revise the resource strategy and re-submit, by one of the plurality of application processes, the revised resource strategy to the resource controller process, and, subsequent to a determination by the resource controller process that the revised resource strategy is feasible, perform the service in accordance with the revised resource control strategy.
The plurality of processing systems of the example system may be further configured to: configure a hierarchy of token buckets, the hierarchy having at least three levels and a total number of tokens in the token buckets corresponding to a maximum capacity of the at least one resource; and distribute the tokens in accordance with a predetermined allocation of the at least one resource to a plurality of users. The hierarchy may comprise a global level token bucket at the highest level, a plurality of tenant level token buckets at an intermediate level with each tenant of the distributed computing system having a corresponding tenant level token bucket, and a plurality of user level token buckets at the lowest level with each of the plurality of users having a corresponding user level token bucket. The performing may comprise performing the service and consuming a number of said tokens corresponding to the amount of the at least one resource from the user level token bucket corresponding to the user.
The processing systems of the example system may be configured to perform the service by: locking the user level token bucket corresponding to the user at others of the plurality of application processes before accessing the at least one resource; using the at least one resource; and synchronizing the user level token bucket corresponding to the user at others of the plurality of application processes to update a status of the user level token bucket corresponding to the user after the use, wherein the updated status includes reducing a number of tokens in the user level token bucket corresponding to the user by a number of the consumed tokens.
According to some example embodiments, the processing systems are configured to determine, using the resource controller process different from the first application process, that the generated resource strategy is not feasible. They may be further configured to: annotate the generated resource strategy to include information regarding an amount available of the at least one resource; return, by the resource controller process to the first application process, the annotated resource strategy; revise the generated resource strategy based on the annotated resource strategy; and re-submit the revised resource strategy to the resource controller process.
Another example embodiment includes a non-transitory computer readable storage medium having stored thereon instructions which, when executed by at least one processor of a plurality of processing systems in a distributed computing environment, cause the plurality of processing systems to at least perform a set of operations. The set of operations includes receiving, by a first application process of a plurality of application processes, a service request from a user; generating, by the first application process, a resource strategy based on the received service request, the resource strategy specifying at least one resource shared by the plurality of application processes and an amount of the at least one resource for use by the first application process to subsequently perform a service requested in the service request; determining, by a resource controller process different from the first application process, whether the generated resource strategy is feasible; and performing one of (a) the service, when the determining determines that the resource strategy is feasible, and (b) revision of the resource strategy and re-submission of the revised resource strategy to the resource controller process when the determining determines that the resource strategy is not feasible.
According to some example embodiments, the performing includes revising the resource strategy and re-submitting by one of the plurality of application processes the revised resource strategy to the resource controller process, and subsequent to a determination by the resource controller process that the revised resource strategy is feasible, performing the service in accordance with the revised resource control strategy.
According to some example embodiments, the instructions further cause the processing systems to: configure a hierarchy of token buckets, the hierarchy having at least three levels and a total number of tokens in the token buckets corresponding to a maximum capacity of the at least one resource; and distribute the tokens in accordance with a predetermined allocation of the at least one resource to a plurality of users. The hierarchy may include a global level token bucket at the highest level, a plurality of tenant level token buckets at an intermediate level with each tenant of the distributed computing system having a corresponding tenant level token bucket, and a plurality of user level token buckets at the lowest level with each of the plurality of users having a corresponding user level token bucket. The performing may include performing the service and consuming a number of said tokens corresponding to the amount of the at least one resource from the user level token bucket corresponding to the user.
These aspects, features, and example embodiments may be used separately and/or applied in various combinations to achieve yet further embodiments of this invention.