Multi-tenancy is a principle in software architecture in which a single instance of software, such as an electronic commerce (ecommerce) web site running on a server machine, serves multiple client organizations (tenants). Although tenants in such a multi-tenant computing environment share the same ecommerce web site, they do not share or see each other's data. Furthermore, each tenant has a “sandbox” of limited resources defined by a number of constraint policies. Such policies may be defined because of infrastructural limitations, for business reasons, or both. Each tenant may have one or more applications running on the ecommerce web site. These applications are limited to the resources defined by the policy or policies for the particular tenant.
When multiple applications associated with a tenant require access to resources through an application programming interface (API), the multi-tenant platform supporting the ecommerce web site needs to be able to limit and distribute the API requests in such a way that it enforces the policy or policies and equitably distributes quota across all requesting applications.
One solution to address this need is to set a hard limit quota on the shared infrastructure over a fixed period of time. However, this solution can cause problems when a high-throughput client consumes the API and exhausts the hard limit in a short period of time. This sort of bursting behavior puts a significant amount of load on the shared infrastructure and may affect other tenants. An example of such a usage pattern is shown in FIG. 1.
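The bursting problem with a hard limit over a fixed period can be illustrated with a minimal Python sketch. The class name `FixedWindowQuota`, the limit of 100 requests, and the one-hour window are hypothetical values chosen for illustration; they are not taken from any particular platform.

```python
class FixedWindowQuota:
    """Hypothetical hard limit of `limit` requests per fixed window, shared by all callers."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_start = 0.0
        self.used = 0

    def allow(self, now):
        # Start a fresh window once the current one has elapsed.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now - (now % self.window_seconds)
            self.used = 0
        if self.used < self.limit:
            self.used += 1
            return True
        return False


quota = FixedWindowQuota(limit=100, window_seconds=3600)

# A high-throughput client sends 150 requests within the first second.
burst = [quota.allow(now=i * 0.001) for i in range(150)]

# Another application sharing the quota calls half an hour later and is refused.
late_call = quota.allow(now=1800.0)

# Requests are accepted again only when the next window opens.
next_window = quota.allow(now=3600.0)
```

In this sketch the first 100 requests of the burst are accepted almost instantly, so the shared infrastructure absorbs the whole load at once, and every later caller is refused until the window resets.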
In a networked environment, most systems have used some form of time-based rate limiting algorithm to throttle requests and access rates. Examples of time-based rate limiting algorithms include the leaky bucket and token bucket algorithms. However, time-based rate limiting algorithms are not without their drawbacks. Consequently, there is room for innovations and improvements.
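For reference, the token bucket algorithm named above can be sketched in a few lines of Python. This is a generic textbook form, assuming a hypothetical capacity of 5 tokens refilled at 1 token per second; the class and parameter names are illustrative only.

```python
class TokenBucket:
    """Generic token bucket: holds up to `capacity` tokens, refilled at `refill_rate` per second."""

    def __init__(self, capacity, refill_rate, now=0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)  # the bucket starts full
        self.last = now

    def allow(self, now, cost=1.0):
        # Add the tokens accrued since the last call, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        # Spend tokens if enough are available; otherwise reject the request.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=5, refill_rate=1.0)

# Eight requests arrive at the same instant: the full bucket covers five of them.
burst = [bucket.allow(now=0.0) for _ in range(8)]

# Two seconds later two tokens have accrued, so two of three requests pass.
after_two_seconds = [bucket.allow(now=2.0) for _ in range(3)]
```

The bucket bounds the long-run rate at `refill_rate` while still permitting a short burst of up to `capacity` requests, which is the behavior that distinguishes it from a fixed hard limit.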