The present disclosure relates to computing and data processing, and in particular, to throttle control on cloud-based computing tasks.
A multi-tenancy cloud system often serves multiple customers at the same time, and service requests (also referred to herein as computing tasks or jobs) are sometimes scheduled by a centralized scheduling system for execution.
To prevent potential abuse, whether intended or not (e.g., duplicative requests leading to a denial of service (DoS)), throttle control is important in such a multi-tenancy cloud system. Throttle control should limit (e.g., cap) not only the total number of service requests submitted into a waiting queue, but also the number of service requests being serviced for each customer.
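The two limits described above can be illustrated with a minimal sketch. It assumes a single in-process scheduler and a first-in, first-out waiting queue. The names `ThrottleController`, `max_queued`, and `max_active_per_customer` are hypothetical and do not come from the disclosure. The global cap rejects new submissions once the waiting queue is full. The per-customer cap keeps a queued job waiting while its customer already has the maximum number of jobs in service.

```python
from __future__ import annotations

from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: str
    customer: str


class ThrottleController:
    """Hypothetical throttle: a global waiting-queue cap plus a per-customer in-service cap."""

    def __init__(self, max_queued: int, max_active_per_customer: int) -> None:
        self.max_queued = max_queued
        self.max_active_per_customer = max_active_per_customer
        self.queue: deque[Job] = deque()  # jobs waiting to be serviced
        self.active: defaultdict[str, int] = defaultdict(int)  # in-service count per customer

    def submit(self, job: Job) -> bool:
        # Reject the request outright once the shared waiting queue is full.
        if len(self.queue) >= self.max_queued:
            return False
        self.queue.append(job)
        return True

    def dispatch(self) -> Optional[Job]:
        # Scan in FIFO order and start the first job whose customer is under its cap;
        # jobs from customers already at their cap stay queued in place.
        for i, job in enumerate(self.queue):
            if self.active[job.customer] < self.max_active_per_customer:
                del self.queue[i]  # safe: we return immediately after mutating
                self.active[job.customer] += 1
                return job
        return None  # nothing is eligible to run right now

    def complete(self, job: Job) -> None:
        # Release the customer's in-service slot when its job finishes.
        self.active[job.customer] -= 1
```

With `max_queued=3` and `max_active_per_customer=1`, a fourth submission is rejected while three jobs are waiting. A customer's second job stays queued until its first job completes. Meanwhile, a later job from another customer can be dispatched ahead of it. This sketch distinguishes requests only by customer, not by category.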
Difficulties abound, however. One technical problem is that, without distinguishing the source of a service request, service requests from certain customers may unintentionally receive preferential treatment, which may cause dissatisfaction among other customers. For example, if a cloud system continuously allocates computing resources to one particular customer over another, the other customer may become underserved or even experience a service outage.
Another technical problem is that, without distinguishing the category (or type) of a service request, service requests of a particular category may unintentionally receive preferential treatment, which may degrade performance for other categories of service requests. For example, if a cloud system continuously allocates computing resources to large-scale data write requests (which may consume more time and resources than read requests), small-scale data read requests may remain pending and even time out, even though fulfilling these read requests would not noticeably affect overall system performance.
There is therefore a need for improved techniques for providing throttle control on cloud-based computing tasks.