Currently, many data processing and computing tasks rely on dedicated processing resources. For example, a graphics processing unit (GPU) is a well-known dedicated processor that is used in computers, workstations, game consoles and mobile devices to accelerate computation. GPU-accelerated computing uses the GPU together with a central processing unit (CPU): the computation-intensive portion of an application program's workload is offloaded to the GPU, while the remaining program code is still executed by the CPU. From the user's perspective, the application program runs significantly faster.
Nowadays, in order to better serve computation-intensive tasks such as high-performance computing (HPC), machine learning (ML) and deep learning (DL), GPU instances are deployed in more and more public clouds and data centers for use by these tasks. The GPU instances deployed in these public clouds and data centers are shared by applications of different tenants. However, such deployments are only a simple attempt and are still at an early stage. So far, there is no solution for controlling the quality of service (QoS) of the resources of a shared GPU among a plurality of applications. Other types of dedicated processing resources face a similar problem.