With the development of general-purpose GPU (graphics processing unit) technologies, a GPU can process not only image workloads but also certain types of general-purpose programs. Currently, when multiple different kernel programs need to access a GPU, they generally access it one by one, serialized in the order in which their requests were sent. If a kernel program with a very long delay is occupying the GPU when a kernel program with a higher priority needs access, the higher-priority kernel program can access the GPU only after both the kernel program currently occupying the GPU and any kernel programs already waiting have finished running and the SM (streaming multiprocessor) resources in the GPU have been released. Consequently, the higher-priority kernel program is not responded to in a timely manner, and service quality is affected.
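The effect of serialized access can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions: the `Kernel` record, the `fifo_start_times` function, and the arrival times and durations are hypothetical values chosen for illustration, and the GPU is modeled as a single resource that runs one kernel program at a time.

```python
from collections import namedtuple

# Hypothetical record for illustration: a larger priority means more urgent.
Kernel = namedtuple("Kernel", ["name", "arrival", "duration", "priority"])


def fifo_start_times(kernels):
    """Serve kernels strictly in request order; return each kernel's start time."""
    clock = 0
    starts = {}
    for k in sorted(kernels, key=lambda k: k.arrival):
        clock = max(clock, k.arrival)  # the GPU may sit idle until the request arrives
        starts[k.name] = clock
        clock += k.duration            # the GPU stays occupied until the kernel finishes
    return starts


# Assumed workload: priority is ignored entirely by the serialized policy.
kernels = [
    Kernel("long_running", arrival=0, duration=100, priority=0),
    Kernel("waiting", arrival=1, duration=20, priority=0),
    Kernel("urgent", arrival=2, duration=5, priority=9),
]

starts = fifo_start_times(kernels)
urgent_wait = starts["urgent"] - 2  # time between the request and the start
print(starts, urgent_wait)
```

In this assumed workload, the high-priority kernel program waits behind both the kernel program occupying the GPU and the one already queued, even though its own run time is short.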
To prevent a kernel program with a long delay from exclusively occupying the SM resources in the GPU for a long time, an idle SM may be searched for when a kernel program with a high priority needs to access the GPU. When an idle SM is found, the high-priority kernel program is distributed to that SM for running.
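The idle-SM search can be sketched as a linear scan over per-SM state. This is an illustrative model only, not an actual GPU driver interface: the `sm_owner` list (one entry per SM, `None` meaning idle) and the `dispatch_to_idle_sm` function are hypothetical names introduced here.

```python
def dispatch_to_idle_sm(sm_owner, kernel_name):
    """Place the kernel on the first idle SM.

    sm_owner is a hypothetical list with one entry per SM: the name of the
    kernel program running there, or None if the SM is idle.
    Returns the index of the SM used, or None if every SM is occupied.
    """
    for index, owner in enumerate(sm_owner):
        if owner is None:
            sm_owner[index] = kernel_name  # the SM is now occupied by this kernel
            return index
    return None  # no idle SM was found, so the kernel cannot start yet


# Assumed state: SM 1 is idle, SMs 0 and 2 are running long kernel programs.
sm_owner = ["long_a", None, "long_b"]
first = dispatch_to_idle_sm(sm_owner, "urgent_1")   # uses the idle SM
second = dispatch_to_idle_sm(sm_owner, "urgent_2")  # finds no idle SM
print(first, second, sm_owner)
```

The second call finds every SM occupied and returns `None`, so in this model that kernel program can only wait.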
However, if the GPU has no idle SM, the high-priority kernel program can start to run only after an SM in the GPU becomes idle. Consequently, the high-priority kernel program is still not responded to in a timely manner.