As general-purpose GPU (graphics processing unit) technology develops, a GPU can process not only graphics workloads but also certain types of general-purpose programs. Currently, when multiple different kernel programs need to access the GPU, they are usually serialized and access the GPU one at a time, in the chronological order in which their requests were sent. If a long-running kernel program is occupying the GPU when a higher-priority kernel program needs access, the higher-priority kernel program cannot access the GPU until the kernel program currently running on the GPU and any kernel programs already waiting ahead of it have finished running and an SM (streaming multiprocessor) resource in the GPU is released. Consequently, the higher-priority kernel program cannot obtain a timely response, and quality of service is affected.
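The delay caused by serialized access can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Kernel` record, the `fifo_start_times` helper, and the arrival and runtime values are hypothetical, and the whole GPU is modeled as a single resource that runs one kernel program at a time in request order.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    arrival: float   # time at which the access request is sent
    runtime: float   # time for which the kernel occupies the GPU
    priority: int    # larger means more urgent; ignored by FIFO dispatch


def fifo_start_times(kernels):
    """Return {name: start time} when kernels run one by one in request order."""
    gpu_free_at = 0.0
    starts = {}
    for k in sorted(kernels, key=lambda k: k.arrival):
        # A kernel starts once it has arrived and the GPU has been released.
        start = max(gpu_free_at, k.arrival)
        starts[k.name] = start
        gpu_free_at = start + k.runtime
    return starts


# Hypothetical workload: a long kernel, one waiting kernel, then an urgent one.
kernels = [
    Kernel("long_batch", arrival=0.0, runtime=100.0, priority=0),
    Kernel("waiting", arrival=1.0, runtime=20.0, priority=0),
    Kernel("urgent", arrival=2.0, runtime=1.0, priority=9),
]
starts = fifo_start_times(kernels)
print(starts["urgent"] - 2.0)  # waiting time of the high-priority kernel: 118.0
```

In this illustrative workload the high-priority kernel program needs only 1 time unit of GPU time but waits 118 time units, because its priority plays no part in the dispatch order.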
To prevent a long-running kernel program from monopolizing the SM resources in the GPU for a long time, an idle SM may be searched for when a high-priority kernel program needs to access the GPU; when an idle SM is found, the high-priority kernel program is dispatched to that idle SM to run.
However, if the GPU has no idle SM, the high-priority kernel program cannot run until an idle SM becomes available. Consequently, the high-priority kernel program still cannot obtain a timely response.
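The idle-SM search and its limitation can be sketched as follows. This is a minimal illustration, not an actual GPU scheduler: the `dispatch_to_idle_sm` function and the `sm_owner` list (one entry per SM, `None` meaning idle) are hypothetical names introduced here.

```python
from typing import List, Optional


def dispatch_to_idle_sm(kernel_name: str,
                        sm_owner: List[Optional[str]]) -> Optional[int]:
    """Place the kernel on the first idle SM and return that SM's index.

    Returns None when every SM is occupied, in which case the kernel
    has to wait until some SM is released.
    """
    for index, owner in enumerate(sm_owner):
        if owner is None:          # None marks an idle SM (assumption)
            sm_owner[index] = kernel_name
            return index
    return None


# Hypothetical GPU with four SMs, three of them held by a long kernel.
sm_owner = ["long_batch", "long_batch", None, "long_batch"]

first = dispatch_to_idle_sm("urgent_a", sm_owner)
second = dispatch_to_idle_sm("urgent_b", sm_owner)
print(first)   # 2: the only idle SM is found and used
print(second)  # None: no idle SM remains, so urgent_b must wait
```

The second call shows the limitation: once no SM is idle, a high-priority kernel program gains nothing from the search and is delayed exactly as before.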