Computer systems typically include one or more coprocessors. A graphics processing unit (GPU), for example, is a coprocessor that performs specialized processing of tasks to which it is well suited, freeing the host processor to perform other tasks. In some systems, a coprocessor may reside on the system's motherboard with a central processing unit (CPU), such as a microprocessor, and in other systems a coprocessor may reside on a separate graphics card. A coprocessor often accesses supplemental memory, for example, video memory, in performing its processing tasks. Some coprocessors are optimized to perform three-dimensional graphics calculations to support applications such as games and computer-aided design (CAD). While current computer systems and coprocessors perform adequately when running a single graphically intensive application, they may experience problems when running multiple graphically intensive applications.
One reason for this is the typical coprocessor's inability to efficiently schedule its workload. In current operating systems, the GPU is multitasked using a cooperative approach (i.e., each application submits operations to the GPU driver, which serializes and executes them in the order in which they were received). This approach does not scale well when many applications with differing priorities access the same resources. With cooperative multitasking, an application currently “controlling” the coprocessor must relinquish control to other applications in order for those other applications to achieve their coprocessing objectives. If the application fails to relinquish control, e.g., because the work request it has submitted to the coprocessor is voluminous or for some other reason, it can effectively “hog” the coprocessor. While this has not been a significant concern when running a single graphically intensive program, the problem of hogging the coprocessor can become more serious when multiple applications attempt to use a coprocessor. One need only imagine waiting 10 seconds or more for the mere rendering of a mouse movement to appreciate that hogging of the coprocessor by an application degrades the computing environment. It would thus be desirable to have more efficient scheduling of coprocessor resources.
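The effect of in-order execution described above can be illustrated with a minimal sketch. The function name `fifo_wait_times`, the application names, and the durations are hypothetical and chosen only for illustration; the sketch further assumes that all requests arrive at the same moment and that each runs to completion before the next begins.

```python
from collections import deque


def fifo_wait_times(requests):
    """Return {app: seconds waited before its first request starts}.

    `requests` is a list of (app_name, duration_seconds) tuples in
    arrival order. Requests run strictly in that order, one at a time,
    which models the serialized execution described in the text.
    """
    queue = deque(requests)
    clock = 0.0
    waits = {}
    while queue:
        app, duration = queue.popleft()
        # Record only the first time each application gets the coprocessor.
        waits.setdefault(app, clock)
        # The request runs to completion; nothing can interrupt it.
        clock += duration
    return waits


# Hypothetical workload: one voluminous request ahead of two small ones.
requests = [("renderer", 10.0), ("mouse_cursor", 0.01), ("video", 0.5)]
waits = fifo_wait_times(requests)
print(waits)
```

In this hypothetical workload the small `mouse_cursor` request cannot start until the 10-second `renderer` request has finished, regardless of how urgent it is.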
While the problem of apportioning processing between operations has been addressed in the context of a CPU, where sophisticated scheduling of multiple operations has become necessary, scheduling for coprocessors has not been effectively addressed. This is because the coprocessor, in present-day systems, is generally seen as a resource to divert calculation-heavy and time-consuming operations away from the CPU, providing the CPU with more processing time for other functions. Such calculation-heavy operations are often graphics operations, which are known to require significant processing power. As the sophistication of applications increases, they often rely more heavily on the coprocessor to handle robust calculation and rendering activities. This increased reliance, in turn, creates a previously unforeseen need to surmount the technical barriers involved in intelligent apportioning of coprocessor resources. For these and other reasons, systems and methods for efficiently scheduling coprocessor tasks and other use of coprocessor resources are desired. It is further desirable to provide intelligent scheduling of coprocessor resources using existing coprocessors and existing hardware architecture, i.e., without redesigning the coprocessor with an eye towards multitasking.
In more detail, as illustrated in FIG. 1, in today's graphics systems, scheduling is generally handled as follows. Applications, such as application A, application B and application C, submit work to a driver D via a mutex M that effectively allows only one application to communicate with driver D at a time, behaving as a lock on the driver D. Driver D maintains the states S_A, S_B and S_C of applications A, B and C, e.g., information about a texture, a render target, lighting, z-buffering, compression, etc. As GPU work requests are received from the applications, they are placed in a buffer. If a switch between applications occurs as part of a work request, then the state for the new application is restored prior to submitting the work to the GPU. In this fashion, while the GPU is unaware of the operation of the applications, multiple applications can still request GPU resources. However, the present system is “first come, first served,” which can be a problem. Some work requests are higher priority than others, and an application that submits a lot of work that takes, e.g., 10 seconds, to complete will block all other applications' work requests during that time period. Accordingly, a system that utilizes existing hardware, but avoids the problems associated with “hogging” of the GPU by an application, is desired.
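The arrangement of FIG. 1 can be sketched as follows. This is an illustrative model only, not an actual driver interface: the class name `Driver`, its methods `submit` and `flush`, and the `gpu_log` list that stands in for the GPU are all assumptions introduced here. As a simplification, the state restored on a switch is the application's most recently recorded state.

```python
import threading
from collections import deque


class Driver:
    """Hypothetical model of driver D guarded by mutex M."""

    def __init__(self):
        self.mutex = threading.Lock()  # mutex M: one application at a time
        self.states = {}               # per-application state (S_A, S_B, S_C)
        self.buffer = deque()          # work requests in arrival order
        self.gpu_log = []              # stand-in for commands sent to the GPU

    def submit(self, app, state_update, work):
        """Record the application's state and buffer its work request."""
        with self.mutex:
            self.states.setdefault(app, {}).update(state_update)
            self.buffer.append((app, work))

    def flush(self):
        """Send buffered work to the GPU strictly in arrival order."""
        current_app = None
        with self.mutex:
            while self.buffer:
                app, work = self.buffer.popleft()
                if app != current_app:
                    # Application switch: restore the new application's
                    # state before its work is submitted.
                    snapshot = dict(self.states[app])
                    self.gpu_log.append(("restore_state", app, snapshot))
                    current_app = app
                self.gpu_log.append(("execute", app, work))


driver = Driver()
driver.submit("A", {"texture": "brick"}, "draw_1")
driver.submit("A", {}, "draw_2")
driver.submit("B", {"lighting": "on"}, "draw_1")
driver.submit("A", {}, "draw_3")
driver.flush()
for entry in driver.gpu_log:
    print(entry)
```

Because `flush` drains the buffer strictly in arrival order, nothing in this model lets a higher-priority request overtake a long-running one, which is the “first come, first served” limitation noted above.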