As more media processors are coupled through a common memory system to share graphics content while performing graphics operations separately, it is becoming increasingly expensive to allow one media processor, such as a graphics processing unit (GPU), to consume the result produced by another media processor. Typically, these media processors are capable of handling 2D, 3D, video and/or display command streams driven by API (application programming interface) calls from a graphics application. A host processor such as a central processing unit (CPU) is typically required to explicitly synchronize these graphics operations by stopping one media processor and starting another when switching between APIs. Such synchronization is usually very costly and prevents simultaneous, parallel execution of separate media processors.
In particular, a media processor may be driven by a system on a chip that requires a host processor included in the chip to respond to an interrupt signifying the completion of each individual hardware graphics operation, such as a single copy operation or a single solid color fill operation. However, responding to one interrupt per operation can be expensive. Furthermore, interrupts may prevent simultaneous execution of the media processors and the host processor. Consequently, parallelism is reduced and overall performance is degraded.
On the other hand, multiple media processors and a display device coupled with a common memory system may require synchronization. For example, more than one component of graphics content may arrive asynchronously from separate media processors to be displayed on a display device for a single application. Executing a graphics command to display the graphics content may depend on when each component has been properly rendered and is ready to display. It is therefore necessary to maximize parallelism among multiple media processors while still rendering the different components of the same graphics content in a synchronized manner.
Additionally, parallel operations between a host processor and coupled media processors may be limited by a bottleneck introduced when deleting commonly shared graphics resources. Typically, media processor drivers ensure the media processors are idle prior to deleting graphics resources, such as allocated memory, memory management unit (MMU) entries, textures, etc., that might otherwise be in use by pending graphics operations. This, however, prevents the host processor and the media processors from operating in parallel.
Furthermore, graphics rendering operations such as scaling may be limited by the fixed number of fractional bits that media processor hardware provides for arithmetic representations. Often, the bit-precision of the scale factor is limited in order to optimize mathematical operations inside the media processor hardware, allowing a multiplication to be used instead of a division, which may be more expensive. For example, a scale factor may be represented by its inverse in fixed-point arithmetic of limited bit-precision. As a result, certain scale factors cannot be represented accurately.
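The precision limit described above can be illustrated with a minimal sketch. It assumes 16 fractional bits and truncating quantization of the inverse scale factor; both choices, and all function names, are hypothetical and are not taken from any particular media processor.

```python
from fractions import Fraction

FRAC_BITS = 16  # assumed fractional precision; actual hardware may differ


def inverse_step(src_size: int, dst_size: int, frac_bits: int = FRAC_BITS) -> int:
    """Quantize the inverse scale factor src/dst to fixed point (truncating)."""
    return (src_size << frac_bits) // dst_size


def source_index_fixed(dst_x: int, step: int, frac_bits: int = FRAC_BITS) -> int:
    """Map a destination pixel to a source pixel with a multiply and a shift."""
    return (dst_x * step) >> frac_bits


def source_index_exact(dst_x: int, src_size: int, dst_size: int) -> int:
    """Reference mapping that uses an exact integer division instead."""
    return (dst_x * src_size) // dst_size


def representation_error(src_size: int, dst_size: int,
                         frac_bits: int = FRAC_BITS) -> Fraction:
    """Exact difference between the true inverse scale and its quantized form."""
    step = inverse_step(src_size, dst_size, frac_bits)
    return Fraction(src_size, dst_size) - Fraction(step, 1 << frac_bits)


if __name__ == "__main__":
    # 2x upscale: 1/2 is exactly representable in binary fixed point.
    print(representation_error(1, 2))
    # 3x upscale: 1/3 is not, so a small error remains.
    print(representation_error(1, 3))
    step = inverse_step(1, 3)
    # Destination pixel 3 should map to source pixel 1, but truncation
    # of the inverse factor pulls it back to source pixel 0.
    print(source_index_exact(3, 1, 3), source_index_fixed(3, step))
```

Under these assumptions, a 2x scale has zero representation error, while a 3x scale leaves a residual of 1/196608 in the inverse factor, which is enough to shift some destination pixels to the neighboring source pixel.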