In embedded integrated circuit applications, such as automotive applications, embedded devices include display controllers for infotainment displays, instrument cluster displays, and the like. To enable dynamic content creation with minimal central processing unit (CPU) intervention, it is known for such embedded display controller units (DCUs) to read (e.g. fetch) image data for individual graphics layers directly from memory and to blend that image data on-the-fly for display.
Graphical images displayed by automotive infotainment and instrument cluster displays are typically made up of a plurality of graphics layers that are combined (often referred to as ‘blended’) in order to generate a composite image that is displayed to an end user. Accordingly, the embedded DCUs mentioned above fetch pixel data for multiple graphics layers, blend that pixel data to generate pixel data for the composite image to be displayed, and output the generated composite pixel data to a display device, all ‘on-the-fly’.
Such embedded DCUs are typically implemented in hardware, and include a memory interface component that supports a plurality of data channels, each data channel arranged to receive pixel data relating to one graphics layer and store the received pixel data within a respective input First-In-First-Out (FIFO) buffer. The DCU may then perform functions, such as format conversion, blending, gamma correction, etc., ‘on-the-fly’ in order to generate composite pixel data to be output to a display device. Blending is performed over multiple graphical surfaces (for example, multiple picture/pixel rectangles) in order to form a single image for a display. It is known that the DCU may blend multiple surfaces simultaneously and may be used to off-load the blending from other processing units (such as, for example, a GPU).
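By way of illustration only, the per-pixel blending described above may be sketched in software as follows. The text does not specify a blend equation, so this sketch assumes conventional ‘source-over’ blending with straight (non-premultiplied) alpha; the function name `blend_pixel`, the `Pixel` type and the normalised 0.0 to 1.0 value range are hypothetical and are not taken from any particular DCU.

```python
from typing import Sequence, Tuple

# Assumed pixel layout: (red, green, blue, alpha), each in the range 0.0..1.0.
Pixel = Tuple[float, float, float, float]


def blend_pixel(
    layers: Sequence[Pixel],
    background: Tuple[float, float, float] = (0.0, 0.0, 0.0),
) -> Tuple[float, float, float]:
    """Blend one pixel position across all layers, bottom layer first.

    Assumption: source-over blending with straight alpha, i.e.
    out = alpha * layer + (1 - alpha) * out, applied layer by layer.
    """
    r, g, b = background
    for layer_r, layer_g, layer_b, alpha in layers:
        r = alpha * layer_r + (1.0 - alpha) * r
        g = alpha * layer_g + (1.0 - alpha) * g
        b = alpha * layer_b + (1.0 - alpha) * b
    return (r, g, b)


# Usage: an opaque red layer underneath a half-transparent blue layer.
composite = blend_pixel([(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 1.0, 0.5)])
```

In a hardware DCU the layers for a given pixel position would typically be drawn from the respective input buffers and combined in a single step; the loop above merely models the arithmetic, not the timing.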
One limitation is that the DCU is able to blend only a small number of simultaneously overlapping layers. Another limitation is that, if there are more surfaces than available layers, the excess surfaces cannot be blended on the DCU (which is limited in the number of layers that it can support) and must therefore be blended by other processing units. In many such architectures, the DCU is only used to present the final, already-blended image on the screen. For architectures that make use of a different processing unit for blending, for example a GPU, the inventors have recognized and appreciated that the GPU is unable to write to a pixel from multiple sources of data at the same time, and it therefore has to render each surface sequentially onto the frame buffer. The time it takes for the DCU to write a pixel remains constant as the number of layers increases (up to the maximum supported), whereas a GPU must access each layer individually and then write the combined result. Since the GPU is also used by applications to actually fill the surfaces, it becomes a major performance bottleneck, thereby causing contention between the various applications and the DCU (sometimes referred to as a ‘compositor’).
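The contrast between the two approaches may be illustrated with a simplified software model. This is a sketch under stated assumptions rather than a description of any real GPU or DCU: surfaces are modelled as single-channel rows of values with one alpha per surface, and the function names, the `max_layers` parameter and the write counters are all hypothetical. The counters tally frame buffer writes in this toy model only and are not performance measurements.

```python
from typing import List, Sequence, Tuple

# Assumed surface model: (one row of single-channel values, per-surface alpha).
Surface = Tuple[Sequence[float], float]


def compose_sequential(surfaces: Sequence[Surface], width: int) -> Tuple[List[float], int]:
    """GPU-style model: render each surface onto the frame buffer in turn."""
    frame = [0.0] * width
    writes = 0
    for values, alpha in surfaces:
        for x in range(width):
            frame[x] = alpha * values[x] + (1.0 - alpha) * frame[x]
            writes += 1  # one frame buffer write per surface per pixel
    return frame, writes


def compose_single_pass(
    surfaces: Sequence[Surface], width: int, max_layers: int
) -> Tuple[List[float], int]:
    """DCU-style model: blend all layers for a pixel, then write it once."""
    if len(surfaces) > max_layers:
        # Models the layer limit: excess surfaces cannot be blended here.
        raise ValueError("more surfaces than available layers")
    out: List[float] = []
    writes = 0
    for x in range(width):
        acc = 0.0
        for values, alpha in surfaces:
            acc = alpha * values[x] + (1.0 - alpha) * acc
        out.append(acc)
        writes += 1  # one output write per pixel, regardless of layer count
    return out, writes


# Usage: two surfaces, two pixels wide.
demo = [([1.0, 1.0], 1.0), ([0.0, 0.0], 0.5)]
seq_frame, seq_writes = compose_sequential(demo, 2)
dcu_frame, dcu_writes = compose_single_pass(demo, 2, max_layers=4)
```

Both functions produce the same composite row; in this model the sequential version performs one write per surface per pixel, whereas the single-pass version performs one write per pixel but rejects inputs that exceed its layer limit.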
A number of solutions have been attempted to address the problem of blending and rendering each surface of an image onto a frame buffer. US20120117508A1 proposes a window manager for embedded systems that uses only a GPU for blending. This limits the performance of the window manager to that of the GPU and prevents the GPU from running other tasks. U.S. Pat. No. 6,700,580B2 describes a rendering system that uses multiple pipelines to generate multiple frame buffers that are then blended by a compositor. This setup consequently restricts the applicable platforms (as they must provide multiple GPUs) and also restricts the number of surfaces. Furthermore, the compositor only blends using color averaging. A more flexible and less complex solution is therefore needed.