This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
Some operating systems have an abstraction layer that hides the complexity of graphics hardware from those exploiting that hardware. These abstraction layers may use various constructs to make programming easier. A render graph (referred to herein as a RenderGraph) is a data structure designed to aid in the computation of image processing operations. It is a graph structure whose nodes, called render actions, each represent one operation executed by the underlying graphics library. These actions are executed by traversing the graph, producing the contents of the frame buffer that is then displayed to the user.
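By way of illustration only, the graph structure described above can be sketched as follows. This is a minimal sketch, not the implementation of any particular framework: the names `RenderAction`, `execute`, and `Pixels`, and the three sample operations, are hypothetical and are introduced solely for this example. A list of floating-point values stands in for the frame buffer.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for a frame buffer: a flat list of pixel intensities.
Pixels = List[float]


@dataclass
class RenderAction:
    """One node of the graph: a single operation plus the nodes that feed it."""
    name: str
    operation: Callable[[List[Pixels]], Pixels]
    inputs: List["RenderAction"] = field(default_factory=list)


def execute(node: RenderAction) -> Pixels:
    """Traverse the graph depth-first, evaluating every input before the node itself."""
    upstream = [execute(child) for child in node.inputs]
    return node.operation(upstream)


# A three-node chain: source -> brighten -> invert (all operations are assumptions).
source = RenderAction("source", lambda _: [0.2, 0.5, 0.8])
brighten = RenderAction(
    "brighten", lambda ins: [min(1.0, p + 0.1) for p in ins[0]], [source]
)
invert = RenderAction("invert", lambda ins: [1.0 - p for p in ins[0]], [brighten])

# Traversing from the final node yields the contents to be displayed.
frame_buffer = execute(invert)
```

In this sketch the traversal starts at the final node and recurses toward the sources, so each render action runs only after the actions it depends on.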
In order to be performant when running on a GPU (Graphics Processing Unit), the RenderGraph is typically compiled into a single GPU shader. Such a GPU shader may be called an uberKernel. A RenderGraph may also be compiled to run on a CPU, in which case the uberKernel is a CPU shader or CPU code. In the instant context, the term uberKernel means, e.g., an assemblage of kernel invocations, where “uber” means “over,” as this is the controlling code that calls other code. In the GPU shader example, the uberKernel contains kernel code to invoke the image processing filters that the RenderGraph invokes from the framework.
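The notion of an uberKernel as controlling code that calls other code can be illustrated, in simplified form, on a CPU. The following is a sketch under stated assumptions rather than an actual shader compiler: `compile_uber_kernel`, `halve`, and `offset` are hypothetical names, and each kernel is assumed to operate on a single pixel value.

```python
from typing import Callable, List

# Assumption for this sketch: a kernel maps one pixel value to another.
Kernel = Callable[[float], float]


def compile_uber_kernel(kernels: List[Kernel]) -> Kernel:
    """Fuse an ordered list of kernels into one callable that invokes each in turn."""
    def uber_kernel(pixel: float) -> float:
        for kernel in kernels:
            pixel = kernel(pixel)
        return pixel
    return uber_kernel


def halve(pixel: float) -> float:
    """Hypothetical filter kernel: scale the pixel by one half."""
    return pixel * 0.5


def offset(pixel: float) -> float:
    """Hypothetical filter kernel: add a constant offset."""
    return pixel + 0.25


# The single compiled entry point stands in for the uberKernel.
uber = compile_uber_kernel([halve, offset])
result = uber(0.5)
```

Here the caller sees one entry point, while the sequence of kernel invocations is fixed at the time the list is compiled.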
A difficulty arises because the execution sequence within the uberKernel can change from frame to frame as a result of changes to, e.g., image filter parameters. For a concrete example, consider a HighlightAndShadow filter that reduces shadows and increases highlights. This filter takes two parameters, shadows and highlights, each with a value from 0.0 to 1.0. Depending on the values of these parameters, the RenderGraph may invoke two (both shadows and highlights), one (either shadows or highlights) or zero (neither shadows nor highlights) filter processing kernels. As a result, the uberKernel may need to invoke different kernel code on different frames. A problem that needs to be solved is how to allow different uberKernel control flow on different frames without burdening the runtime execution with unacceptable overhead.
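The frame-to-frame variation described above can be made concrete with a short sketch. It assumes, purely for illustration, that a parameter value of 0.0 means the corresponding kernel is not needed; the text above does not state which values trigger which kernels. The function `plan_kernels` and the kernel names `reduce_shadows` and `increase_highlights` are likewise hypothetical.

```python
from typing import List


def plan_kernels(shadows: float, highlights: float) -> List[str]:
    """Return the names of the kernels one frame would invoke.

    Assumption for this sketch: a parameter equal to 0.0 has no effect,
    so its kernel is skipped for that frame.
    """
    for value in (shadows, highlights):
        if not 0.0 <= value <= 1.0:
            raise ValueError("parameters must lie in the range 0.0 to 1.0")
    plan = []
    if shadows > 0.0:
        plan.append("reduce_shadows")
    if highlights > 0.0:
        plan.append("increase_highlights")
    return plan


# Three consecutive frames with different (shadows, highlights) settings.
frames = [(0.0, 0.0), (0.3, 0.0), (0.3, 0.7)]
plans = [plan_kernels(s, h) for s, h in frames]
counts = [len(p) for p in plans]
```

In this sketch the three frames require zero, one, and two kernel invocations respectively, so a kernel sequence fixed once at compile time would not fit all of them.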