Debugging is a well-known process for finding the causes of undesirable operations in computer applications and modules. The undesirable operations may include, but are not limited to, unexpected behavior such as extended delays (“freezing”), unintended repetition (“looping”), and unintended termination (“crashing”), as well as problems in the storage and/or manipulation of data, such as data discrepancies, memory faults, or anomalies. Typically, the undesirable operations are caused by errors (“bugs”) in the application or module software.
In the case of computer graphics applications, the process of debugging may be made more complex by the use of heterogeneous computing systems that include both CPUs and GPUs. Additionally, debugging may be complicated by asynchronous processing on such systems, large datasets, and the need to have visibility into the complex state machines implemented by one or more GPUs. A frame debugger is a tool that allows users to inspect state and data at various points in a set of graphics frames with the intent of uncovering application bugs that produce incorrect rendering or other unintended behavior. Such bugs may result from program errors such as improperly configured state, incorrect operations sent to the GPU, corrupt data, or data hazards (often caused by consuming data before it has been produced). To enable such inspection, a frame debugger may capture (record) and replay the graphics operations generated by an application.
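By way of illustration only, the capture-and-replay idea can be sketched as follows. The names `Recorder`, `replay`, `bind_texture`, and `draw` are hypothetical and do not correspond to any particular graphics API, and the replayed "state" is reduced to two fields. The sketch assumes that operations are recorded in submission order and that replaying a prefix of the recording reproduces the state at that point.

```python
from typing import Any, Dict, List, Tuple

# One recorded operation: (operation name, positional arguments).
Operation = Tuple[str, Tuple[Any, ...]]


class Recorder:
    """Hypothetical capture layer that logs operations in submission order."""

    def __init__(self) -> None:
        self.ops: List[Operation] = []

    def record(self, name: str, *args: Any) -> None:
        self.ops.append((name, args))


def replay(ops: List[Operation], stop_after: int) -> Dict[str, Any]:
    """Re-execute the first `stop_after` operations and return the resulting state."""
    state: Dict[str, Any] = {"bound_texture": None, "draw_calls": 0}
    for name, args in ops[:stop_after]:
        if name == "bind_texture":
            state["bound_texture"] = args[0]
        elif name == "draw":
            state["draw_calls"] += 1
    return state


def capture_example() -> List[Operation]:
    """Record a tiny, made-up sequence of operations."""
    recorder = Recorder()
    recorder.record("bind_texture", "a")
    recorder.record("draw")
    recorder.record("bind_texture", "b")
    recorder.record("draw")
    return recorder.ops
```

Calling `replay(capture_example(), 2)` returns the state as it stood after the first draw, which is the kind of point-in-time inspection described above.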
The functionality provided by one or more GPUs or graphics systems is exposed using 3D application programming interfaces (APIs). Traditionally, the runtimes and drivers that implement such APIs have managed the complexity of potential data hazards internally, freeing the application developer from the need to worry about such complexity. A more recent industry practice has shifted the burden of resource management, data hazard management, and operation synchronization across processors to the application, by way of APIs designed to expose that control directly.
A conventional mechanism for ordering or synchronizing operations with data dependencies across two or more processors (homogeneous, heterogeneous, physical, logical, virtual, etc.) is to use synchronization objects or primitives. Such objects allow one processor to notify one or more other processors that a workload (a set of tasks or operations) has completed. A fence object is an example of such a synchronization primitive. A processor can wait on a fence object, which blocks that processor from continuing any work until the fence is signaled by another processor. A fence typically encapsulates a value that can be observed by processors, allowing the processors or the application to decide which workloads to execute based on the progress made by other processors, as indicated by the current fence value. These kinds of synchronization primitives are exposed by modern 3D graphics APIs to aid in synchronizing work across CPUs and GPU engines.
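By way of illustration only, the wait and signal behavior of a value-carrying fence can be sketched on CPU threads as follows. The class `Fence` and its methods are hypothetical and are not the interface of any particular graphics API. The sketch assumes that the fence value only ever increases and that a wait is satisfied once the value reaches or exceeds the awaited value.

```python
import threading
from typing import List, Optional


class Fence:
    """Hypothetical fence holding a monotonically increasing value."""

    def __init__(self, initial_value: int = 0) -> None:
        self._value = initial_value
        self._cond = threading.Condition()

    def completed_value(self) -> int:
        """Observe the current fence value without blocking."""
        with self._cond:
            return self._value

    def signal(self, value: int) -> None:
        """Advance the fence and wake any waiters (assumed: never moves backward)."""
        with self._cond:
            if value > self._value:
                self._value = value
                self._cond.notify_all()

    def wait(self, value: int, timeout: Optional[float] = None) -> bool:
        """Block until the fence reaches `value`; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._value >= value, timeout)


def run_producer_consumer() -> List[int]:
    """The consumer reads shared data only after the producer signals the fence."""
    fence = Fence()
    shared: List[int] = []
    consumed: List[int] = []

    def producer() -> None:
        shared.extend([1, 2, 3])  # produce the data
        fence.signal(1)           # announce that workload 1 has completed

    def consumer() -> None:
        if fence.wait(1, timeout=5.0):  # block until workload 1 has completed
            consumed.extend(shared)

    threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return consumed
```

Because the consumer reads `shared` only after its wait succeeds, it cannot observe the data before it has been produced, regardless of which thread is scheduled first.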
Correct programming in a multi-processor environment is inherently complex. Bugs arising from incorrect fence usage include, but are not limited to, data being consumed before it has been produced (no fence used, or a fence improperly used), less than optimal utilization of processors as a result of unnecessary fence waiting, processor hangs, and application or other system crashes. A graphics frame debugger that does not properly detect and replicate an application's use of fences will, at a minimum, have trouble replaying the application's sequence of events in a consistent and well-ordered way. Additionally, without accurately tracking fence operations, such a debugger cannot give users feedback about potentially erroneous fence usage.
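By way of illustration only, one kind of feedback that becomes possible once fence operations are tracked is the detection of waits that no recorded signal ever satisfies, which is a potential source of processor hangs. The record type `FenceEvent` and the function `find_unsatisfied_waits` are hypothetical. The sketch assumes that every fence starts at value 0, that fence values only increase, and that the capture contains all fence operations of interest.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FenceEvent:
    queue: str   # hypothetical name of the CPU thread or GPU engine
    op: str      # "signal" or "wait"
    fence: str   # hypothetical fence identifier
    value: int   # value signaled or awaited


def find_unsatisfied_waits(events: List[FenceEvent]) -> List[FenceEvent]:
    """Return waits whose awaited value is never reached by any recorded signal."""
    max_signaled: Dict[str, int] = {}
    for event in events:
        if event.op == "signal":
            previous = max_signaled.get(event.fence, 0)
            max_signaled[event.fence] = max(previous, event.value)
    return [
        event
        for event in events
        if event.op == "wait" and event.value > max_signaled.get(event.fence, 0)
    ]


def example_trace() -> List[FenceEvent]:
    """A made-up capture: the wait on fence B for value 2 is never signaled."""
    return [
        FenceEvent("gpu.copy", "signal", "A", 1),
        FenceEvent("gpu.graphics", "wait", "A", 1),
        FenceEvent("gpu.graphics", "signal", "B", 1),
        FenceEvent("cpu", "wait", "B", 2),
    ]
```

This check covers only one of the bug classes listed above. Detecting unnecessary waits or missing fences would additionally require ordering information between processors, which the sketch does not model.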