Computers and other computational devices typically have at least one programmable processing element that is generally known as a central processing unit (CPU). Such devices frequently also have other programmable processors that are used for specialized processing tasks of various types; those used for graphics processing operations are typically called graphics processing units (GPUs). GPUs generally comprise multiple cores or processing elements designed for executing the same instruction on parallel data streams, making them more effective than general-purpose CPUs for algorithms in which large blocks of data are processed in parallel. In general, a CPU may function as a “host,” i.e., setting up specialized parallel tasks and then handing them off to be performed by one or more GPUs.
Although GPUs were originally developed for rendering graphics and remain heavily used for that purpose, current GPUs support a programming paradigm that allows for the use of GPUs as general-purpose parallel processing units, i.e., in addition to being used as graphics processors. This paradigm allows implementation of algorithms unrelated to rendering graphics by giving access to GPU computing hardware in a more generic, non-graphics-oriented way.
Several frameworks have been developed for heterogeneous computing platforms that have CPUs and GPUs. These frameworks include the METAL framework from Apple Inc., although other frameworks are in use in the industry (METAL is a trademark of APPLE INC.). Some frameworks focus on using the GPU for general computing tasks, allowing any application to use the GPUs' parallel processing functionality for more than graphics applications. Other frameworks focus on using the GPU for graphics processing and provide application programming interfaces (APIs) for rendering two-dimensional (2D) and three-dimensional (3D) graphics. The METAL framework supports GPU-accelerated advanced 3D graphics rendering and data-parallel computation workloads.
Various tasks may be offloaded from a host (e.g., CPU) to any available GPU in the computer system. One type of task, in particular, that may be performed by GPUs is known as a “blit” operation. The term blit refers to the operation of copying a region of a texture object onto another texture of the same format, or to or from a memory buffer. As described herein, blit operations may be performed entirely by a GPU, with minimal setup cost on the CPU. Some frameworks support the following kinds of blit operations: texture-to-texture; texture-to-buffer; and buffer-to-texture.
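By way of illustration only, the semantics of a texture-to-buffer blit may be sketched on the CPU as follows. This is a minimal sketch and not any framework's API: the function name `blit_texture_to_buffer`, its parameters, and the assumption of a tightly packed, row-major source texture are all hypothetical. The `dst_bytes_per_row` parameter stands in for a destination row pitch that may include padding.

```python
def blit_texture_to_buffer(texture, tex_width, tex_height, bytes_per_pixel,
                           origin, size, dst, dst_offset, dst_bytes_per_row):
    """Copy a rectangular region of a 2D texture into a linear buffer.

    Hypothetical helper: `texture` is assumed to be tightly packed,
    row-major bytes; `dst` is a writable bytearray.
    """
    x0, y0 = origin
    w, h = size
    if x0 < 0 or y0 < 0 or w <= 0 or h <= 0:
        raise ValueError("invalid region")
    if x0 + w > tex_width or y0 + h > tex_height:
        raise ValueError("region lies outside the source texture")

    row_bytes = w * bytes_per_pixel
    if dst_bytes_per_row < row_bytes:
        raise ValueError("destination row pitch is smaller than one region row")

    # Last byte written: start of the final row plus one row of texels.
    end = dst_offset + (h - 1) * dst_bytes_per_row + row_bytes
    if dst_offset < 0 or end > len(dst):
        raise ValueError("destination buffer is too small")

    src_stride = tex_width * bytes_per_pixel
    for row in range(h):
        src_start = (y0 + row) * src_stride + x0 * bytes_per_pixel
        dst_start = dst_offset + row * dst_bytes_per_row
        # Bytes between the end of this row and the next row start are
        # padding and are left untouched.
        dst[dst_start:dst_start + row_bytes] = texture[src_start:src_start + row_bytes]


# Usage: copy the central 2x2 region of a 4x4, one-byte-per-pixel texture
# into a buffer whose rows are padded to 4 bytes.
source = bytes(range(16))
destination = bytearray(8)
blit_texture_to_buffer(source, 4, 4, 1, (1, 1), (2, 2), destination, 0, 4)
```

A buffer-to-texture blit would follow the same indexing with the roles of source and destination reversed.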
Buffer objects, as described herein, are handled internally by the graphics hardware as textures with a one-dimensional, i.e., linear, memory layout. The GPU drivers may thus create a “texture view” of the buffer that is compatible with the size of the blit range requested by a developer and/or calling application, which allows the GPU to implement texture-to-buffer and buffer-to-texture blit operations as texture-to-texture blits.
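The notion of a “texture view” over a linear buffer may be illustrated with the following sketch. The class `BufferTextureView` and its fields are hypothetical names chosen for illustration, and the sketch models only the address arithmetic, not how any particular driver or GPU creates such a view. The view holds a reference to the buffer rather than a copy, so a texel written through the view lands directly in the buffer.

```python
class BufferTextureView:
    """Hypothetical 2D view over a linear buffer (no data is copied)."""

    def __init__(self, buffer, offset, width, height,
                 bytes_per_pixel, bytes_per_row):
        if width <= 0 or height <= 0:
            raise ValueError("view dimensions must be positive")
        if bytes_per_row < width * bytes_per_pixel:
            raise ValueError("row pitch is smaller than one row of texels")
        needed = offset + (height - 1) * bytes_per_row + width * bytes_per_pixel
        if offset < 0 or needed > len(buffer):
            raise ValueError("buffer is too small for the requested view")
        self.buffer = buffer
        self.offset = offset
        self.width = width
        self.height = height
        self.bytes_per_pixel = bytes_per_pixel
        self.bytes_per_row = bytes_per_row

    def _index(self, x, y):
        # Map a 2D texel coordinate to a byte offset in the linear buffer.
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise IndexError("texel coordinate outside the view")
        return self.offset + y * self.bytes_per_row + x * self.bytes_per_pixel

    def read(self, x, y):
        i = self._index(x, y)
        return bytes(self.buffer[i:i + self.bytes_per_pixel])

    def write(self, x, y, texel):
        if len(texel) != self.bytes_per_pixel:
            raise ValueError("texel has the wrong size")
        i = self._index(x, y)
        self.buffer[i:i + self.bytes_per_pixel] = texel


# Usage: a 2x2 view of two-byte texels starting 2 bytes into a 12-byte buffer.
backing = bytearray(12)
view = BufferTextureView(backing, offset=2, width=2, height=2,
                         bytes_per_pixel=2, bytes_per_row=4)
view.write(1, 1, b"\xaa\xbb")
```

With such a view as the destination, a texture-to-buffer blit reduces to an ordinary texel-by-texel copy between two textures.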
For non-multisampled textures (i.e., textures wherein only a single color sample is stored per pixel), one or more GPU drivers may simply set up the GPU to implement the texture-to-texture blit operation as a fragment shader program that reads in from the source texture and writes out to the destination texture. For texture-to-buffer blits, the destination texture may be thought of as an alias of a linear buffer.
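The fragment-shader approach described above may be emulated on the CPU as in the sketch below. The names `copy_fragment` and `run_blit_pass` are hypothetical, and the nested loops merely stand in for the GPU invoking a fragment program once per destination pixel; an actual implementation would be written in a shading language and executed in parallel.

```python
def copy_fragment(src, src_width, x, y):
    """Per-pixel 'fragment' function: fetch the source texel at (x, y)."""
    return src[y * src_width + x]


def run_blit_pass(src, width, height, dst, dst_row_pitch):
    """Invoke the fragment function once per destination pixel.

    `dst` is a flat list standing in for a linear buffer aliased by the
    destination texture; `dst_row_pitch` is its row length in texels and
    may exceed `width` when rows are padded.
    """
    if dst_row_pitch < width:
        raise ValueError("row pitch is smaller than the blit width")
    if len(src) < width * height or len(dst) < (height - 1) * dst_row_pitch + width:
        raise ValueError("source or destination is too small")
    for y in range(height):
        for x in range(width):
            dst[y * dst_row_pitch + x] = copy_fragment(src, width, x, y)


# Usage: blit a 2x2 texture into a buffer with a row pitch of 3 texels.
src_texels = [1, 2, 3, 4]
dst_texels = [0] * 6
run_blit_pass(src_texels, 2, 2, dst_texels, 3)
```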
For multisampled textures (i.e., textures wherein more than one color sample is stored per pixel), however, texture-to-buffer blits cannot presently be implemented as described above for non-multisampled textures, e.g., because present GPUs do not support writing multisampled surfaces with a linear memory layout. Moreover, because of various hardware limitations, it is not presently possible to implement buffer-to-texture blits into multisampled textures with this approach either.
Thus, techniques are needed to handle certain situations, e.g., blits of multisampled textures, wherein the destination buffers are too large to be aliased by an equivalent non-multisampled texture view. Appropriately handling such situations on the GPU will allow developers and/or calling applications to seamlessly execute texture-to-buffer blit copy operations on large, multisampled textures. Such techniques are also preferably computationally efficient and respect the developer's use of padding in source textures.
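One way to picture a destination buffer that is “too large to be aliased” is the size check sketched below. Everything in it is an assumption made for illustration: the limit `MAX_TEXTURE_DIMENSION`, the function name `can_alias_as_single_view`, and the layout in which each row of the view holds every sample of one row of the source texture. Actual hardware limits and sample layouts vary and are not specified here.

```python
# Hypothetical hardware limit on texture width and height, in texels.
MAX_TEXTURE_DIMENSION = 16384


def can_alias_as_single_view(width, height, sample_count,
                             max_dim=MAX_TEXTURE_DIMENSION):
    """Return True if one non-multisampled view could cover the buffer.

    Assumed layout: each view row stores all samples for one row of the
    multisampled source, so the view is `width * sample_count` texels wide.
    """
    if width <= 0 or height <= 0 or sample_count <= 0:
        raise ValueError("dimensions and sample count must be positive")
    view_width = width * sample_count
    return view_width <= max_dim and height <= max_dim


# Usage: under the assumed limit, a 4096-wide 4x texture just fits,
# while an 8192-wide 4x texture would need a 32768-texel-wide view.
fits_small = can_alias_as_single_view(4096, 4096, 4)
fits_large = can_alias_as_single_view(8192, 8192, 4)
```

Under these assumptions, the second case is the kind of situation in which a single equivalent texture view is unavailable and a different technique is required.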