In some fields of computing, it is often necessary to transfer a rectangular block of bits from one memory location to another. For example, in video gaming, a block of bits representing an image or surface may be transferred from system memory into video memory for display to a user. Such a data transfer is commonly referred to as a bit block transfer (bitblt). In video gaming, a bitblt should be very fast so that graphics presentation is smooth and free of delays noticeable to the user. In a computer that lacks sophisticated graphics hardware, such as an accelerated graphics card, bitblts are often performed in software, for example by emulation software that emulates the graphics hardware.
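The following C sketch gives a rough illustration of the basic transfer. It copies a rectangular block row by row between two surfaces. It assumes 32-bit pixels and a row pitch counted in pixels. The `surface` type and the `blt_copy` function are hypothetical names introduced for this example and are not part of any particular graphics API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical surface description: 32-bit pixels, pitch counted in pixels. */
typedef struct {
    uint32_t *pixels;  /* first pixel of the surface */
    size_t pitch;      /* pixels from the start of one row to the next */
} surface;

/* Copy a w x h rectangle from (sx, sy) in src to (dx, dy) in dst.
   Assumes both rectangles lie inside their surfaces and do not overlap. */
static void blt_copy(surface *dst, size_t dx, size_t dy,
                     const surface *src, size_t sx, size_t sy,
                     size_t w, size_t h)
{
    for (size_t row = 0; row < h; ++row) {
        const uint32_t *s = src->pixels + (sy + row) * src->pitch + sx;
        uint32_t *d = dst->pixels + (dy + row) * dst->pitch + dx;
        memcpy(d, s, w * sizeof *s);  /* copy one row of the block */
    }
}
```

The copy proceeds one row at a time because the block is rectangular but generally not contiguous in memory. Either surface may be wider than the block, so the pitch is used to step from one row to the next.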
Bitblt functions often involve not only a data block transfer but also an operation performed on the data. For example, a transparency operation may be applied to the data block while it is being transferred from one memory location to another. Depending on the situation, other operations may be performed as well, such as raster operations (ROPs), stretching, shrinking, alpha blending, and color conversion. Any combination of these operations may be required during a single bitblt.
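The following is a minimal sketch of an operation applied during the transfer. It assumes that transparency is implemented as a color key, which is one common form but not the only one. Source pixels equal to the key are skipped, so the existing destination pixels show through. The function name `blt_transparent` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy a w x h block of 32-bit pixels, skipping any source pixel equal
   to color_key (color-keyed transparency). Pitches are counted in pixels. */
static void blt_transparent(uint32_t *dst, size_t dst_pitch,
                            const uint32_t *src, size_t src_pitch,
                            size_t w, size_t h, uint32_t color_key)
{
    for (size_t y = 0; y < h; ++y) {
        for (size_t x = 0; x < w; ++x) {
            uint32_t p = src[y * src_pitch + x];
            if (p != color_key)        /* transparent pixels leave dst unchanged */
                dst[y * dst_pitch + x] = p;
        }
    }
}
```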
A bitblt software developer typically writes code to handle every bitblt operation that may be required. Unfortunately, traditional approaches to developing bitblt software functions involve difficult trade-offs between code size and code performance. Two general approaches are: (1) writing many (e.g., hundreds of) functions, one for each bitblt operation, to achieve optimal software bitblt performance in all situations, or (2) writing a single bitblt function (or a small number of them) that can perform any bitblt by branching to the correct bitblt operations within the function. The first approach yields fast bitblt performance but an extremely large code size. The second approach yields a reasonable code size but much slower bitblt performance, because of the overhead of branching to the correct bitblt operations.
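The trade-off between the two approaches can be sketched in C. The sketch is simplified to a single row of 32-bit pixels. The flag names, the choice of XOR as the raster operation, and both function names are hypothetical and chosen only for illustration. The general function tests its flags for every pixel. The specialized function hard-codes one combination and needs no flag tests. However, a complete library written in that style would need one such function for every combination.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operation flags for a single, general-purpose blt. */
enum { BLT_TRANSPARENT = 1 << 0, BLT_ROP_XOR = 1 << 1 };

/* Approach (2): one function handles every combination by testing the
   flags inside the inner loop, paying for those branches on every pixel. */
static void blt_generic(uint32_t *dst, const uint32_t *src, size_t n,
                        unsigned flags, uint32_t color_key)
{
    for (size_t i = 0; i < n; ++i) {
        uint32_t p = src[i];
        if ((flags & BLT_TRANSPARENT) && p == color_key)
            continue;                  /* skip transparent pixel */
        if (flags & BLT_ROP_XOR)
            p ^= dst[i];               /* raster operation: SRC XOR DST */
        dst[i] = p;
    }
}

/* Approach (1): one specialized function per combination. This one handles
   only "XOR with transparency" and contains no flag tests at all. */
static void blt_xor_transparent(uint32_t *dst, const uint32_t *src, size_t n,
                                uint32_t color_key)
{
    for (size_t i = 0; i < n; ++i)
        if (src[i] != color_key)
            dst[i] ^= src[i];
}
```

With only two on/off operations there are already four combinations. Each additional independent on/off operation doubles the number of specialized functions that approach (1) would need, which is why its code size grows so quickly.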
In addition, a traditional bitblt function does not allow for runtime optimization. The traditional bitblt function is typically coded and compiled into machine code for a particular target platform (e.g., a microprocessor) before it is deployed in a system. The function as a whole may be optimized for speed (or size) for the target platform at compile time. Once it is compiled, however, particular operations within it (e.g., ROPs, transparency) cannot be further optimized for speed after deployment. Moreover, when a single bitblt function branches to a specific bitblt operation, the branching itself can make that operation perform sub-optimally.