High-resolution, full-color digital images require more memory to represent them than is available in the main memory of a typical personal computer. This problem is compounded when more than one image is being accessed at the same time, for example, when an imaging application is composing an image from several source images. Most imaging applications, therefore, use some form of virtual memory to accommodate such images.
Virtual memory is a conventional memory management method that allows a computer system to present a larger memory region than is actually available in main memory. Operating systems typically implement virtual memory using a portion of secondary storage (e.g., memory on the hard drive) to augment physical memory and then mapping virtual memory addresses into physical memory addresses. Virtual memory is commonly managed in memory units called pages that are swapped in and out of physical memory as necessary to satisfy read and write requests. If an application program (application) attempts to access a piece of virtual memory which corresponds to data not currently in physical memory, the system issues a page fault. The operating system then instructs the hardware to swap in the page or pages from the hard drive needed to satisfy the request.
The system performance of imaging applications is typically very sensitive to the actual pattern of memory references. System performance is degraded when the system has to swap pages of memory to and from secondary storage frequently. If the pattern of memory references for image processing operations extends across page boundaries, the number of costly page swaps increases.
One way to represent an image is as a two-dimensional array of pixels in computer memory. Using standard row-major representation, for an image of width W, the C-language expression for evaluating the address of the pixel at (x,y) is:
    B + y*W + x,
where B is the base address of the image.
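The row-major expression above can be sketched as a small C helper. This is only an illustration: the `pixel_t` type (assumed here to be 4 bytes per pixel) and the function name `row_major_pixel` are hypothetical and do not come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pixel type: 4 bytes per pixel (an assumption). */
typedef uint32_t pixel_t;

/* Return a pointer to the pixel at (x, y) in a row-major image.
 *   base  : address of pixel (0, 0)  (B in the text)
 *   width : image width in pixels    (W in the text) */
static pixel_t *row_major_pixel(pixel_t *base, size_t width, size_t x, size_t y)
{
    assert(x < width);            /* x must lie inside the row */
    return base + y * width + x;  /* B + y*W + x, in units of pixels */
}
```

Because `base` is a typed pointer, C scales the offset by the pixel size automatically; with a byte pointer the offset would have to be multiplied by the number of bytes per pixel.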
Many image processing algorithms refer to pixels clustered in both the x and y directions. As such, the standard row-major addressing scheme can lead to poor locality of reference, since successive rows (i.e., scanlines) are likely to reside in different virtual memory pages. In order to perform operations on three pixels in a column, for example, the system must access three separate scanlines, which may correspond to three separate pages in virtual memory. The multiplication involved in the addressing expression can also add significant overhead.
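To make the locality problem concrete, the following sketch counts how many distinct pages a short vertical run of pixels touches in a row-major image. The 4096-byte page size, the 4-byte pixel size, the page-aligned image base, and both function names are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, not taken from the text. */
enum { PAGE_SIZE = 4096, BYTES_PER_PIXEL = 4 };

/* Page index (relative to a page-aligned image base) holding pixel (x, y). */
static size_t row_major_page(size_t width, size_t x, size_t y)
{
    assert(x < width);
    return ((y * width + x) * BYTES_PER_PIXEL) / PAGE_SIZE;
}

/* Count the distinct pages touched by `count` vertically adjacent pixels
 * starting at (x, y). Page indices never decrease as y grows, so comparing
 * each page with the previous one is enough to detect a new page. */
static size_t pages_touched_by_column(size_t width, size_t x, size_t y, size_t count)
{
    size_t pages = 0;
    size_t last = 0;
    for (size_t i = 0; i < count; i++) {
        size_t page = row_major_page(width, x, y + i);
        if (i == 0 || page != last)
            pages++;
        last = page;
    }
    return pages;
}
```

Under these assumptions, a 4096-pixel-wide image has 16 KiB rows, so three vertically adjacent pixels fall on three different pages, whereas a 256-pixel-wide image fits four rows in one page.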
Pixel clusters can be referenced in a straightforward fashion using conventional C index and pointer arithmetic notation. Two-dimensional array notation in C, however, can only be used for arrays with fixed dimensions, e.g. where W is a constant for all such arrays. Thus, two-dimensional array notation is generally impracticable for an application capable of dealing with a range of image sizes and aspect ratios.
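The contrast between the two notations can be sketched as follows. The names `FIXED_W`, `get_fixed`, and `get_any_width` are hypothetical; the first function works only for images whose width matches the compile-time constant, while the second accepts any width at the cost of explicit index arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t pixel_t;   /* hypothetical 4-byte pixel */

enum { FIXED_W = 8 };       /* width fixed at compile time (illustrative) */

/* Two-dimensional array notation: the row width is part of the type. */
static pixel_t get_fixed(pixel_t img[][FIXED_W], size_t x, size_t y)
{
    assert(x < (size_t)FIXED_W);
    return img[y][x];
}

/* Width supplied at run time: the index must be computed by hand. */
static pixel_t get_any_width(const pixel_t *base, size_t width, size_t x, size_t y)
{
    assert(x < width);
    return base[y * width + x];
}
```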
The locality of reference problem can be mitigated by "tiling" the image. This maps the image as a sequence of smaller sub-images, or tiles, each of which represents a small rectangle of pixels (usually square, in practice). Tiles are usually arranged in rows across the image. The size of a tile is usually chosen to correspond to the size of a page in the virtual memory management system, but this is not a requirement for tiled image systems. Where the implementation of the computer's virtual memory permits, further efficiencies can be gained by making the tiles a power of two pixels wide, and aligning the "left hand" edge of the image by rounding the image's total width up to a power of two. This allows a pixel address to be computed by regarding the linear offset of the pixel as a sequence of concatenated bit fields as follows:
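______________________________________________________________________________________________
Tile Y index | Tile X index | Pixel Y index within tile | Pixel X index within tile
______________________________________________________________________________________________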
The exact distribution of the bits depends on the power of two used to limit the width of the image, the tile dimensions and the size of a pixel. A typical implementation might use a width of 2^12, or 4096 pixels, mapped into 64-pixel square tiles, with each pixel containing 4 bytes. This can be translated into a 32-bit address mapped as follows.
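______________________________________________________________________________________________
Bits 31-20   | Bits 19-14   | Bits 13-8                 | Bits 7-2                  | Bits 1-0
______________________________________________________________________________________________
Tile Y index | Tile X index | Pixel Y index within tile | Pixel X index within tile | Offset within the pixel
______________________________________________________________________________________________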
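As a sketch of how the 32-bit layout above could be assembled from pixel coordinates, the following C function packs (x, y) into the five bit fields for the example configuration (4096-pixel width, 64-pixel square tiles, 4-byte pixels). The function name `tiled_offset` is hypothetical, and the result is a byte offset relative to the start of the image.

```c
#include <assert.h>
#include <stdint.h>

/* Pack pixel coordinates into the example 32-bit tiled layout:
 *   bits 31-20 tile Y, bits 19-14 tile X,
 *   bits 13-8  pixel Y within tile, bits 7-2 pixel X within tile,
 *   bits 1-0   byte offset within the pixel (zero for the first byte). */
static uint32_t tiled_offset(uint32_t x, uint32_t y)
{
    assert(x < 4096u);        /* 6 tile-X bits + 6 pixel-X bits */
    assert(y < (1u << 18));   /* 12 tile-Y bits + 6 pixel-Y bits */

    uint32_t tile_x  = x >> 6;    /* which 64-pixel tile column */
    uint32_t pixel_x = x & 63u;   /* column inside the tile */
    uint32_t tile_y  = y >> 6;    /* which 64-pixel tile row */
    uint32_t pixel_y = y & 63u;   /* row inside the tile */

    return (tile_y << 20) | (tile_x << 14) | (pixel_y << 8) | (pixel_x << 2);
}
```

Only shifts and masks are needed, with no multiplication by the image width; all pixels of one 64 x 64 tile share bits 31-14 and therefore occupy one contiguous 16 KiB block.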
If the base address of the image in linear memory is zero, an address in this format is the actual pixel address: the resulting value can be dereferenced directly to obtain the pixel value, and the bit fields do not need to be rearranged. If the base address of the image is not zero, the sequence of bits comprising the X and Y tile indices and the X and Y pixel indices within a tile represents only part of the actual pixel address, and the base address of the image must be added to it to compute the actual pixel address in linear memory space.
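For the non-zero base case, the final step might look like the sketch below, which adds a tiled byte offset to the image base to obtain a dereferenceable pointer. The function name `pixel_from_offset` and the 4-byte pixel type are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the image base address with a tiled byte offset.
 * The offset is assumed to point at the first byte of a 4-byte pixel,
 * so its two low bits (the offset within the pixel) must be zero. */
static uint32_t *pixel_from_offset(void *base, uint32_t tiled_byte_offset)
{
    assert((tiled_byte_offset & 3u) == 0u);
    return (uint32_t *)((unsigned char *)base + tiled_byte_offset);
}
```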
When the width of the image and the dimensions of a tile are powers of two, bit masking and shifting can be used to optimize pixel addressing operations such as incrementing or decrementing the pixel address. One example of this form of pixel addressing is described further in Newman, Gary, "Organizing Arrays for Paged Memory Systems," Communications of the ACM, July 1995, Vol. 38, No. 7 ("Newman").
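One common masking idiom for stepping through a split bit field is sketched below for the example 32-bit layout (tile X in bits 19-14, pixel X in bits 7-2). It is offered as an illustration of the general idea and is not necessarily the scheme described in Newman; the names `X_MASK`, `tiled_inc_x`, and `tiled_dec_x` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bits that encode X: tile X (19-14) and pixel X within tile (7-2). */
#define X_MASK 0x000FC0FCu

/* Step one pixel to the right. Setting every non-X bit to 1 lets the
 * carry from "+ 1" ripple through the gaps between the X fields. */
static uint32_t tiled_inc_x(uint32_t addr)
{
    uint32_t next = (((addr | ~X_MASK) + 1u) & X_MASK) | (addr & ~X_MASK);
    assert((next & ~X_MASK) == (addr & ~X_MASK));  /* Y and byte bits untouched */
    return next;
}

/* Step one pixel to the left. Clearing every non-X bit lets the borrow
 * from "- 1" ripple through the gaps in the same way. */
static uint32_t tiled_dec_x(uint32_t addr)
{
    uint32_t prev = (((addr & X_MASK) - 1u) & X_MASK) | (addr & ~X_MASK);
    assert((prev & ~X_MASK) == (addr & ~X_MASK));
    return prev;
}
```

Both functions wrap around at the image edge (x = 4095 steps to x = 0 and vice versa), so a caller that needs edge handling would have to test for it separately.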
While tiling improves system performance, it complicates the task of computing pixel addresses relative to a more intuitive format where pixels are stored in a two-dimensional array. Most image processing applications are written for images in this two-dimensional format and therefore potentially need to be modified when images are stored in a tiled format. One way to address this issue for tiled images is to write the application so that the code is explicitly aware of the tiled image format. In other words, pixel address computations have to be written specifically for a pixel address in the tiled image format, rather than a more intuitive two-dimensional array format. This approach leads to more efficient code but is costly to implement, since tile-aware code is complex and more difficult to write from scratch than code for an image represented as a standard two-dimensional array of pixels. Several pixel addressing operations (incrementing, decrementing or indexing) must be adapted to the tiled image format so that the application is compatible with tiled images.
Another possibility, as set forth in Newman's paper, is to create a series of macros for pixel address operations. A macro in this context refers to a shorthand notation for a piece of code that performs some function. Rather than write tile-aware code for each instance of a pixel address operation, the programmer can simply insert the macro. In the context of image processing applications, an example of a macro would be a snippet of code that performs a pixel address operation (such as incrementing the X coordinate of a pixel in an image) on a pixel address or pointer to a pixel in a tiled image. This simplifies the programmer's task because he or she can substitute the macro for a pixel address operation rather than write an entire image processing routine so that it is expressly adapted for the tiled image format.
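A macro of the kind described might be sketched as follows for the example 32-bit layout. The macro names and masks are hypothetical and are not taken from Newman's paper; they operate on a byte offset in the tiled format rather than on a raw pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Field masks for the example layout (illustrative names). */
#define TILED_X_MASK 0x000FC0FCu   /* tile X (19-14) + pixel X (7-2)  */
#define TILED_Y_MASK 0xFFF03F00u   /* tile Y (31-20) + pixel Y (13-8) */

/* Advance a tiled offset by one pixel in X or Y.
 * Note: the argument is evaluated twice, so it must have no side effects. */
#define TILED_INC_X(a) \
    (((((uint32_t)(a) | ~TILED_X_MASK) + 1u) & TILED_X_MASK) | \
     ((uint32_t)(a) & ~TILED_X_MASK))

#define TILED_INC_Y(a) \
    (((((uint32_t)(a) | ~TILED_Y_MASK) + 1u) & TILED_Y_MASK) | \
     ((uint32_t)(a) & ~TILED_Y_MASK))
```

A routine written for a conventional array could then replace an increment of the column or row index with `off = TILED_INC_X(off);` or `off = TILED_INC_Y(off);` without otherwise being restructured around tiles.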
Though macros can simplify the task of creating an image processing routine, they can actually degrade performance of the application if not implemented properly. Typical image processing routines include loops in which the same machine instructions are performed over and over. Each time a routine needs to visit a new pixel, the routine needs to increment or decrement the pixel address. If this pixel addressing operation is implemented with a macro, the macro will be executed on every iteration. Thus, if the macro translates into inefficient machine code, it will degrade performance of the application.
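The effect of a macro inside an inner loop can be illustrated with two hypothetical routines that sum the first `n` pixels of a scanline in the example tiled layout. The first expands a full indexing macro on every iteration, rebuilding all bit fields from (x, y); the second expands only an increment macro. All names are illustrative, and no measured timings are implied; the actual cost depends on the compiler and the target machine.

```c
#include <assert.h>
#include <stdint.h>

#define X_MASK 0x000FC0FCu   /* tile X (19-14) + pixel X (7-2) */

/* Full indexing: rebuilds every bit field from (x, y). */
#define TILED_OFFSET(x, y) \
    ((((uint32_t)(y) >> 6) << 20) | (((uint32_t)(x) >> 6) << 14) | \
     (((uint32_t)(y) & 63u) << 8) | (((uint32_t)(x) & 63u) << 2))

/* Incremental step: advances an existing offset by one pixel in X. */
#define TILED_INC_X(a) \
    (((((uint32_t)(a) | ~X_MASK) + 1u) & X_MASK) | ((uint32_t)(a) & ~X_MASK))

/* Variant 1: the indexing macro is expanded once per pixel visited. */
static uint64_t sum_row_indexed(const uint32_t *base, uint32_t y, uint32_t n)
{
    assert(n <= 4096u);
    uint64_t sum = 0;
    for (uint32_t x = 0; x < n; x++)
        sum += base[TILED_OFFSET(x, y) >> 2];   /* byte offset -> pixel index */
    return sum;
}

/* Variant 2: one full index up front, then only the increment macro. */
static uint64_t sum_row_incremental(const uint32_t *base, uint32_t y, uint32_t n)
{
    assert(n <= 4096u);
    uint64_t sum = 0;
    uint32_t off = TILED_OFFSET(0, y);
    for (uint32_t x = 0; x < n; x++) {
        sum += base[off >> 2];
        off = TILED_INC_X(off);
    }
    return sum;
}
```

Both variants visit the same pixels and return the same sum; they differ only in how much address arithmetic the macro expansion places inside the loop body.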