The use of computer graphics has grown dramatically in recent years, with numerous military, industrial, medical, commercial and consumer applications. Some such applications include computer image enhancements, flight trainers and simulators, medical imaging (e.g., CAT scanners), commercial video processing systems, video games, home computers, and many more. Image transformations redefine the spatial relationships between picture elements (pixels) in an image. Many of these systems make use of a class of image transformations called "affine" image transformations. An affine transformation (hereinafter referred to interchangeably as "transformation", and "affine image transformation") is any transformation which preserves the parallelism of lines in the input and output images. Such transformations include the operations of scaling in either or both dimensions, translation (moving), or rotation. These operations may be performed on an entire image or on any part thereof.
Image transformation is a growing branch of image processing and has also been called "Image Warping". Geometric transformation of images is of great practical importance in remote sensing (distortion correction), medical imaging (image registration), computer vision (image processing), and computer graphics, where the primary application of transformations is in texture mapping (mapping of "texture" patterns onto displayed objects), and in the creation of interesting visual effects that have always attracted the entertainment industry.
Such transformations typically operate on an image in an image space which is composed of a rectilinear array of pixels. A pixel typically has a location and an intensity value associated with it. The location component of a pixel is a pair of coordinates identifying its location within an image space. The intensity value may be either a scalar quantity (e.g., for monochrome images, which only require a measure of lightness/darkness), a vector quantity (e.g., for expression of a color intensity value as a vector whose components are HUE, SATURATION, and LIGHTNESS), or a multiple value (e.g., for storage of color values as separate RED, GREEN, and BLUE intensity values). It is generally assumed that these pixels either have fixed dimensions or that they represent an area having fixed dimensions.
Pixels are generally associated with a signal of some type, because as elements of an image, they are intended to be displayed at one point or another. A pixel may be one signal in a serial stream of signals composing a video signal, or a stored value in an image memory (RAM, ROM, disk file, serial memory, or any other suitable storage medium) used to hold a digital representation of an image for a display system. Typically, a pixel's location (coordinates) in an image of which it is a part is implicit in its position in image memory. A stored representation of an image has an organization associated with it that dictates how and where in image memory a pixel is stored. In order to read the intensity value for a pixel at a specific location within a stored image, it is necessary to access the location within the image storage medium which represents that specific location within the image.
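As an illustration of a pixel location being implicit in its storage position, the following minimal sketch assumes a row-major organization of image memory (one of many possible organizations; the text above does not prescribe one). The names `pixel_offset`, `read_pixel`, and `image` are hypothetical and used only for this example.

```python
def pixel_offset(x, y, width):
    """Index of pixel (x, y) in a flat, row-major buffer (assumed layout)."""
    return y * width + x


def read_pixel(buffer, x, y, width):
    """Read the intensity stored for location (x, y)."""
    return buffer[pixel_offset(x, y, width)]


# A 4-wide by 3-high monochrome image stored as a flat list of intensities.
image = list(range(12))
```

Under this assumed layout, reading the pixel at column 2 of row 1 accesses buffer element 6.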
Pixel locations in a serial video signal (stream) are typically encoded as a time offset from a synchronizing reference pulse in the stream, or from a synchronizing reference pulse on a separate line. Each pixel in an image has an allocated time slot within a video signal. At the time allocated to a particular pixel, the video signal is made equal to a signal level representing that particular pixel's intensity value. A particular pixel within a video signal may be accessed by measuring the video signal value (or intensity and chroma values, in the case of color video) at the time slot allocated to that particular pixel.
Hereinafter, all discussions of "pixels" will assume that a pixel is associated with a signal which may be used to display a piece of an image on a display system, that pixels have a height and a width and a location in an image expressed as rectangular coordinates, and that pixels have an intensity value associated with them.
Image transformations can be simple translations (moving an image from one point on a display to another) or complicated warping. Many present research efforts are aimed at increasing the speed of transformations such as these.
A class of transformation technique which has grown in popularity due to its speed and realizability is the scanline method, wherein an image is processed on-the-fly as it is scanned for display on a raster scan device such as a CRT (cathode ray tube) display. Such methods find practical realizations in real-time hardware for video effects, texture mapping, geometric correction, and interactive image manipulation. Even the relatively straightforward scanline methodology often requires elaborate high-speed computing hardware including digital multipliers, and large amounts of temporary image storage. There is always a desire to accomplish the same ends in a simpler, less expensive manner.
Geometric transformations map one coordinate system onto another, based upon spatial transformation mapping functions. Sometimes transformation functions can be expressed as a set of simple analytic expressions. Other times it is necessary to use more elaborate forms of description (e.g., a sparse lattice of control points). The general mapping function can be given in two forms: either relating a point in an output coordinate system (in x and y, for example) to a corresponding point in an input coordinate system (in u and v, for example), or vice versa, as shown in the following equations (1) and (2), respectively:

$$(x, y) = [X(u, v),\ Y(u, v)] \tag{1}$$

or

$$(u, v) = [U(x, y),\ V(x, y)] \tag{2}$$
In these equations, functions X and Y map an input pixel (u, v) to its corresponding output pixel (x, y) while functions U and V map the output pixel (x, y) back to its corresponding input pixel (u, v).
FIG. 1a shows an example of a translation transformation, whereby a rectangular block of pixels 112 within an input image 110 is operated upon by a translation transformation such that without altering its orientation, a corresponding rectangular block of pixels 116 appears in a different location in an output image 114.
FIG. 1b gives an example of image rotation, whereby an input block of pixels 122 within an input image 120 is operated upon by a rotation transformation producing a rotation of the corresponding rectangular block of pixels 126 in an output image 124.
FIG. 1c demonstrates scaling (contraction and expansion), whereby an input rectangular block of pixels 132 in an input image 130 is squeezed in the horizontal direction while being stretched in the vertical direction by scaling the distances between input pixels in a corresponding rectangular block of pixels 136 in output image 134 by one factor in the horizontal dimension and by another factor in the vertical dimension.
FIG. 1d demonstrates a "shear" transformation, whereby an input rectangular block of pixels 142 in input image 140 is operated upon by a shear transformation to produce a corresponding parallelogram-shaped block of pixels 146 in output image 144. Assuming an input image in UV coordinate space with each point represented by a "U" and "V" coordinate, (u, v), and an output image in XY coordinate space with each point represented by an "X" and a "Y" coordinate, (x, y), these transformations are defined mathematically as follows:
Translation of (u, v) by an amount $(T_U, T_V)$:

$$x = u + T_U, \qquad y = v + T_V \tag{3}$$

Scaling of (u, v) by an amount $(S_U, S_V)$:

$$x = u \times S_U, \qquad y = v \times S_V \tag{4}$$

Rotation of (u, v) by an angle $\theta$:

$$x = u \times \cos(\theta) + v \times \sin(\theta), \qquad y = v \times \cos(\theta) - u \times \sin(\theta) \tag{5}$$

Shearing of (u, v) along the horizontal by a shear factor a:

$$x = u + a \times v, \qquad y = v \tag{6}$$

Shearing of (u, v) along the vertical by a shear factor b:

$$x = u, \qquad y = v + b \times u \tag{7}$$

It is a further object of the present invention to provide a technique for transforming input images in an input image space into output images in an output image space without iterative multiplications;

it is a further object of the present invention to provide a technique for transforming input images in an input image space into output images in an output image space without storage of a complete intermediate image;

it is a further object of the present invention to provide a technique for transforming input images in an input image space into output images in an output image space such that image collapse (loss of information preventing restoration of the image) does not occur for angles of rotation approaching odd multiples of 90 degrees.

Well known to those skilled in the art of graphic image processing is a technique by which affine transformations may be expressed as a matrix operation whereby the location of each point, input and output, is expressed as a 3 wide by 1 high vector quantity. Every output point location is equal to the product of its corresponding input point's vector quantity and a 3 by 3 transformation matrix. This operation is expressed as follows: ##EQU1##
In the matrix shown in equation (8), the $\epsilon$ and $\zeta$ matrix coefficients control translation in the horizontal and vertical directions, respectively; the $\alpha$, $\beta$, $\tau$, and $\delta$ matrix coefficients control scaling, rotation, and shearing.
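For orientation, one layout that is consistent with the description of equation (8) is sketched below. It assumes a row-vector convention (each point written as a 3 wide by 1 high vector) and assumes a particular placement of the four coefficients in the upper-left 2 by 2 block; that placement is an assumption made here for illustration and may differ from the arrangement in the original equation.

```latex
\begin{bmatrix} x & y & 1 \end{bmatrix}
=
\begin{bmatrix} u & v & 1 \end{bmatrix}
\begin{bmatrix}
\alpha   & \beta & 0 \\
\tau     & \delta & 0 \\
\epsilon & \zeta & 1
\end{bmatrix}
\quad\Longrightarrow\quad
\begin{aligned}
x &= \alpha u + \tau v + \epsilon \\
y &= \beta u + \delta v + \zeta
\end{aligned}
```

Under this assumed layout, the translation of equation (3) corresponds to $\alpha = \delta = 1$, $\beta = \tau = 0$, $\epsilon = T_U$, $\zeta = T_V$.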
Also well known to those skilled in the art is the method by which the affine transformation parameters of equations (3) through (7) may be combined to form constant matrix coefficients. Cascaded transformation matrix operations may be compressed into a single transformation matrix which also has constant matrix coefficients.
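The compression of cascaded transformations into a single matrix can be illustrated with a short sketch. It assumes a row-vector convention (point times matrix) with translation terms in the bottom row, and builds the rotation matrix from equation (5); the helper names (`translation`, `scaling`, `rotation`, `mat_mul`, `transform_point`) are hypothetical.

```python
import math


def translation(tu, tv):
    return [[1, 0, 0], [0, 1, 0], [tu, tv, 1]]


def scaling(su, sv):
    return [[su, 0, 0], [0, sv, 0], [0, 0, 1]]


def rotation(theta):
    # Matches equation (5): x = u*cos + v*sin, y = v*cos - u*sin.
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def mat_mul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transform_point(point, m):
    """Apply [x y 1] = [u v 1] * m and return (x, y)."""
    row = [point[0], point[1], 1]
    x, y, _ = (sum(row[k] * m[k][j] for k in range(3)) for j in range(3))
    return x, y


# Scale, then rotate, then translate, compressed into one constant matrix.
steps = [scaling(2, 3), rotation(math.radians(30)), translation(5, -1)]
combined = steps[0]
for m in steps[1:]:
    combined = mat_mul(combined, m)
```

Applying `combined` once to a point gives the same result as applying the three matrices in `steps` one after another, which is the sense in which cascaded operations collapse into a single matrix with constant coefficients.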
The foregoing discussion of matrix coefficients assumes that matrix algebra is used to determine the numbers of interest. In fact, three by three transformation matrices have become a sort of standard form of notation for expressing 2-D image transformations. However, matrix algebra is merely a convenient shorthand technique for performing what would otherwise be a tedious series of algebraic manipulations. Of course, any equivalent ordinary algebraic technique could be substituted for the matrix algebra.
Traditionally, there have been two approaches to transformations: forward mapping, where the input is mapped to the output (as in functions X and Y in (1) ), and inverse mapping, where the output is mapped to the input (as in U and V of (2) ). Equations (3)-(8) are representative of affine transformation from a forward-mapped perspective, whereby output pixel locations are expressed as a function of input pixel locations.
Forward mapping interpolates each input pixel into the output image at positions determined by the mapping functions. This mapping can generate "holes" and "overlaps" in the output image when mapped pixels either incompletely fill or over-fill the corresponding output image. This is due to alterations in the relative position of adjacent pixels as a result of the transformation process which can leave gaps or overlapping areas. Such artifacts have been overcome by using methods such as the four-corner mapping paradigm as disclosed in: Wolberg, G., "Digital Image Warping", IEEE Computer Press Monograph, IEEE Catalog Number EH0322-8, 1990. In this method, the intensity of the output pixel is the sum of all of the input intensities scaled by the amount of the output area covered by each input pixel. The calculation of output intensities is typically performed after the completion of mapping. An accumulator array is required to properly integrate all the contributing input intensities. Typically, there is one accumulator per pixel of the output image, effectively requiring another complete frame buffer. Similar schemes for determining output pixel intensities are used in many transformation methods of the prior art. "Intersection tests" for area coverage, filtering to handle magnification, and the requirement for an accumulator array are the major drawbacks of this approach. Forward mapping is generally most useful when the input image must be read sequentially, or when it does not reside in memory. This is because in forward mapping, the output placement of pixels is described in terms of the input placement of pixels, allowing the input image to be processed as it is received, in any necessary order, such as in a raster scan fashion. Forward mapping requires that there be random access to the output image.
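The "holes" and "overlaps" described above can be reproduced with a deliberately naive sketch. It is not the four-corner method: it simply places each input pixel of a single row at its mapped output position, assuming a 1-D scaling and truncation to integer positions. The function name `forward_map_row` is hypothetical.

```python
def forward_map_row(row, scale, out_width):
    """Place each input pixel u at output position int(u * scale).

    Returns the output row (None marks a hole) and a per-pixel hit count
    (a count above 1 marks an overlap).
    """
    out = [None] * out_width
    hits = [0] * out_width
    for u, value in enumerate(row):
        x = int(u * scale)
        if 0 <= x < out_width:
            out[x] = value
            hits[x] += 1
    return out, hits


row = [10, 20, 30, 40]
stretched, stretched_hits = forward_map_row(row, 2, 8)    # magnification
squeezed, squeezed_hits = forward_map_row(row, 0.5, 2)    # contraction
```

With magnification, every second output pixel is never written; with contraction, two input pixels land on each output pixel, which is why an accumulator per output pixel is needed to integrate the contributions properly.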
Inverse mapping works on the discrete output domain, projecting each output pixel onto quadrilaterals in the input domain. In this method, an accumulator array is not required, and all of the output pixels are computed. However, there is the possibility of skipping pixels when sampling the input, thus requiring filtering in the output domain. Clipping is natural to inverse mapping, unlike forward mapping. Inverse mapping is used where the input image is stored in memory and can be accessed in random order as required by the order of output pixel processing.
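A matching sketch for inverse mapping, again assuming a 1-D scaling of a single row and point sampling without filtering, shows that every output pixel receives a value while some input pixels may never be sampled. The function name `inverse_map_row` is hypothetical.

```python
def inverse_map_row(row, scale, out_width):
    """For each output pixel x, sample the input pixel at int(x / scale).

    Returns the output row and the sorted list of input indices sampled.
    Output pixels that map outside the input are clipped to 0.
    """
    out = []
    sampled = set()
    for x in range(out_width):
        u = int(x / scale)
        if 0 <= u < len(row):
            out.append(row[u])
            sampled.add(u)
        else:
            out.append(0)
    return out, sorted(sampled)


row = [10, 20, 30, 40]
magnified, _ = inverse_map_row(row, 2, 8)
minified, used = inverse_map_row(row, 0.5, 2)
```

In the contraction case only input pixels 0 and 2 are read, so the information in pixels 1 and 3 is lost unless the input is filtered before sampling.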
The 2-dimensional (or 2-D) nature of forward and inverse mapping complicates some computations (filtering, sampling, and reconstruction, for example). Fortunately, some transformations are separable, that is, the computations can be performed in one dimension at a time. Separable mapping exploits the characteristics of certain transformations, and decomposes the forward mapping function into a series of orthogonal 1-D (one dimensional) transformations, thus allowing the use of simple digital filtering and reconstruction. The execution of one-dimensional transformations is often accomplished using "scale" and "shear" operations. "Scaling" in one dimension simply means that one dimension of the output image space translates to a fixed multiple of one dimension of the input image space. "Shearing" in one dimension involves skewing an image along one axis such that a rectangular-shaped array of pixels in the input image space becomes a parallelogram-shaped array of pixels in the output image space. Some 1-D transforms combine scaling and shearing into a single operation.
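A 1-D shear can be sketched as an independent horizontal shift of each row, following x = u + a × v from equation (6). The sketch below assumes a non-negative shear factor and rounds each shift to a whole pixel (no filtering); the function name `shear_rows` is hypothetical.

```python
def shear_rows(image, a):
    """Shear a list-of-rows image horizontally: row v shifts right by a*v.

    Assumes a >= 0. Shifts are rounded to whole pixels and the output is
    padded with zeros so that all rows have equal width.
    """
    height, width = len(image), len(image[0])
    max_shift = int(a * (height - 1) + 0.5)
    out_width = width + max_shift
    out = []
    for v, row in enumerate(image):
        shift = int(a * v + 0.5)
        out.append([0] * shift + list(row) + [0] * (out_width - width - shift))
    return out


block = [[1, 1], [1, 1], [1, 1]]
sheared = shear_rows(block, 1)
```

Each row is processed without reference to any other row, which is what makes the operation one-dimensional; the rectangular block of ones becomes a parallelogram in the output.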
One advantage of 2-D image transformations which are separated into two 1-D image transformations is that the image can be read in row/column order (e.g., in "scanline" fashion), providing efficient data access and substantial savings in I/O time over techniques which require an image to be loaded in memory before processing. Another advantage of this approach is that it can be implemented with great ease in a pipeline structure, facilitating its implementation in real-time video processing hardware.
Many multi-pass scanline methods operating in a sequential row/column fashion are separable into 1-D components. Examples of such methods may be found in: Weiman, Carl F. R., "Continuous Anti-aliased Rotation and Zoom of Raster Images", Computer Graphics, (SIGGRAPH '80 Proceedings), vol. 14, no. 3, pp. 286-293, July, 1980, which describes a four-pass scale/shear method; Catmull, E. and A. R. Smith, "3-D Transformation of Images in Scanline Order," Computer Graphics, (SIGGRAPH '80 Proceedings), vol. 14, no. 3, pp. 279-285, July, 1980, hereinafter referred to as Catmull-Smith, which describes a two-pass scale/shear method, hereinafter referred to as the Catmull-Smith method; Paeth, Alan W., "A Fast Algorithm for General Raster Rotation," Graphics Interface '86, pp. 77-81, May 1986, which describes a three-pass shear method; and Tanaka, A., M. Kameyama, S. Kazama, and O. Watanabe, "A Rotation Method for Raster Images Using Skew Transformation", Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 272-277, June, 1986, which also describes a three-pass shear method. The most general of these is the Catmull-Smith method, which takes a two-pass approach.
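To make the idea of a three-pass shear rotation concrete, the sketch below checks numerically that a horizontal shear, a vertical shear, and a second horizontal shear reproduce the rotation of equation (5). The shear factors a = tan(θ/2) and b = −sin(θ) are derived here for the sign convention of equations (5) through (7); they are not quoted from the cited papers, and the function names are hypothetical.

```python
import math


def shear_h(p, a):
    """Horizontal shear, equation (6): x = u + a*v, y = v."""
    u, v = p
    return (u + a * v, v)


def shear_v(p, b):
    """Vertical shear, equation (7): x = u, y = v + b*u."""
    u, v = p
    return (u, v + b * u)


def rotate_direct(p, theta):
    """Rotation as written in equation (5)."""
    u, v = p
    return (u * math.cos(theta) + v * math.sin(theta),
            v * math.cos(theta) - u * math.sin(theta))


def rotate_three_shear(p, theta):
    """Same rotation as three 1-D shears (assumes theta is not near 180 degrees)."""
    a = math.tan(theta / 2)
    b = -math.sin(theta)
    return shear_h(shear_v(shear_h(p, a), b), a)
```

Because each pass moves pixels along one axis only, such a decomposition can be carried out in scanline order, one row or one column at a time.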
The Catmull-Smith method performs a variety of image transformations on digital images in two passes. Catmull-Smith discusses texture mapping onto 2-D representations (projections) of 3-D surfaces using this approach. In this technique, a 2-D texture surface is transformed as a 2-D image until it conforms to a projection of a polygon placed arbitrarily in 3-D space. This 2-D transformation is decomposed into two simple orthogonal 1-D transformations, one applied to the input image on a row basis in a first pass called the "h-pass", and the second applied to the input image on a column basis in a second pass called the "v-pass". FIG. 2 illustrates this concept of decomposition of a two dimensional transformation into two 1 dimensional transformations and contrasts it to a single two dimensional transformation.
With respect to FIG. 2, a representative point "(u, v)" 205 in an input coordinate space "UV" 210, is to be operated upon by a transformation process to produce a corresponding representative point "(x, y)" 215 in an output coordinate space "XY" 220. A reference two dimensional transformation as described in equation (1) and indicated in FIG. 2 as "X(u,v)" and "Y(u,v)" (230), representing a forward-mapping function, produces point (x, y) 215 directly, but with considerable computational complexity, as described previously. The two dimensional transformation 230 may be decomposed into two 1-D transformations $F_v(u)$ (250) and $G_x(v)$ (260), which are applied sequentially. In between the two operations, a representative intermediate point "(x, v)" (240) is produced in an intermediate "XV" coordinate space 245.
In the first pass, or h-pass, one part of the forward mapping function 230, X(u,v), is applied to each row of the input image, maintaining the "v" coordinate as a constant for each row. The result of this pass is an intermediate image that has the same x coordinates as the final output image; only the y coordinates remain to be computed. This one-dimensional application of X(u,v) can be written as $F_v(u)$ (250), as follows:

$$(x, v) = (F_v(u),\ v) = (X(u, v),\ v) \tag{9}$$
This maps every point (u, v) 205 of an input image in UV space 210 onto a point (x, v) 240 in an intermediate XV coordinate space 245.
Of course, it is possible to perform the first pass on the other axis first, in which case a function $G_u(v)$, equivalent to Y(u,v) with "u" held constant, would be used, producing an intermediate point (u, y). In a separable mapping of this type, the order of 1-D processing is arbitrary.
In a second pass, or v-pass, the other part of the two dimensional transformation 230, Y(u, v), is applied to the intermediate image produced in the h-pass. The fact that this intermediate image is in XV coordinate space (245) complicates the v-pass somewhat. The function Y(u, v) is designed to produce Y coordinates of points in XY (220) space from corresponding points in UV space (210). But since the intermediate image is in XV space (245), as is representative point (x,v) 240, an auxiliary function is required. This function, $H_x(v)$, is an expression of a "u" coordinate as a function of a "v" coordinate, while keeping an "x" coordinate constant, that is: $u = H_x(v)$. This auxiliary function is determined by solving the equation

$$X(u, v) - x = 0 \tag{10}$$

for "u" over all "v", holding x constant.
The second pass, or v-pass, may now be expressed as the application of the following equation to the intermediate image:

$$(x, y) = (x,\ G_x(v)) = (x,\ Y[H_x(v),\ v]) \tag{11}$$

on a column by column basis, holding x constant for each column. This pass maps every point (x, v) (illustrated by 240 in FIG. 2) in the intermediate image in intermediate coordinate space 245 into a point (x, y) (illustrated by 215 in FIG. 2) in the output image in output coordinate space 220.
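The two passes can be traced on individual points for the rotation of equation (5). In this sketch, which operates on coordinates only and does no resampling of intensities, the auxiliary function is obtained by solving equation (10) for u, giving u = (x − v sin θ) / cos θ. The function names are hypothetical and the derivation assumes cos θ is not zero.

```python
import math


def X(u, v, theta):
    return u * math.cos(theta) + v * math.sin(theta)


def Y(u, v, theta):
    return v * math.cos(theta) - u * math.sin(theta)


def h_pass(u, v, theta):
    """Equation (9): (x, v) = (X(u, v), v), v held constant along a row."""
    return (X(u, v, theta), v)


def H(x, v, theta):
    """Auxiliary function: solve X(u, v) - x = 0 for u (requires cos != 0)."""
    return (x - v * math.sin(theta)) / math.cos(theta)


def v_pass(x, v, theta):
    """Equation (11): (x, y) = (x, Y[H(x, v), v]), x held constant along a column."""
    return (x, Y(H(x, v, theta), v, theta))


def two_pass(u, v, theta):
    x, v_mid = h_pass(u, v, theta)
    return v_pass(x, v_mid, theta)
```

For any angle with a non-zero cosine, `two_pass(u, v, theta)` returns the same point as evaluating X and Y directly, while touching only one coordinate per pass.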
There are, unfortunately, a few problems inherent with the Catmull-Smith method. One such problem is that after the h-pass, it is possible for the input image to collapse into a line in the intermediate image for some transformations. An example of such a transformation is a 90 degree rotation, wherein every row of points in the input image will collapse to a single point in the intermediate image. The resulting finished intermediate image will comprise only a straight line. The loss of information in this case is great enough that the subsequent v-pass is unable to complete the transformation and is reduced to a meaningless computation. This problem is generally known as the "bottleneck" problem. Catmull-Smith suggests a solution based upon a change of input coordinates, the difficulties of which are discussed in Wolberg, G. and Terrance E. Boult, "Separable Image Warping with Spatial Lookup Tables," Computer Graphics, (SIGGRAPH '89 Proceedings), vol. 23, no. 3, pp. 369-378, July, 1989.
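The collapse can be seen numerically by measuring how wide one input row remains after the h-pass of a rotation, using X(u, v) from equation (5). This is a minimal sketch on coordinates only; the function name `intermediate_row_span` and the row width of 100 pixels are assumptions for the example.

```python
import math


def intermediate_row_span(width, v, theta):
    """Extent in x of one input row (u = 0 .. width-1) after the h-pass."""
    xs = [u * math.cos(theta) + v * math.sin(theta) for u in range(width)]
    return max(xs) - min(xs)


span_0 = intermediate_row_span(100, 7, math.radians(0))
span_60 = intermediate_row_span(100, 7, math.radians(60))
span_90 = intermediate_row_span(100, 7, math.radians(90))
```

The span shrinks in proportion to the cosine of the rotation angle: the 100-pixel row keeps its full extent at 0 degrees, roughly half of it at 60 degrees, and essentially none at 90 degrees, where all 100 pixels land on the same intermediate position and their individual values cannot be recovered by the v-pass.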
Another problem inherent in the Catmull-Smith method, and with most other 2-pass methods, is the need for an intermediate image buffer, which can be bulky and expensive, especially in real-time video processing systems.
A third problem with the Catmull-Smith method is that the need for an auxiliary function $H_x(v)$ represents a significant computational complexity.
Another problem of the Catmull-Smith method is the requirement for a significant number of multiplications. Multiplications generally require a great deal more time or a great deal more hardware than additions. If a method were available which used only additions and a minimal amount of intermediate image storage, there would be a very significant savings in time and/or hardware complexity. Of course, lower hardware complexity generally implies lower cost.
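The trade between multiplications and additions can be illustrated with generic incremental evaluation (forward differencing), a standard technique shown here only as background and not as the method of any particular reference. Along one scanline, a linear expression such as x = alpha × u + offset grows by the constant alpha at each step, so it can be updated with one addition per pixel. The function names and coefficient values are hypothetical.

```python
def scanline_x_multiply(alpha, offset, width):
    """One multiplication per pixel."""
    return [alpha * u + offset for u in range(width)]


def scanline_x_incremental(alpha, offset, width):
    """One addition per pixel: x(u + 1) = x(u) + alpha."""
    xs = []
    x = offset
    for _ in range(width):
        xs.append(x)
        x += alpha
    return xs


by_multiply = scanline_x_multiply(0.75, 2.5, 8)
by_addition = scanline_x_incremental(0.75, 2.5, 8)
```

Both functions produce the same sequence of coordinates. In floating point, a long run of additions can accumulate rounding error for coefficients that are not exactly representable, so a practical design would need to choose its number format with that in mind.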