Scenes and objects in the physical world may be perceived as patterns of light intensities varying in time and space. In most instances, the apparent motion of the brightness patterns, the "optical flow," is well correlated with the movement, or "motion field," of the illuminated scene and objects, and therefore an analysis of the optical flow can be used to deduce the motion field.
Instantaneous samples of light intensity distributions can be captured by a camera on a film as a sequence of two-dimensional images. The images can be digitized by an analog-to-digital converter. In digital form, the patterns of light intensities are represented by pixels. The data values of the pixels represent the relative intensity of the light at given points in space and time. The problem of extracting motion information from a sequence of images is exceedingly complex. In the prior art, the quality of the extracted motion information competes heavily with the costs incurred to attain it.
A fundamental problem in image analysis is to recover the motion field from sequences of two-dimensional images. This motion extraction problem is alternatively called motion estimation, multiple view analysis, or image registration. Simply stated, image registration determines, for a given sequence of images, a representation of motion that best aligns pixels in one image with those in a subsequent image. The extracted motion information can be valuable in a wide variety of applications, such as motion compensated image compression, image compositing, pattern recognition, multi-frame stereo correspondence determination, image rectification, robot navigation, structure from motion extraction, feature tracking, and computer vision. For example, in image compression, storage requirements are greatly reduced when motion information is applied to a base image to synthetically predict subsequent images.
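The compression example can be illustrated with a minimal sketch, given here under stated assumptions rather than as the method of any particular standard. It assumes a single whole-image translation, a tiny hypothetical 4x4 image, and a convention in which a pixel at (x, y) in the base image appears at (x + dx, y + dy) in the next image. The names `predict`, `residual`, and `count_nonzero` are illustrative only.

```python
def predict(base, dx, dy, fill=0):
    """Synthesize the next image by moving every pixel of `base` by (dx, dy).

    Pixels with no source in `base` (newly exposed area) are set to `fill`.
    """
    h, w = len(base), len(base[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            xs, ys = x - dx, y - dy  # where this pixel came from in `base`
            if 0 <= xs < w and 0 <= ys < h:
                out[y][x] = base[ys][xs]
    return out


def residual(actual, predicted):
    """Per-pixel difference that would still have to be stored."""
    return [[a - p for a, p in zip(ra, rp)] for ra, rp in zip(actual, predicted)]


def count_nonzero(img):
    return sum(1 for row in img for v in row if v != 0)


# Hypothetical base image, and a next image in which the content has moved
# one pixel to the right while a new column (value 99) enters on the left.
base = [[10 * y + x + 1 for x in range(4)] for y in range(4)]
nxt = [[99 if x == 0 else base[y][x - 1] for x in range(4)] for y in range(4)]

without_motion = count_nonzero(residual(nxt, base))
with_motion = count_nonzero(residual(nxt, predict(base, 1, 0)))
print(without_motion, with_motion)  # prints: 16 4
```

In this toy case only the newly exposed column remains in the residual once the motion is applied, which is the sense in which motion information reduces what must be stored.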
Images do not yield motion information readily. Many techniques require numerous tedious and often inefficient steps. Not surprisingly, low-cost, gross examination of the images tends to provide low-quality motion information. Conversely, higher-quality motion information can only be extracted by examining the myriad pixels in minute detail, usually at a substantially higher cost.
Economical solutions may be suitable for those applications where the quality of the motion information is of secondary importance. However, for applications such as medical imaging, where the quality of the image information cannot be compromised, the costlier solutions are the inevitable choices.
At the low end of the quality/cost spectrum are global image registration techniques. In a global technique, successive images are simply superimposed. The superimposed images are then displaced from each other in various directions, by trial and error, until the average light intensity difference between the images is minimized. The relative displacement can be converted to motion information. The motion field can be expressed parametrically as, for example, affine flow fields pointing in the general direction of the inferred motion. For example, the general formulation:

$$x' = m_0 x + m_1 y + m_2, \qquad y' = m_3 x + m_4 y + m_5$$

where $x'$ and $y'$ are the displaced coordinates of the pixels at $x$ and $y$, and $m_0$ to $m_5$ are the motion parameters, can be used for simple transformations such as rigid, rotational, and scaled transformations.
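The trial-and-error search described above can be sketched as follows. This is a minimal illustration, not a definitive implementation. It assumes integer displacements only, images stored as lists of rows, and a hypothetical synthetic `scene` function. The helper `affine_map` additionally assumes the conventional parameter ordering x' = m0·x + m1·y + m2, y' = m3·x + m4·y + m5, which is an assumption of this sketch.

```python
def mean_abs_diff(a, b, dx, dy):
    """Average absolute intensity difference between `a` and `b` when pixel
    (x, y) of `a` is laid over pixel (x + dx, y + dy) of `b`.
    Only the region where the two images overlap is compared."""
    h, w = len(a), len(a[0])
    total, count = 0, 0
    for y in range(h):
        for x in range(w):
            xs, ys = x + dx, y + dy
            if 0 <= xs < w and 0 <= ys < h:
                total += abs(a[y][x] - b[ys][xs])
                count += 1
    return total / count if count else float("inf")


def global_register(a, b, max_shift):
    """Try every displacement in [-max_shift, max_shift] on both axes and
    return the (dx, dy) that minimizes the average intensity difference."""
    best = (float("inf"), 0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err = mean_abs_diff(a, b, dx, dy)
            if err < best[0]:
                best = (err, dx, dy)
    return best[1], best[2]


def affine_map(x, y, m):
    """Displaced coordinates under six motion parameters m[0]..m[5]
    (parameter ordering is an assumption of this sketch)."""
    return (m[0] * x + m[1] * y + m[2], m[3] * x + m[4] * y + m[5])


def scene(x, y):
    # Hypothetical smooth intensity pattern used only for this demonstration.
    return x * x + 3 * y * y + x * y


a = [[scene(x, y) for x in range(8)] for y in range(8)]
b = [[scene(x - 2, y - 1) for x in range(8)] for y in range(8)]  # moved by (2, 1)

dx, dy = global_register(a, b, 3)
print(dx, dy)  # prints: 2 1
# A pure translation is the affine case with an identity linear part.
print(affine_map(3, 4, (1, 0, dx, 0, 1, dy)))  # prints: (5, 5)
```

The search visits (2·max_shift + 1)² candidate displacements and touches every overlapping pixel for each one, which is why the cost of this approach grows quickly when the displacement between images is large.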
Global image registration may be adequate for planar motion over a small distance. However, for more complex motions, a single motion descriptor for the entire image would clearly be erroneous for all but a small portion of the image. In addition, it is difficult to extract motion information with global estimation techniques if the speed of the optical flow is high, that is, if the relative displacement of the pixels in successive images is large.
In one variation of global image registration, the image is partitioned into a number of smaller blocks. Each block is then individually aligned with a corresponding block of the next image to produce a motion field for each of the blocks. This technique is used for compressing video signals according to the Moving Picture Experts Group (MPEG) standard. Block-based image registration can be done at a reasonable cost without substantially degrading the quality of the images. However, block-based image registration may exhibit annoying visual artifacts due to motion field differences at abutting block boundaries.
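A minimal sketch of block-based registration follows. It is offered as an illustration under stated assumptions and does not reproduce the search used by any particular video standard. It assumes an exhaustive integer search, a sum-of-absolute-differences error, and a hypothetical synthetic texture `tex` in which the left and right halves of the image move by different amounts.

```python
def block_error(a, b, x0, y0, size, dx, dy):
    """Sum of absolute differences between the block of `a` at (x0, y0) and
    the block of `b` displaced by (dx, dy). Returns None if the displaced
    block would fall outside `b`."""
    h, w = len(b), len(b[0])
    if x0 + dx < 0 or y0 + dy < 0 or x0 + dx + size > w or y0 + dy + size > h:
        return None
    return sum(abs(a[y0 + j][x0 + i] - b[y0 + dy + j][x0 + dx + i])
               for j in range(size) for i in range(size))


def block_motion_field(a, b, size, max_shift):
    """Return {(x0, y0): (dx, dy)}: the best displacement for each block."""
    field = {}
    for y0 in range(0, len(a) - size + 1, size):
        for x0 in range(0, len(a[0]) - size + 1, size):
            best = None
            for dy in range(-max_shift, max_shift + 1):
                for dx in range(-max_shift, max_shift + 1):
                    err = block_error(a, b, x0, y0, size, dx, dy)
                    if err is not None and (best is None or err < best[0]):
                        best = (err, dx, dy)
            field[(x0, y0)] = (best[1], best[2])
    return field


def tex(x, y):
    # Hypothetical intensity pattern used only for this demonstration.
    return x * x + 3 * y * y + x * y


a = [[tex(x, y) for x in range(10)] for y in range(10)]
# Columns 0..3 move down by one pixel; columns 4..9 move down by two.
b = [[tex(x, y - 1) if x < 4 else tex(x, y - 2) for x in range(10)]
     for y in range(10)]

field = block_motion_field(a, b, 4, 2)
print(field[(0, 0)], field[(4, 0)])  # prints: (0, 1) (0, 2)
```

Neighbouring blocks here receive different displacements, which is the kind of discontinuity at abutting block boundaries that can appear as a visible artifact. In this sketch the last two rows and columns of the 10x10 image are not covered by any 4x4 block.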
At the high-cost end are local image registration techniques. High-cost, pixel-based image registration can operate on either the raw or, alternatively, the interpolated values of the individual pixels. In the first alternative, the discrete integer pixel values of the first and second images are compared over some small correlation window. In the second alternative, which uses interpolated real-number values, the rate of change of the intensity values can also be measured in order to converge on a motion estimate more rapidly. In either case, examining the intensity relationships of individual pixels is time-consuming.
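The use of intensity rates of change can be illustrated with a single least-squares step over a small window. The text does not name a specific algorithm, so the sketch below substitutes a Lucas-Kanade-style gradient formulation as one plausible instance. The function name `local_flow`, the central-difference gradients, and the synthetic images (sampled from a continuous function in place of true interpolation) are all assumptions of this sketch.

```python
def local_flow(img1, img2, cx, cy, r):
    """Estimate the displacement (u, v) of the (2r+1) x (2r+1) window centred
    on (cx, cy) by solving the 2x2 least-squares system
        [sxx sxy] [u]   [-sxt]
        [sxy syy] [v] = [-syt]
    built from spatial and temporal rates of change of intensity."""
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - r, cy + r + 1):
        for x in range(cx - r, cx + r + 1):
            ix = (img1[y][x + 1] - img1[y][x - 1]) / 2.0  # change along x
            iy = (img1[y + 1][x] - img1[y - 1][x]) / 2.0  # change along y
            it = img2[y][x] - img1[y][x]                  # change over time
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("window has too little texture to resolve motion")
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


# Hypothetical smooth image, and the same pattern displaced by a sub-pixel
# amount (u, v) = (0.2, -0.1).
true_u, true_v = 0.2, -0.1
img1 = [[float(x * x + 2 * y * y) for x in range(9)] for y in range(9)]
img2 = [[(x - true_u) ** 2 + 2 * (y - true_v) ** 2 for x in range(9)]
        for y in range(9)]

u, v = local_flow(img1, img2, 4, 4, 2)
print(round(u, 2), round(v, 2))  # close to the true displacement (0.2, -0.1)
```

A single step of this kind is only accurate for small displacements. Larger motions would require repeating the step, which, multiplied over every pixel of the image, accounts for the high cost noted above.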
Understandably, there is a need for an image registration system and method which permits the blending of quality and cost factors. The registration method and system should enable the extraction of motion information with a quality comparable to that of local registration techniques, but at costs which are not excessively greater than those of global image registration. Furthermore, the system and method should be adaptable to a wide range of image analysis applications. In addition, it is desirable that the image registration techniques can be practiced without special knowledge of the represented scenes or of the camera equipment used to capture the scenes.