Image registration can be used to transform a subject image so that it is geometrically aligned with a reference image, and may generally include three steps: 1) feature matching, 2) transformation model estimation, and 3) image resampling and transformation (Wyawahare, M. V., P. M. Patil, and H. K. Abhyankar. 2009. Image registration techniques: an overview. International Journal of Signal Processing, Image Processing, and Pattern Recognition 2(3): 11-28; Zitova, B. and J. Flusser. 2003. Image registration methods: a survey. Image and Vision Computing, 21: 977-1000). Feature matching can identify corresponding image coordinate sets between the images that may be used to estimate the transformation model. Transformation model estimation is the process of estimating and possibly fine-tuning the transformation model in order to achieve accurate image co-registration. The derived transformation model may be the best estimate given available calibration information, and each observed control point (e.g., calibration point) is likely to have some level of residual error. Once a final transformation model is attained, the subject image may be transformed and resampled (converting subject image pixel values from the subject image grid to the reference image grid).
Feature-based matching may include feature detection with subsequent matching of detected features. Feature detection may be a process of identifying specific image features and characterizing these features using a range of possible descriptors. Feature selection may be based upon the characteristics of regions, edges, contours, line intersections, corners, etc. Feature matching generally utilizes a variety of information to compare image feature characteristics between image sets to identify feature pairs that meet specified matching criteria. Image coordinates from successfully matched feature pairs may be utilized to co-register the images.
For feature-based matching, the scale-invariant feature transform (SIFT) is a descriptor routine that has been widely used. SIFT generates a large number of feature points per image, and characterizes each feature point with a 128-element descriptor vector in order to achieve robust matching of individual features between the subject and reference image (Lowe, D. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110). Since it was first proposed, variations on the SIFT routine have been published. Other feature-based descriptors include Gaussian derivatives, moment invariants, and shape context. Matching features may be accomplished based on either feature descriptors or spatial relationships. Feature-based methods can handle images with intensity and geometric distortion differences, but they may yield too few or unevenly distributed matched points.
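Descriptor-based matching can be illustrated with a minimal sketch. It assumes descriptors are plain lists of numbers and uses a nearest-neighbor distance ratio test as the matching criterion, which is one common criterion and not necessarily the one used by any particular system. The function names, the toy 4-element descriptors (real SIFT descriptors have 128 elements), and the 0.8 ratio threshold are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(subject, reference, ratio=0.8):
    """Return (subject_index, reference_index) pairs that pass the ratio test.

    For each subject descriptor, the nearest and second-nearest reference
    descriptors are found. The match is kept only when the nearest distance
    is smaller than `ratio` times the second-nearest distance, which rejects
    ambiguous matches.
    """
    matches = []
    for i, desc in enumerate(subject):
        ranked = sorted((euclidean(desc, ref), j) for j, ref in enumerate(reference))
        if len(ranked) < 2:
            continue  # the ratio test needs at least two candidates
        (d1, j1), (d2, _) = ranked[0], ranked[1]
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches


# Toy descriptors (hypothetical values chosen for illustration only).
subject_desc = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],  # ambiguous: equally close to two references
]
reference_desc = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.1, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

matches = match_descriptors(subject_desc, reference_desc)
print(matches)
```

In this toy case the first two subject descriptors each have one clearly closest reference descriptor and are retained, while the third is equidistant from two reference descriptors and is rejected. The image coordinates of the retained pairs would then serve as control points for transformation model estimation.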
Area-based matching generally includes the comparison of local windows of image digital number (DN) values. These values could be based upon original image intensity or transformed image products. Area-based matching skips the feature detection step and directly searches for matching characteristics between pixel values of the subject and reference images. Area-based matching methods include cross-correlation, least squares, mutual information, Fourier methods, maximum likelihood, statistical divergence, and implicit similarity matching. Area-based methods generally require initial, coarse alignment between images. Area-based methods can yield sub-pixel matching accuracy, but may be less effective than feature-based approaches for images with repeating textures, illumination differences, or image distortions. Further, area-based methods may not be appropriate for images collected from different locations and having wide baselines.
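As an illustration of area-based matching, the following minimal sketch slides a small window of DN values from the subject image over a reference image and scores each position with normalized cross-correlation. It assumes images are lists of rows of numbers and searches only whole-pixel offsets (sub-pixel refinement is omitted). The function names and toy DN values are hypothetical.

```python
import math


def ncc(window_a, window_b):
    """Normalized cross-correlation of two equal-size windows (lists of rows)."""
    a = [v for row in window_a for v in row]
    b = [v for row in window_b for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    # A window with no variation carries no matching information.
    return num / den if den > 0 else 0.0


def best_match(template, image):
    """Slide `template` over `image`; return (row, col, score) of the best NCC."""
    t_rows, t_cols = len(template), len(template[0])
    best = (0, 0, -2.0)  # NCC lies in [-1, 1], so -2 is below any real score
    for r in range(len(image) - t_rows + 1):
        for c in range(len(image[0]) - t_cols + 1):
            window = [row[c:c + t_cols] for row in image[r:r + t_rows]]
            score = ncc(template, window)
            if score > best[2]:
                best = (r, c, score)
    return best


# Toy reference image of DN values (hypothetical).
reference = [
    [10, 10, 10, 10, 10],
    [10, 50, 80, 10, 10],
    [10, 60, 90, 10, 10],
    [10, 10, 10, 10, 10],
]
# Subject window: the same pattern with a gain of 2 and an offset of 5,
# mimicking a linear brightness difference between acquisitions.
template = [
    [105, 165],
    [125, 185],
]

row, col, score = best_match(template, reference)
print(row, col, round(score, 6))
```

Because normalized cross-correlation removes the window mean and scales by the window variance, the linear brightness difference in the toy data does not prevent the correct location from receiving the highest score. More complex illumination differences or geometric distortions are not compensated in this way.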
Transformation model estimation may include selecting a transformation model based upon the method of image acquisition, the assumed geometric deformation, and the required accuracy of the registration (Zitova, B. and J. Flusser. 2003. Image registration methods: a survey. Image and Vision Computing, 21: 977-1000). Global transformation models (a single model applied across entire images) include affine, projective, and polynomial-based approaches, each of which is applicable for specific situations (Zitova, B. and J. Flusser. 2003. Image registration methods: a survey. Image and Vision Computing, 21: 977-1000). Low-order bivariate polynomial models enable simple rotation, translation, and scaling. Affine models may be appropriate for registration of image scenes acquired from different viewing perspectives, for example, if a perfect (e.g., pin-hole) camera is used, the camera is far from the scene imaged, and the surface imaged is flat. When the camera is close to the scene, projective models may be appropriate in order to handle scale changes from one edge of the scene to the other. For scenes with complex distortions (e.g., high terrain relief viewed from aerial sensors), second or third order polynomial models may be more appropriate (Zitova, B. and J. Flusser. 2003. Image registration methods: a survey. Image and Vision Computing, 21: 977-1000). Local transformation models may include piecewise linear and piecewise cubic mapping (Zitova, B. and J. Flusser. 2003. Image registration methods: a survey. Image and Vision Computing, 21: 977-1000). Local models may be appropriate when distortions vary over short distances. Local models may require a large number of accurate control points in order to generate local transformations.
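Estimation of a global model from control points can be sketched as follows. This minimal example fits an affine model, x' = a*x + b*y + c and y' = d*x + e*y + f, by least squares through the normal equations, and then reports the residual error at each control point. The control point coordinates, the "true" coefficients, and all function names are hypothetical values chosen for illustration.

```python
import math


def solve_linear(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("control points are degenerate (e.g., collinear)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * solution[k] for k in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_affine(subject_pts, reference_pts):
    """Least-squares affine fit; returns ((a, b, c), (d, e, f))."""
    rows = [(x, y, 1.0) for x, y in subject_pts]
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for axis in range(2):  # axis 0 solves for x', axis 1 for y'
        rhs = [sum(r[i] * p[axis] for r, p in zip(rows, reference_pts)) for i in range(3)]
        coeffs.append(tuple(solve_linear(normal, rhs)))
    return tuple(coeffs)


def apply_affine(coeffs, pt):
    """Map a subject-image point into reference-image coordinates."""
    (a, b, c), (d, e, f) = coeffs
    x, y = pt
    return (a * x + b * y + c, d * x + e * y + f)


def residuals(coeffs, subject_pts, reference_pts):
    """Distance between each transformed control point and its reference position."""
    out = []
    for s, r in zip(subject_pts, reference_pts):
        px, py = apply_affine(coeffs, s)
        out.append(math.hypot(px - r[0], py - r[1]))
    return out


# Hypothetical control points in subject-image coordinates.
subject_pts = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0), (50.0, 50.0)]
true_coeffs = ((2.0, 0.1, 10.0), (-0.1, 2.0, -5.0))
reference_exact = [apply_affine(true_coeffs, p) for p in subject_pts]

# Error-free control points: the fit should recover the coefficients.
fitted_exact = fit_affine(subject_pts, reference_exact)

# One mislocated control point: the fit becomes a compromise with nonzero residuals.
reference_noisy = list(reference_exact)
reference_noisy[4] = (reference_noisy[4][0] + 1.0, reference_noisy[4][1] - 1.0)
fitted_noisy = fit_affine(subject_pts, reference_noisy)
errors = residuals(fitted_noisy, subject_pts, reference_noisy)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
print(fitted_exact)
print([round(e, 3) for e in errors], round(rmse, 3))
```

With more control points than unknowns, the fitted model is a best estimate and not an exact interpolation, so a single mislocated control point leaves some residual error at every control point. A projective or higher-order polynomial model would follow the same pattern with more unknowns and therefore more required control points.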
Transformation of the subject image to match the positioning and inherit the grid of the reference image may require the subject image to be resampled. Resampling is the digital process of estimating new image pixel values from the original image pixel values when the image grid position or size is changed (Parker, J. A., R. V. Kenyon, and D. E. Troxel. 1983. Comparison of interpolating methods for image resampling. IEEE Transactions on Medical Imaging, MI-2(1): 31-39). Resampling methods include nearest neighbor, bilinear interpolation, bicubic functions, etc. (Zitova, B. and J. Flusser. 2003. Image registration methods: a survey. Image and Vision Computing, 21: 977-1000). Depending upon the interpolation method used, original DN values or modified DN values may result (e.g., nearest neighbor resampling retains original DN values, whereas bilinear and bicubic interpolation produce modified DN values).
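The difference between resampling methods can be seen in a minimal sketch that fills a reference grid by mapping each output pixel back into the subject image and sampling it there. It assumes images are lists of rows of DN values and clamps coordinates to the image bounds. The inverse mapping (a reference grid twice as fine as the subject grid) and all names are hypothetical.

```python
import math


def sample_nearest(image, x, y):
    """Nearest neighbor: return the DN of the closest pixel, unmodified."""
    row = min(max(int(math.floor(y + 0.5)), 0), len(image) - 1)
    col = min(max(int(math.floor(x + 0.5)), 0), len(image[0]) - 1)
    return image[row][col]


def sample_bilinear(image, x, y):
    """Bilinear: distance-weighted average of the four surrounding pixels."""
    max_row, max_col = len(image) - 1, len(image[0]) - 1
    x = min(max(x, 0.0), float(max_col))
    y = min(max(y, 0.0), float(max_row))
    c0, r0 = int(math.floor(x)), int(math.floor(y))
    c1, r1 = min(c0 + 1, max_col), min(r0 + 1, max_row)
    fx, fy = x - c0, y - r0
    top = image[r0][c0] * (1 - fx) + image[r0][c1] * fx
    bottom = image[r1][c0] * (1 - fx) + image[r1][c1] * fx
    return top * (1 - fy) + bottom * fy


def resample(image, out_rows, out_cols, inverse_map, sampler):
    """Build the output grid; inverse_map(col, row) gives subject-image (x, y)."""
    return [
        [sampler(image, *inverse_map(c, r)) for c in range(out_cols)]
        for r in range(out_rows)
    ]


def half_scale(col, row):
    """Hypothetical inverse mapping: the reference grid is twice as fine."""
    return (col / 2.0, row / 2.0)


subject = [
    [10, 20],
    [30, 40],
]

nearest = resample(subject, 3, 3, half_scale, sample_nearest)
bilinear = resample(subject, 3, 3, half_scale, sample_bilinear)
print(nearest)
print(bilinear)
```

In the nearest neighbor output every value is one of the original DN values (10, 20, 30, or 40), whereas the bilinear output contains new intermediate values such as 15 and 25 that do not occur in the subject image.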
Image acquisition procedures that can enable precise spatial co-registration between multi-temporal aerial image frames are described in (i) Coulter et al., A Frame Center Matching Technique for Precise Registration of Multi-temporal Airborne Frame Imagery, IEEE Transactions on Geoscience and Remote Sensing, Vol. 41, No. 11, pp. 2436-2444, November 2003, and (ii) Stow et al., A frame center matching approach to registration for change detection with fine spatial resolution multi-temporal imagery, Int. J. Remote Sensing, Vol. 24, No. 19, pp. 3873-3879, May 2003. Traditional approaches generally do not attempt to match sensor station positions between collections and do not perform image co-registration between images from the same sensor stations first, before other processes such as geo-referencing or orthorectification.
Nadir viewing images can be acquired with the sensor pointing vertically (e.g., directly below the platform). Oblique images are characterized as images that are purposefully collected with off-nadir viewing angles (e.g., sensor is tilted up away from nadir). Obliques are characterized as high oblique (showing the horizon within the photo) and low oblique (not showing the horizon). Oblique images are utilized in Google Maps images (when zoomed in far enough in urban areas) and Bing Maps aerial images, as they enable viewing of the sides of buildings and provide a unique perspective. Oblique images also are useful for such things as earthquake damage assessment, since “pancaking” of multi-level buildings would be apparent in oblique images but might not be apparent in nadir-viewing images. As can be seen from Google Maps or Bing Maps, oblique viewing images provide information and detail that is not available from nadir viewing images (building height, building condition, building use, etc.).
Traditional airborne video systems typically collect airborne video at high imaging frame rates (e.g., 1-30 frames per second) continuously from a stationary or moving platform. Full motion video (FMV) systems are characterized as having limited image extent, and provide what is referred to as the “soda-straw” effect, where detailed video images are obtained using a very limited field of view of a very limited ground extent. New sensors such as Sierra Nevada Corporation's Gorgon Stare and BAE Systems' ARGUS have been created in recent years to provide wide area motion imagery (WAMI). As with FMV, WAMI sensors collect video imagery at high frame rates (e.g., 1-30 frames per second) continuously from a stationary or moving platform. However, WAMI sensors combine several imaging sensors into a single large (e.g., 1.8 gigapixel in the case of ARGUS) video image. WAMI sensors may image large areas (e.g., 36 sq km) at high frame rates and with high spatial resolution (e.g., 3-inch), but these are expensive systems and may be limited in terms of the number available.