Image registration transforms a subject image so that it is geometrically aligned with a reference image and may generally include three steps: 1) feature matching, 2) transformation model estimation, and 3) image resampling and transformation (Wyawahare, M. V., P. M. Patil, and H. K. Abhyankar. 2009. Image registration techniques: an overview. International Journal of Signal Processing, Image Processing, and Pattern Recognition, 2(3): 11-28; Zitova, B. and J. Flusser. 2003. Image registration methods: a survey. Image and Vision Computing, 21: 977-1000). In some embodiments, feature matching identifies corresponding image coordinate sets between the images that may be used to estimate the transformation model. In some embodiments, transformation model estimation is the process of estimating and possibly fine-tuning the transformation model in order to achieve accurate image co-registration. The derived transformation model may be the best estimate given available calibration information, and each observed control point (e.g., calibration point) is likely to have some level of residual error. Once a final transformation model is attained, the subject image may be transformed and resampled (converting subject image pixel values from the subject image grid to the reference image grid).
Feature-based matching may include feature detection with subsequent matching of detected features. In some embodiments, feature detection is the process of identifying specific image features and characterizing these features using a range of possible descriptors. Feature selection may be based upon the characteristics of regions, edges, contours, line intersections, corners, etc. Feature matching generally utilizes a variety of information to compare image feature characteristics between image sets to identify feature pairs that meet specified matching criteria. Image coordinates from successfully matched feature pairs may be utilized to co-register the images.
For feature-based matching, the scale-invariant feature transform (SIFT) is a descriptor routine that has been widely used. SIFT generates a large number of feature points per image and characterizes each point with a 128-element descriptor vector in order to achieve robust matching of individual features between the subject and reference images. Since it was first proposed, variations on the SIFT routine have been published. Other feature-based descriptors include Gaussian derivatives, moment invariants, and shape context. Features may be matched based on either feature descriptors or spatial relationships. Feature-based methods robustly handle images with intensity and geometric distortion differences, but they may yield too few or unevenly distributed matched points.
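As an illustration of descriptor-based matching, the following minimal sketch pairs each subject descriptor with its nearest reference descriptor and keeps the pair only if it passes a nearest-neighbor ratio test. The ratio test, the 0.8 threshold, and the function names are assumptions chosen for illustration; they are not taken from the text above, and real SIFT descriptors would be 128-element vectors rather than the short toy vectors this sketch would typically be tried with.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def match_descriptors(subject_desc, reference_desc, max_ratio=0.8):
    """Return (subject_index, reference_index) pairs that pass a ratio test.

    max_ratio is a hypothetical threshold: a match is kept only when the best
    distance is clearly smaller than the second-best distance.
    """
    matches = []
    for i, descriptor in enumerate(subject_desc):
        # Rank every reference descriptor by distance to this subject descriptor.
        ranked = sorted(
            (euclidean(descriptor, ref), j) for j, ref in enumerate(reference_desc)
        )
        if len(ranked) < 2:
            continue  # The ratio test needs at least two candidates.
        (best, j), (second, _) = ranked[0], ranked[1]
        if best < max_ratio * second:
            matches.append((i, j))
    return matches
```

The image coordinates attached to each accepted pair would then serve as control points for transformation model estimation. Ambiguous features, whose two closest candidates are nearly equidistant, are dropped rather than risk a false match.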
Area-based matching generally includes the comparison of local windows of image digital number (DN) values. These values could be based upon original image intensity or transformed image products. Area-based matching skips the feature detection step and directly searches for matching characteristics between pixel values of the subject and reference images. Area-based matching methods include cross-correlation, least squares, mutual information, Fourier, maximum likelihood, statistical divergence, and implicit similarity matching. Area-based methods generally require initial, coarse alignment between images. Area-based methods can yield sub-pixel matching accuracy, but may be less effective than feature-based approaches for images with repeating textures, illumination differences, or image distortions. Further, area-based methods may not be appropriate for images collected from different locations and having wide baselines.
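Cross-correlation, the first area-based method listed above, can be sketched as follows. This is a minimal, whole-pixel illustration that assumes the images are already coarsely aligned and are stored as lists of rows of DN values; the function names and the search-window parameters are hypothetical.

```python
import math


def ncc(window_a, window_b):
    """Normalized cross-correlation of two equal-sized windows (lists of rows)."""
    a = [v for row in window_a for v in row]
    b = [v for row in window_b for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    # A window with no DN variation carries no matching information.
    return num / den if den > 0 else 0.0


def best_offset(subject, reference, row, col, size, search):
    """Return (score, d_row, d_col) for the shift that maximizes NCC.

    The size x size template is taken from the subject image at (row, col) and
    compared with reference windows shifted by up to +/- search pixels.
    """
    template = [r[col:col + size] for r in subject[row:row + size]]
    best = (-2.0, 0, 0)  # NCC lies in [-1, 1], so any real score beats this.
    for d_row in range(-search, search + 1):
        for d_col in range(-search, search + 1):
            r0, c0 = row + d_row, col + d_col
            if r0 < 0 or c0 < 0:
                continue
            if r0 + size > len(reference) or c0 + size > len(reference[0]):
                continue
            window = [r[c0:c0 + size] for r in reference[r0:r0 + size]]
            score = ncc(template, window)
            if score > best[0]:
                best = (score, d_row, d_col)
    return best
```

Because the search covers only a few pixels around the starting position, the sketch also shows why a coarse initial alignment is needed. Sub-pixel accuracy would require an additional step, such as fitting a surface to the correlation scores around the peak, which is not shown here.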
Transformation model estimation may include selecting a transformation model based upon the method of image acquisition, the assumed geometric deformation, and the required accuracy of the registration (Zitova, B. and J. Flusser. 2003. Image registration methods: a survey. Image and Vision Computing, 21: 977-1000). Global transformation models (a single model applied across entire images) include affine, projective, and polynomial-based approaches, each of which is applicable for specific situations (Zitova, B. and J. Flusser. 2003. Image registration methods: a survey. Image and Vision Computing, 21: 977-1000). Bivariate polynomial models enable simple rotation, translation, and scaling. Affine models may be appropriate for registration of image scenes acquired from different viewing perspectives if a perfect (e.g., pin-hole) camera is used, the camera is far from the scene imaged, and the surface imaged is flat. When the camera is close to the scene, projective models are appropriate in order to handle scale changes from one edge of the scene to the other. For scenes with complex distortions (e.g., high terrain relief viewed from aerial sensors), second- or third-order polynomial models may be more appropriate (Zitova, B. and J. Flusser. 2003. Image registration methods: a survey. Image and Vision Computing, 21: 977-1000). Local transformation models may include piecewise linear and piecewise cubic mapping (Zitova, B. and J. Flusser. 2003. Image registration methods: a survey. Image and Vision Computing, 21: 977-1000). Local models may be appropriate when distortions vary over short distances. Local models may require a large number of accurate control points in order to generate local transformations.
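To make global model estimation concrete, the sketch below fits an affine model to matched control points by ordinary least squares and reports the root-mean-square residual at those points. It assumes at least three non-collinear control points; the parameterization (x' = a0 + a1·x + a2·y, y' = b0 + b1·x + b2·y) and all function names are illustrative choices, not a method prescribed by the text.

```python
import math


def solve_linear(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in reversed(range(n)):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution


def fit_affine(subject_pts, reference_pts):
    """Least-squares affine fit; returns ((a0, a1, a2), (b0, b1, b2))."""
    design = [(1.0, x, y) for x, y in subject_pts]
    # Normal equations: (A^T A) p = A^T t, solved once per output coordinate.
    normal = [
        [sum(row[i] * row[j] for row in design) for j in range(3)]
        for i in range(3)
    ]
    coeffs = []
    for axis in (0, 1):
        rhs = [
            sum(row[i] * ref[axis] for row, ref in zip(design, reference_pts))
            for i in range(3)
        ]
        coeffs.append(tuple(solve_linear(normal, rhs)))
    return coeffs[0], coeffs[1]


def apply_affine(model, point):
    """Map a subject-image point (x, y) into reference-image coordinates."""
    (a0, a1, a2), (b0, b1, b2) = model
    x, y = point
    return a0 + a1 * x + a2 * y, b0 + b1 * x + b2 * y


def rmse(model, subject_pts, reference_pts):
    """Root-mean-square residual of the model at the control points."""
    total = 0.0
    for subject_pt, (ref_x, ref_y) in zip(subject_pts, reference_pts):
        pred_x, pred_y = apply_affine(model, subject_pt)
        total += (pred_x - ref_x) ** 2 + (pred_y - ref_y) ** 2
    return math.sqrt(total / len(subject_pts))
```

With more than three control points the system is overdetermined, so the fitted model generally leaves a nonzero residual at each point. A projective or higher-order polynomial model would follow the same pattern with a larger design matrix.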
Transformation of the subject image to match the positioning and inherit the grid of the reference image may require the subject image to be resampled. In some embodiments, resampling is the digital process of estimating new image pixel values from the original image pixel values when the image grid position or size is changed (Parker, J. A., R. V. Kenyon, and D. E. Troxel. 1983. Comparison of interpolating methods for image resampling. IEEE Transactions on Medical Imaging, MI-2(1): 31-39). Depending upon the interpolation method used, either original DN values are retained (e.g., nearest neighbor) or modified DN values result (e.g., bilinear or bicubic interpolation). Resampling methods include nearest neighbor, bilinear interpolation, bicubic functions, etc. (Zitova, B. and J. Flusser. 2003. Image registration methods: a survey. Image and Vision Computing, 21: 977-1000).
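The difference between resampling methods can be shown with a short sketch. It assumes a coordinate convention in which x is the column, y is the row, and pixel centers sit at integer coordinates, and it clamps positions that fall outside the image; the function names and the `to_subject` mapping callback are hypothetical.

```python
import math


def sample_nearest(image, x, y):
    """Return the DN of the pixel whose center is closest to (x, y)."""
    rows, cols = len(image), len(image[0])
    r = min(max(int(math.floor(y + 0.5)), 0), rows - 1)
    c = min(max(int(math.floor(x + 0.5)), 0), cols - 1)
    return image[r][c]


def sample_bilinear(image, x, y):
    """Distance-weighted average of the four pixels surrounding (x, y)."""
    rows, cols = len(image), len(image[0])
    # Clamp to the image extent so edge positions reuse the border pixels.
    x = min(max(x, 0.0), cols - 1.0)
    y = min(max(y, 0.0), rows - 1.0)
    c0, r0 = int(math.floor(x)), int(math.floor(y))
    c1, r1 = min(c0 + 1, cols - 1), min(r0 + 1, rows - 1)
    fx, fy = x - c0, y - r0
    top = image[r0][c0] * (1 - fx) + image[r0][c1] * fx
    bottom = image[r1][c0] * (1 - fx) + image[r1][c1] * fx
    return top * (1 - fy) + bottom * fy


def resample(subject, out_rows, out_cols, to_subject, sampler):
    """Fill the output grid by mapping each output pixel back into the subject.

    to_subject(x, y) is a hypothetical callback returning the subject-image
    position that corresponds to output pixel (x, y).
    """
    return [
        [sampler(subject, *to_subject(c, r)) for c in range(out_cols)]
        for r in range(out_rows)
    ]
```

Nearest neighbor only ever returns DN values that already exist in the subject image, whereas bilinear interpolation blends neighboring values and therefore produces new ones. Mapping from the output grid back into the subject image, rather than forward, ensures that every output pixel receives a value.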
Image acquisition procedures that enable precise spatial co-registration between multi-temporal aerial image frames are described in (i) Coulter et al., A Frame Center Matching Technique for Precise Registration of Multitemporal Airborne Frame Imagery, IEEE Transactions on Geoscience and Remote Sensing, Vol. 41, No. 11, pp. 2436-2444, Nov. 2003, and (ii) Stow et al., A frame center matching approach to registration for change detection with fine spatial resolution multi-temporal imagery, Int. J. Remote Sensing, Vol. 24, No. 19, pp. 3873-3879, May 2003. Traditional approaches do not attempt to match image capture stations between collections and do not perform image co-registration between images from the same camera stations first, before other processes such as geo-referencing.
Nadir-viewing images are acquired with the camera pointing vertically (e.g., directly below the platform). Oblique images are characterized as images that are purposefully collected with off-nadir viewing angles (e.g., camera is tilted up away from nadir). Obliques are characterized as high oblique (showing the horizon within the photo) and low oblique (not showing the horizon). Oblique images are utilized in Google Maps images (when zoomed in far enough in urban areas) and Bing Maps aerial images, as they enable viewing of the sides of buildings and provide a unique perspective. Oblique images also are useful for such things as earthquake damage assessment, since “pancaking” of multi-level buildings would be apparent in oblique images but might not be apparent in nadir-viewing images. As can be seen from Google Maps or Bing Maps, oblique-viewing images provide information and detail that is not available from nadir-viewing images (building height, building condition, building use, etc.).
Currently, oblique images are collected and utilized, but are not directly compared for change detection. Instead, orthorectified, nadir-viewing imagery is used for change detection, after which corresponding oblique images acquired before and after the earth surface changes occurred are located in order to visualize the change features. These oblique images are typically acquired in an ad hoc manner (e.g., without attempting to match image stations).