It is sometimes advantageous to image an area of interest using multiple cameras or sensors with different imaging characteristics, such as in surveillance or reconnaissance applications. For example, the sensors may be arranged at separate locations with different orientations, may have different fields of view or different optical resolutions, and/or may operate in different spectral domains. The image data associated with each individual sensor is thereby augmented, and can serve to compensate for the respective shortcomings of the other sensors. The aggregated image data may be processed to generate a unified image that can then be displayed. Alternatively, different images associated with different sensors may be displayed separately, such as at display devices situated at different locations. Each operator would consequently view a different version of the same area of interest, such as at a slightly different orientation and/or focal length, which may result in slightly varying image features in each displayed image.
Two remote parties viewing different images that portray a mutual area of interest may seek to communicate information about the area in terms of their respective images. For example, one party may wish to convey to the other party information with respect to a point of interest as it appears on that party's own image. Since each party is viewing a different image, the transmittal of an entire image (or sufficient image data to enable accurate reconstruction of the image) would require a high-bandwidth data link and consume substantial time and resources (in terms of both computational cost and transmission overhead), which may be unavailable and/or undesirable. The use of location or orientation determining systems associated with each sensor, such as a global positioning system (GPS) or inertial navigation system (INS), can help simplify coordination between sensor data, but would also add significant processing time as well as the increased weight and cost of the additional infrastructure.
Various techniques for image registration, i.e., determining an optimal transformation between different images of a common scene, are known in the art. One approach involves selecting a reference point on one image and then identifying the location of the reference point in the other image. If both reference points lie on the same image plane, then a straightforward linear transformation model can be established between the images, allowing for the conversion of other non-reference points, using interpolation if necessary. However, if the depth of the terrain varies between the images, such that the reference points reside on different image planes, a direct transformation model cannot be applied between the two images; attempting to apply one would result in an imprecise registration and substantially increase the margin of error. In this case, it would be necessary to know the relative locations and the direction or viewing angle of each image sensor (and perhaps additional imaging characteristics as well), in order to perform accurate image registration between their respective images.
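By way of illustration only, the reference-point approach described above may be sketched as follows, under the assumption that the linear transformation model is a 2D affine transform fitted by least squares to matched reference points. The function names `fit_affine` and `apply_affine` and all point coordinates are hypothetical and are not taken from any cited reference. For strictly coplanar scenes viewed in perspective, a homography would be the more general model.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src points onto dst points."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Homogeneous source coordinates: each row is [x, y, 1].
    A = np.hstack([src, np.ones((len(src), 1))])
    # Solve A @ M = dst for the 3x2 parameter matrix M.
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M

def apply_affine(M, pts):
    """Map points from the first image into the second image."""
    pts = np.asarray(pts, dtype=float)
    return np.hstack([pts, np.ones((len(pts), 1))]) @ M

# Hypothetical matched reference points (assumed to lie on one plane).
src = [(0, 0), (100, 0), (0, 100), (100, 100)]
dst = [(10, -5), (210, -5), (10, 195), (210, 195)]  # scale by 2, shift by (10, -5)

M = fit_affine(src, dst)
# Convert a non-reference point from the first image to the second.
mapped = apply_affine(M, [(50, 50)])
```

Such a model holds only while the reference points share an image plane; with varying terrain depth, the fitted transform would misplace points away from that plane.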
U.S. Pat. No. 7,925,117 to Hamza et al., entitled “Fusion of Sensor Data to Form an Integrated Image”, is directed to a system and method for forming a combined sensor and synthetic image that provides guidance to vehicle operators in limited or no visibility conditions. An image registration process is used to fuse the images. At least two landmarks are identified, an image gradient is extracted from a sensor image dataset for each of the landmarks, and a corresponding image gradient is extracted from a synthetic image dataset for each of the landmarks. A center of mass is calculated for each of the image gradients extracted from the sensor and synthetic image datasets. The displacement is calculated between corresponding image gradients from the sensor and synthetic image datasets centered at the calculated centers of mass. The images are stabilized by minimizing the displacement to form the integrated image.
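The center-of-mass and displacement steps summarized above may be pictured with the following minimal sketch. It is not the method of the cited patent; it assumes that the center of mass is the intensity-weighted centroid of the gradient magnitude of a single landmark patch, and it uses a synthetic test patch. The name `gradient_centroid` and the patch contents are hypothetical.

```python
import numpy as np

def gradient_centroid(patch):
    """Center of mass (row, col) of the gradient magnitude of a 2D patch."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    total = mag.sum()
    if total == 0:
        raise ValueError("patch has no gradient energy")
    rows, cols = np.indices(patch.shape)
    return np.array([(rows * mag).sum(), (cols * mag).sum()]) / total

# Hypothetical landmark patch in the synthetic image: a bright rectangle.
synthetic = np.zeros((64, 64))
synthetic[20:30, 25:40] = 1.0

# Stand-in for the sensor image: the same landmark shifted by 3 rows and 2 columns.
sensor = np.roll(synthetic, shift=(3, 2), axis=(0, 1))

# Displacement between corresponding gradient centroids, as (rows, cols).
displacement = gradient_centroid(sensor) - gradient_centroid(synthetic)
```

In this sketch the displacement recovers the applied shift; a registration process of the kind described would then adjust the alignment so as to drive such displacements toward zero across the landmarks.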
U.S. Pat. No. 7,957,584 to Najafi et al., entitled “Fast Object Detection for Augmented Reality Systems”, is directed to a method for real-time pose estimation of an object in a sample view. A set of stable feature regions of the object is selected in an off-line environment. Multiple view descriptors of a view set for each selected feature region are incorporated into a statistical model, also in an off-line environment. A search area of the statistical model is constrained using geometric consistencies between the statistical model and the sample view. The constrained search area is searched to match regions in the statistical model with regions in the sample view.
U.S. Pat. No. 8,036,678 to Goldenberg et al., entitled “Real-Time Geographic Information System and Method”, is directed to a system and method for dynamic distribution of location-related information between users with different perspective views of a common region of interest. A shared location reference having a defined coordinate system is provided for the region of interest. The location reference may include at least one reference image chosen from an aerial image, a satellite image, and an orthophoto, and may also include an elevation map or a digital surface model. The shared location reference may be stored at a remote database. A mapping is derived between the current perspective view of a first user and the location reference. A point-of-interest is designated within the first user's current perspective view, and the corresponding coordinates of the point-of-interest in the shared location reference are derived using the first user's mapping. The location of the coordinates within a second user's perspective view is derived using the second user's mapping, and the point-of-interest is displayed in the context of the second user's perspective view.
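The two-stage mapping through a shared location reference may be sketched as follows. The sketch assumes, purely for illustration, that each user's mapping is a 3x3 planar homography from that user's view to the reference coordinate system; the cited patent is not stated here to use this representation. The matrices `H1` and `H2`, the helper `apply_h`, and all coordinates are hypothetical.

```python
import numpy as np

def apply_h(H, pt):
    """Apply a 3x3 homography to a 2D point and dehomogenize the result."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return np.array([x / w, y / w])

# Hypothetical mapping, first user's view -> shared reference (scale plus shift).
H1 = np.array([[2.0, 0.0, 100.0],
               [0.0, 2.0, 50.0],
               [0.0, 0.0, 1.0]])

# Hypothetical mapping, second user's view -> shared reference (rotation plus shift).
H2 = np.array([[0.0, -1.0, 400.0],
               [1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0]])

# Point-of-interest designated in the first user's view.
poi_view1 = (30.0, 40.0)

# Step 1: first view -> shared reference coordinates.
poi_ref = apply_h(H1, poi_view1)

# Step 2: shared reference -> second view, using the inverse of the second mapping.
poi_view2 = apply_h(np.linalg.inv(H2), poi_ref)
```

Only the reference coordinates of the point-of-interest need to pass between the users in this arrangement; neither user requires the other's image.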
U.S. Pat. No. 8,260,036 to Hamza et al., entitled “Object Detection Using Cooperative Sensors and Video Triangulation”, is directed to a method and apparatus for detecting and tracking a target object, particularly for the purpose of docking or target avoidance. Images of a field of view are captured by at least two cameras mounted on one or more moving platforms at different perspectives. The images are analyzed to identify landmarks which can be used to track the target's position from frame to frame. The images are fused with information about the target and/or the platform position from at least one sensor. The fused information is processed to triangulate the position of the target and track its position relative to the moving platform, or the position of the platforms with respect to the location of the target, either one of which is displayed.
Bai, Yang, “Feature-based Image Comparison and Its Application in Wireless Visual Sensor Networks”, PhD diss., University of Tennessee, 2011, discusses a feature-based image comparison method, which compares different images and aims to find similar image pairs using a set of local features from each image. The image feature is a numerical representation of the raw image, which can be more compact in data volume. A pair of corner detectors is proposed for the step of feature detection. The first detector is based on the Discrete Wavelet Transform, which provides multi-scale corner point detection, with scale selection achieved through a Gaussian convolution approach. The second detector is based on a linear un-mixing model, which treats a corner point as the intersection of two or three “line” bases in a 3×3 region. The line bases are extracted through a constrained Nonnegative Matrix Factorization (NMF) approach, and the corner detection is accomplished by counting the number of contributing bases in the linear mixture. An effective dimensionality reduction algorithm for the high dimensional Scale Invariant Feature Transform (SIFT) descriptors is proposed for the step of descriptor calculation. A set of 40 SIFT descriptor bases is extracted through constrained NMF from a large training set, and all SIFT descriptors are then projected onto the space spanned by these bases, achieving dimensionality reduction.
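The descriptor dimensionality reduction summarized above may be illustrated by the following sketch. It substitutes plain (unconstrained) NMF with multiplicative updates for the constrained NMF of the dissertation, uses random nonnegative vectors as stand-ins for 128-dimensional SIFT descriptors, and projects by ordinary least squares. The function names, the iteration count, and the data are all assumptions made for illustration.

```python
import numpy as np

def nmf_bases(V, k, iters=50, seed=0, eps=1e-9):
    """Plain NMF, V ~= W @ H, via multiplicative updates; returns the bases W."""
    rng = np.random.default_rng(seed)
    d, n = V.shape
    W = rng.random((d, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        # Update coefficients, then bases; both stay nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W

def project(W, descriptors):
    """Least-squares coordinates of each descriptor (one per row) in the span of W."""
    coeffs, *_ = np.linalg.lstsq(W, descriptors.T, rcond=None)
    return coeffs.T

rng = np.random.default_rng(1)
# Stand-in training set: 500 nonnegative 128-dimensional descriptors, one per column.
train = rng.random((128, 500))
W = nmf_bases(train, k=40)

# Reduce 10 new 128-dimensional descriptors to 40 coefficients each.
new_descriptors = rng.random((10, 128))
reduced = project(W, new_descriptors)
```

Each descriptor is thereby represented by 40 values rather than 128, which reduces the data volume that would need to be stored or transmitted per feature.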