The present invention relates to the field of image processing and, in particular, it concerns methods for locating in one perspective view a point of interest designated in another perspective view where the viewing directions of the two views are significantly non-parallel.
A common problem in multi-platform field operations is communicating a point of interest between different platforms. Specifically, it is often desired to designate a particular location (e.g., a pixel) within an image of a scene viewed from a first direction and then to determine the corresponding location (pixel) of the same point of interest in an image of the scene viewed from a different direction. When the viewing directions are near-parallel, or where the scene is generally flat so that few features are obscured, this task can be achieved by well-developed image correlation techniques, which identify pairs of corresponding features in the two images and determine a transformation that maps pixels of one image to pixels of the other. For scenes with pronounced vertical relief, such as urban terrain, and where the viewing directions are significantly non-parallel, the problem rapidly becomes much more difficult, if not insoluble.
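For near-parallel views of a nearly flat scene, the pixel-to-pixel transformation referred to above is often modeled as a single global mapping fitted to the matched feature pairs. The following is a minimal sketch, assuming the correspondences have already been found and that a six-parameter affine model is adequate (a projective model is also common). The names `fit_affine` and `apply_affine` are hypothetical and used only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Affine = Tuple[float, float, float, float, float, float]  # (a, b, c, d, e, f)


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve the 3x3 system m @ x = v by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("degenerate correspondences (e.g. collinear points)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, 4):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back-substitution
        x[r] = (aug[r][3] - sum(aug[r][k] * x[k] for k in range(r + 1, 3))) / aug[r][r]
    return x


def fit_affine(src: Sequence[Point], dst: Sequence[Point]) -> Affine:
    """Least-squares fit of x' = a*x + b*y + c and y' = d*x + e*y + f."""
    if len(src) != len(dst) or len(src) < 3:
        raise ValueError("need at least three matched point pairs")
    # Both output coordinates share the same 3x3 normal-equation matrix.
    ata = [[0.0] * 3 for _ in range(3)]
    atx = [0.0] * 3
    aty = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atx[i] += row[i] * u
            aty[i] += row[i] * v
    a, b, c = _solve3(ata, atx)
    d, e, f = _solve3(ata, aty)
    return (a, b, c, d, e, f)


def apply_affine(t: Affine, p: Point) -> Point:
    """Map a pixel of the first image into the second image."""
    a, b, c, d, e, f = t
    x, y = p
    return (a * x + b * y + c, d * x + e * y + f)
```

Because one mapping is assumed to hold for every pixel, this kind of model has no way to represent occlusion or height-dependent parallax, which is why it degrades for urban scenes viewed from widely separated directions.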
The source of this difficulty is readily understood by considering two aerial cameras viewing a group of buildings from the west and from the east, respectively. Features on the west-facing walls appear only in the first image, and features on the east-facing walls only in the second, so they cannot form a basis for conventional image correlation. Furthermore, if the buildings are densely positioned and the viewing angles are shallow, much of the ground area between the buildings may be obscured in one or both images. As a result, the vast majority of the features in each image may have no corresponding features in the other image, leaving relatively few features in common, typically including parts of the building roofs. As the proportion of corresponding features in the two images decreases, the reliability and accuracy of conventional image correlation techniques decline rapidly.
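One standard way to quantify how quickly this decline sets in, assuming the correlation step uses a RANSAC-style robust estimator (a technique not named in the text and used here purely as an illustration): if a fraction w of the candidate matches is correct and each model hypothesis requires s matches, the number of random samples needed to draw at least one all-correct sample with probability p is N = log(1 - p) / log(1 - w^s). The helper `ransac_trials` is hypothetical.

```python
import math


def ransac_trials(inlier_ratio: float, sample_size: int, confidence: float = 0.99) -> int:
    """Random minimal samples needed so that, with the given confidence,
    at least one sample consists only of correct correspondences."""
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    p_good_sample = inlier_ratio ** sample_size
    if p_good_sample >= 1.0:
        return 1  # every sample is all-correct
    # log1p keeps precision when p_good_sample is tiny.
    return math.ceil(math.log1p(-confidence) / math.log1p(-p_good_sample))


if __name__ == "__main__":
    # s = 4 matches per hypothesis, as for a plane-to-plane homography.
    for w in (0.9, 0.5, 0.2, 0.1):
        print(f"inlier ratio {w:.1f}: {ransac_trials(w, 4)} samples")
```

With s = 4 and p = 0.99, the required count rises from single digits at w = 0.9 to tens of thousands at w = 0.1, and in practice the shrinking set of true matches also leaves the fitted transformation poorly constrained.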
An alternative approach to transferring a point of interest between two perspective views is to register each image to a geographical image database. Each image is separately correlated to the corresponding subregion of an image retrieved from the geographical image database, so that each pixel of each image is associated with a known geographical coordinate. The geographical coordinates then provide a common reference frame for transferring points of interest from one image to the other.
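The transfer through geographical coordinates can be sketched as the composition of one image's pixel-to-ground mapping with the inverse of the other's. This minimal sketch assumes (beyond what the text states) that registration yields, for each image, a 3x3 homography from pixel coordinates to ground-plane coordinates, which is valid only for approximately flat ground. The names `apply_h`, `invert3` and `transfer_point` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Point = Tuple[float, float]


def apply_h(m: Matrix, p: Point) -> Point:
    """Apply a 3x3 homography to a 2D point via homogeneous coordinates."""
    x, y = p
    u = m[0][0] * x + m[0][1] * y + m[0][2]
    v = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity")
    return (u / w, v / w)


def invert3(m: Matrix) -> Matrix:
    """Invert a 3x3 matrix using its adjugate and determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("singular matrix")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[val / det for val in row] for row in adj]


def transfer_point(h_a_to_geo: Matrix, h_b_to_geo: Matrix, pixel_a: Point) -> Point:
    """Map a pixel of image A to ground coordinates, then into image B."""
    geo = apply_h(h_a_to_geo, pixel_a)
    return apply_h(invert3(h_b_to_geo), geo)
```

A ground-plane mapping of this kind carries no height information, so a point designated on a rooftop or wall is sent to the wrong ground location, which is one reason this approach struggles in urban terrain.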
The use of a geographical reference image (an “orthophoto” or satellite image) has certain advantages. Firstly, the use of an overhead view tends to limit the maximum angular discrepancy between the viewing directions of the images to be correlated. Secondly, the digital terrain map (DTM) associated with the geographical database provides additional information that can facilitate the correlation processing. Nevertheless, this approach also encounters major problems in urban terrain, where the DTM typically lacks sufficient resolution to define building structures and/or is not sufficiently up to date, and where the aforementioned problems of insufficient correspondence between features tend to occur.
For the above reasons, the generally accepted approach to field operations in urban terrain is to determine a full three-dimensional model of the external surfaces of the building structures. This may be achieved using structure-from-motion (“SFM”) techniques, in which features are tracked through successive frames of a video to derive their locations in three dimensions, and the tracked features are then associated to identify surfaces of the structures. The model can then provide additional information for any given viewing direction, thereby facilitating registration of each subsequent perspective view with the model and the transfer of points of interest to or from each image.
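Deriving a tracked feature's three-dimensional location from its image positions is, at its core, triangulation. The following is a minimal two-view sketch using the midpoint method, assuming the camera centers and the viewing-ray directions toward the feature are already known from calibration and pose estimation; practical SFM pipelines use many frames and refine the result, for example with bundle adjustment. The name `triangulate_midpoint` is hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _dot(u: Vec3, v: Vec3) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _add_scaled(p: Vec3, d: Vec3, k: float) -> Vec3:
    return (p[0] + k * d[0], p[1] + k * d[1], p[2] + k * d[2])


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Midpoint of the shortest segment joining rays c1 + s*d1 and c2 + t*d2."""
    r = (c1[0] - c2[0], c1[1] - c2[1], c1[2] - c2[2])
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, r), _dot(d2, r)
    denom = a * c - b * b
    if a == 0.0 or c == 0.0 or abs(denom) <= 1e-12 * a * c:
        raise ValueError("rays are degenerate or (nearly) parallel; depth is unobservable")
    # Closed-form minimizer of |c1 + s*d1 - c2 - t*d2|^2 over s and t.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = _add_scaled(c1, d1, s)
    p2 = _add_scaled(c2, d2, t)
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2, (p1[2] + p2[2]) / 2)
```

The parallel-ray check reflects the fact that depth can only be recovered when a feature is observed from sufficiently different positions, one reason the model-building approach needs imagery gathered from many directions around the structures.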
Although the three-dimensional model approach is highly effective, it is not always feasible in practical operations. Specifically, construction of a three-dimensional model requires acquisition of images from all directions around the buildings in question as well as computationally intensive processing. Limitations of time and/or accessibility in hostile territory may preclude this approach.
A further technical hurdle presented by the three-dimensional model approach is alignment of the model with the real-world geographic coordinate system. The geometrical form of the three-dimensional model, typically represented as a precise local elevation map, is a fundamentally different and incompatible form of data from the image data of an orthophoto or satellite image tied to the geographical coordinate system. In many cases, this alignment can only be performed reliably as a laborious manual procedure, rendering the technique impractical for automated or real-time operations.
There is therefore a need for methods for locating in one perspective view a point of interest designated in another perspective view where the viewing directions of the two views are significantly non-parallel, and particularly in urban terrain where full three-dimensional model data is unavailable.