(1) Field of the Invention
The present invention pertains to a method of geocoding a perspective image comprised of a two dimensional array of picture elements or pixels, so that each picture element or pixel of the perspective image has associated with it a unique, three dimensional geographic location in the scene depicted by the perspective image. The three dimensional location at each pixel is obtained using a corresponding database stored reference digital elevation model, or a three dimensional surface model of the location. The reference elevation model or surface model may be composed of rows and columns of height values corresponding to two dimensions of location values in the scene, or it may be composed of a three dimensional surface or volume model of the scene area, such as a computer-aided design model or any other means of describing the three dimensional characteristics of a scene area.
(2) Description of Related Art
Modern military aircraft require a capability to target, from the aircraft cockpit, precision-guided weapons. Many of these weapons require precise, accurate knowledge of the geographic location to which they are to be directed, including three dimensions of information such as latitude, longitude and altitude of the target point. One method of obtaining this information is to use sensor images obtained by sensors carried on the aircraft. Such sensors produce a perspective view of the target and the scene area around the target. Typical sensors may also produce a set of geographic coordinates for some marked location in the perspective image, such as the location indicated by a cross-hair or cursor mark in the image. However, making an on-board sensor highly accurate, so that targets can be located within a ground coordinate system with sufficient accuracy using the image produced by the sensor, is difficult and expensive. These problems can be overcome by providing an accurate geocoding for the perspective image produced by the sensor. This geocoding gives each pixel in the sensor image an accurate association with the three dimensional geographic coordinates of the point in the scene depicted in that pixel. When the geocoding is sufficiently accurate, target points selected from the sensor image can be located precisely enough to direct precision-guided weapons against them.
A similar application involves automated guidance of a robot, such as the robot arm of the Space Shuttle or any remote manipulator, or an autonomous mobile vehicle, where a precise goal point or desired location point for motion is identified in a sensor image presenting a perspective view of the scene, but the goal point's precise three dimensional location must be obtained in the scene area imaged by the sensor. Such precise knowledge of location could be used to navigate the robot arm, or an autonomous mobile vehicle, to arrive precisely at the desired goal point. The precise knowledge could also be used to remotely measure locations, distances, shapes, or other two or three dimensional geometric properties in the scene area. If the perspective image is geocoded using an associated three dimensional model of the scene area, such as a computer aided design model, it is possible to determine precisely the desired point location in the scene area, in coordinates of the scene model.
A perspective image of a three dimensional scene presents a two dimensional depiction of that scene as observed from the viewpoint of the sensor. While each pixel in the perspective image depicts the scene content at a single location in the scene area, the projection means discards one dimension of information as it operates, and a two dimensional perspective image does not contain enough information to reconstruct the three dimensional information. Given the projection means, such as a simple frame camera model with parameters describing the camera properties and its location and orientation when the perspective image was collected, it is only possible to associate with any pixel in the perspective image a line in three dimensions, projected back from that pixel through the scene. It is not possible, however, to identify a single, specific point on that line as the scene location depicted in that pixel.
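The back-projection of a pixel to a line may be illustrated by the following minimal sketch, which assumes a simple pinhole frame camera. The function names, the camera axis convention (x right, y down, z forward), and the parameters `focal_px`, `cam_pos` and `rot` are illustrative assumptions and are not taken from the description above.

```python
import math

def pixel_ray(row, col, width, height, focal_px, cam_pos, rot):
    """Return (origin, unit direction) of the 3D line seen by one pixel.

    Assumed conventions (hypothetical): camera axes are x right, y down,
    z forward; rot is a 3x3 camera-to-world rotation given as nested lists;
    focal_px is the focal length expressed in pixels.
    """
    # Direction through the pixel centre, in camera coordinates.
    d_cam = (col + 0.5 - width / 2.0, row + 0.5 - height / 2.0, focal_px)
    # Rotate the direction into scene (world) coordinates.
    d = [sum(rot[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in d))
    return tuple(cam_pos), tuple(c / norm for c in d)

def point_on_ray(origin, direction, t):
    """Every t > 0 gives a different candidate scene point for the same pixel."""
    return tuple(o + t * d for o, d in zip(origin, direction))
```

Every value of `t` passed to `point_on_ray` projects back into the same pixel, which is why the camera model alone cannot select a single scene location.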
Using an auxiliary three dimensional model of the scene area, it is possible to determine the three dimensional points associated with the pixels in a two dimensional perspective image, although the process has been difficult and slow. By using a three dimensional model of the scene area, it is possible to perform an intersection of each light ray, projected from each pixel in the perspective image through the camera model, with the three dimensional model of the scene area. Such ray projection must be performed for each pixel in the perspective image to fully geocode the image. This intersection is an extremely difficult process, as it must account for multiple possible intersections of each ray with the three dimensional elevation model or scene model. For a digital elevation model, a search is conducted along a path through the rows and columns of the digital elevation model, beneath the line of the ray, checking at each row and column location on that path whether the ray intersects the elevation model surface above that location. This entails a calculation to determine if the line of the ray projection passes the geographic location associated with that row and column in the digital elevation model at an altitude sufficiently close to the elevation given at that row and column location in the digital elevation model. By using a search that extends outward from the sensor location and towards the scene, the first intersection of the ray with the scene area elevation model surface indicates the closest visible location in the scene area. The three dimensional location at the closest point is the location to be associated with the scene element depicted in that pixel. There may be thousands of points along a single ray that need to be tested for intersection, and a ray search is needed for each pixel in the perspective image to be geocoded.
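The search along a ray over a digital elevation model may be sketched as follows. This is a simplified illustration under stated assumptions, not a definitive implementation: the grid layout (`dem[r][c]` located at x = c * cell, y = r * cell), the fixed step length, the nearest-post height lookup, and the name `intersect_dem` are all hypothetical.

```python
import math

def intersect_dem(dem, cell, origin, direction, step, max_range):
    """March outward from the sensor; return the first point at or below terrain.

    Assumptions (hypothetical): dem[r][c] is the height at x = c * cell,
    y = r * cell; direction is a unit vector; heights are looked up at the
    nearest grid post. Returns (x, y, terrain_height) or None if no hit.
    """
    rows, cols = len(dem), len(dem[0])
    t = 0.0
    while t <= max_range:
        x = origin[0] + t * direction[0]
        y = origin[1] + t * direction[1]
        z = origin[2] + t * direction[2]
        # Nearest row and column beneath this point on the ray.
        r = math.floor(y / cell + 0.5)
        c = math.floor(x / cell + 0.5)
        if 0 <= r < rows and 0 <= c < cols and z <= dem[r][c]:
            # Searching outward from the sensor makes this the closest hit.
            return (x, y, dem[r][c])
        t += step
    return None
```

With a step comparable to the grid spacing, a single long ray visits many posts, and the loop must be repeated for every pixel, which reflects the cost described above.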
For a three dimensional surface model, such as a facet model used in computer graphics representations of three dimensional scenes, an intersection must be determined between the ray and all surface facets in the model. The facet which intersects the ray at the closest distance from the sensor location along the ray, and which is visible from the point of view of the sensor, gives the three dimensional location to associate with the corresponding perspective image pixel. A typical facet model may contain millions of facets, so the search with each ray for facet intersections is very expensive. A similar difficulty occurs with a volume model of the scene, and the volume model may first have to be reduced to a collection of surface facets which are then treated as a facet model of the scene area.
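A brute-force search over the facets may be sketched as follows, assuming triangular facets and using the well-known Möller-Trumbore ray-triangle test, which is a substitute chosen for illustration and is not named in the description above. The helper names are hypothetical, and the check that a facet faces the sensor is omitted for brevity.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(origin, direction, tri, eps=1e-9):
    """Return the distance t along the ray to the triangle, or None (Moller-Trumbore)."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None  # ray is parallel to the facet plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None  # ignore hits behind the sensor

def closest_hit(origin, direction, facets):
    """Test every facet and keep the intersection nearest the sensor."""
    best = None
    for tri in facets:
        t = ray_triangle(origin, direction, tri)
        if t is not None and (best is None or t < best):
            best = t
    if best is None:
        return None
    return tuple(o + best * d for o, d in zip(origin, direction))
```

Because `closest_hit` visits every facet for every ray, its cost grows with the product of the pixel count and the facet count, which is the expense noted above for models containing millions of facets.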
The OpenGL projection model (for example the “Woo 99” model disclosed in “OpenGL Programming Guide, 3rd Ed.,” Woo, Mason; J. Neider; T. Davis; D. Shreiner; Addison Wesley, Boston 1999) contains a simple means to associate three dimensional location information for a scene area with a two dimensional image projection of that area, but can only be applied when the OpenGL projection model is employed. Many projection operators, including that in OpenGL, actually transform three dimensional coordinates into three dimensional coordinates, with two of the resulting coordinates immediately useful as row and column pixel locations in the perspective image. The third result coordinate, typically called the Z coordinate, is discarded in constructing a perspective image. However, implementations of OpenGL retain the Z coordinate for each projected pixel of the perspective image, and allow an inverse projection function, called “unproject”, that uses the two dimensional projected pixel location in the perspective image, together with the third or Z coordinate value retained for that pixel location, to produce an approximation of the three dimensional location from which the projection to that perspective pixel was obtained.
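The role of the retained third coordinate may be illustrated with a simplified pinhole projection and its inverse. This sketch is not the OpenGL code path: it keeps the linear camera-frame depth as the third coordinate, whereas OpenGL stores a normalized depth value, and the parameter names `focal_px`, `cx` and `cy` are assumptions made for illustration.

```python
def project(point, focal_px, cx, cy):
    """Map a camera-frame point (x, y, z) to (col, row, depth).

    The first two results are pixel coordinates; the third is the depth
    that an ordinary perspective image would discard.
    """
    x, y, z = point
    return (focal_px * x / z + cx, focal_px * y / z + cy, z)

def unproject(col, row, depth, focal_px, cx, cy):
    """Recover the camera-frame point from a pixel and its retained depth."""
    return ((col - cx) * depth / focal_px,
            (row - cy) * depth / focal_px,
            depth)
```

Without the retained depth, `unproject` would have no way to choose among the points that share the same pixel location.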
This projection method, which operates to produce for each projected location a corresponding three dimensional scene location, requires that OpenGL be used to produce the projected image. The projected image is not itself geocoded by this means. However, if the reference elevation model used to provide the three dimensional data from which the projection is constructed is geocoded, then the unproject means can provide a geocoding for the perspective image.
The OpenGL projection model is computationally demanding. It operates by transforming three dimensional coordinates from the scene area model into two dimensional, perspective image, projection coordinates, with the hidden but retained Z coordinate. The OpenGL projection assumes a facet model of the scene area, and performs visibility checks and ray intersection for each facet in the scene area model. Each facet is separately processed. Facets are checked for visibility by examining their bounding corner points, and the orientation of their visible face. Facets not visible by location, or by orientation away from the viewing location, are skipped. All facets which are at least partly visible are processed. This leads to a requirement for a major computational effort, normally provided by implementing OpenGL in a specialized hardware graphics adaptor.
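The orientation part of the facet visibility check may be sketched as follows. The sketch assumes triangular facets whose vertices are listed counter-clockwise when seen from their visible side; that winding convention and the function name are assumptions for illustration only, and the check by location against the viewing volume is not shown.

```python
def facet_faces_viewer(tri, eye):
    """Return True if the facet's visible face is oriented toward the eye point.

    Assumes counter-clockwise vertex order when viewed from the visible side.
    """
    v0, v1, v2 = tri
    e1 = (v1[0] - v0[0], v1[1] - v0[1], v1[2] - v0[2])
    e2 = (v2[0] - v0[0], v2[1] - v0[1], v2[2] - v0[2])
    # Facet normal from the cross product of two edges.
    n = (e1[1] * e2[2] - e1[2] * e2[1],
         e1[2] * e2[0] - e1[0] * e2[2],
         e1[0] * e2[1] - e1[1] * e2[0])
    to_eye = (eye[0] - v0[0], eye[1] - v0[1], eye[2] - v0[2])
    # Positive dot product: the normal points into the viewer's half-space.
    return n[0] * to_eye[0] + n[1] * to_eye[1] + n[2] * to_eye[2] > 0.0
```

Facets for which this test fails can be skipped before any per-pixel work is done, although every facet in the model must still be examined at least once.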
The processing maps each three dimensional coordinate at the corners of the visible part of a facet into the perspective image, defining a two dimensional facet in the perspective projection space to be filled with the picture contents assigned to the scene model facet. A Z-buffer technique is used, whereby each pixel can only be assigned a part of the facet picture content if the Z value of that part of the facet is smaller than any previously saved Z value for that pixel location. If the Z value test is passed, the pixel is assigned the picture value from the facet, together with the Z value for the part of the facet providing that value. Since small Z values represent portions of the scene closer to the sensor, occluded surfaces are hidden either by being overwritten with pixels which are given content from a facet producing a closer, smaller Z value, or by not being written because the closer pixel was written into the image first.
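The Z-buffer test may be illustrated by the following minimal sketch. It assumes the facets have already been projected and filled into individual samples, here called fragments, each carrying a row, column, Z value and picture value; the function name and the fragment tuple layout are hypothetical.

```python
def render_fragments(width, height, fragments, background=None):
    """Write fragments into an image, keeping the smallest Z at each pixel.

    fragments: iterable of (row, col, z, value) tuples, assumed to be
    already projected into pixel coordinates.
    """
    inf = float("inf")
    zbuf = [[inf] * width for _ in range(height)]
    image = [[background] * width for _ in range(height)]
    for row, col, z, value in fragments:
        inside = 0 <= row < height and 0 <= col < width
        # Accept the fragment only if it is closer than what is stored.
        if inside and z < zbuf[row][col]:
            zbuf[row][col] = z
            image[row][col] = value
    return image, zbuf
```

The result does not depend on the order in which the fragments arrive: a farther fragment is either overwritten later or rejected on arrival, matching the two cases described above. The retained `zbuf` is the per-pixel Z value that an inverse projection can later use.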
There are many applications where use of the OpenGL projection model, or an OpenGL hardware graphics accelerator, or even an OpenGL software simulation, is not possible or not desirable. Applications that require small computers, such as a wearable computer, or computers in specialized environments, such as in fighter aircraft equipment bays or on Space Shuttle robot arms, are particular examples where a standard graphics capability may not be feasible to implement. Electrical power requirements may prevent use of a graphics adaptor, because of the usually large power and heat demands arising from the high-speed processors such adaptors employ.