In pattern recognition and remote sensing, object recognition is typically performed by gradually accumulating evidence from various sensors and from components within individual sensors. The first approach is usually referred to as inter-sensor integration, whereas the latter is referred to as intra-sensor integration. Information extraction from multi-sensor images is usually more difficult than information extraction from a single-sensor image simply because multi-sensor images are almost never spatially registered. Thus, the first step in multi-sensor integration is typically to register the images as completely as possible.
Multi-sensor coverage of an area can be illustrated, for purposes of discussion, using the New River Inlet Quadrangle of North Carolina. The imagery of the finest resolution is Digital Orthophoto Quad (DOQ) or Digital Orthophoto Quarter Quad (DOQQ) imagery generated by the United States Geological Survey (USGS). DOQ or DOQQ imagery has a spatial resolution level of approximately one meter/pixel. Another image type similar to DOQ imagery is called controlled image base 5 meter (CIB-5), which typically has a spatial resolution level of approximately 5 meters/pixel. Both DOQ and CIB-5 images are referred to as ortho-rectified images, or orthoimages, because the entire image surface has a constant scale.
The IKONOS-2 satellite was launched in September 1999 and has been delivering commercial data since early 2000. IKONOS is the first of a new generation of high spatial resolution satellites. The IKONOS sensor records four channels of multispectral data at 4 meter resolution and one panchromatic channel at 1 meter resolution. This means that IKONOS is the first commercial satellite to deliver near-photographic, high resolution satellite imagery of terrain anywhere in the world. IKONOS data is collected at 11 bits per pixel (2048 gray tones). The bands, wavelengths, and resolutions of the IKONOS data are:
Panchromatic: 0.45-0.90 μm, 1 meter
Band 1: 0.45-0.53 μm (blue), 4 meters
Band 2: 0.52-0.61 μm (green), 4 meters
Band 3: 0.64-0.72 μm (red), 4 meters
Band 4: 0.77-0.88 μm (near infra-red), 4 meters
The spatial resolution of the IKONOS imagery of the New River Inlet area is between about 1.1 meter/pixel and 1.4 meter/pixel. Resolution of the pixels along the x-axis is different from the pixel resolution along the y-axis in the IKONOS imagery. That means that these pixels are rectangular, not square.
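The practical consequence of rectangular pixels is that pixel counts alone misstate ground distance; both per-axis resolutions must enter the computation. A minimal sketch (the resolution values are merely illustrative of the range quoted above):

```python
import math

def ground_distance_m(px_a, px_b, x_res_m, y_res_m):
    """Ground distance between two pixel locations when the x- and
    y-resolutions differ (rectangular pixels)."""
    dx = (px_b[0] - px_a[0]) * x_res_m   # meters along the x-axis
    dy = (px_b[1] - px_a[1]) * y_res_m   # meters along the y-axis
    return math.hypot(dx, dy)

# 100 pixels apart on each axis, but 1.1 m x-resolution vs 1.4 m y-resolution
d = ground_distance_m((0, 0), (100, 100), x_res_m=1.1, y_res_m=1.4)
```

Assuming square 1-meter pixels here would give about 141.4 m, whereas the rectangular-pixel computation yields about 178.0 m, a discrepancy large enough to misplace objects by dozens of pixels.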
Coarser-resolution satellite imagery for the test area, in addition to IKONOS, includes SPOT pan with a resolution of approximately 10 m/pixel, and LANDSAT ETM (Enhanced Thematic Mapper) with a resolution of approximately 30 m/pixel. Like the IKONOS imagery, the SPOT pan and LANDSAT imagery of the test site also contain rectangular pixels.
In image processing and remote sensing analyses, it is generally assumed that the pixels in the imagery under investigation are square. However, for the above-noted test site, only the DOQ and CIB-5 images have square pixels; the rest of the images have rectangular pixels. Since IKONOS, SPOT, and LANDSAT images are much more popular than DOQ and CIB-5 images, it is surprising that most remote sensing researchers have assumed these satellite images have square pixels. In reality, such square pixels have rarely existed since 1972, when the first LANDSAT satellite (originally called the Earth Resources Technological Satellite, or ERTS) was launched. Its multispectral scanner (MSS) imagery had a spatial resolution of about 80 meters/pixel. With an orbiting altitude of about 500 miles and a spatial resolution of 80 meters, there was little concern about whether the ERTS pixels were square or rectangular.
With the introduction of DOQ in the 1990s, however, square pixels at the resolution level of one meter/pixel were available. Not much later, IKONOS imagery became available. It was generally believed that IKONOS imagery had a spatial resolution of 1 meter/pixel in the panchromatic domain, and 4 meters/pixel in the multispectral domain. For the most part, image users have not considered the reality of the shape of the pixels of the IKONOS imagery. Most people observing the images assume that a square pixel is the norm, and that DOQ can provide such pixels. The existence of non-square pixels among dissimilar imagery types, however, is one of the critical reasons why mismatching exists among multi-sensor images. Therefore, the first step in multi-sensor integration is to understand the causes and effects of non-square pixels over the entire image surface.
Spatial mismatch among multispectral images may result from dissimilarities in their resolution levels, their location or translation, their depression angles, their orientations, and combinations of these factors. Conventional methods of image registration include ortho-rectification and a number of image warping approaches (for example, nearest neighbor and rubber sheeting methods) as well as variations of these methods. Such methods are known to those of skill in the art.
An ortho-rectified image has a constant scale over the entire image surface, both vertically and horizontally, as if it were a map. Thus, when all of the input images are ortho-rectified, each of them has a constant scale, and each column of the image is oriented toward the north. Under these conditions, multi-sensor image registration becomes an easy task: changing the scale of one image to match the scale of the other image. In addition, a common pixel must be located in each image to serve as the NW corner pixel of each image. The problem with using ortho-rectification as a means of image registration stems from the fact that ortho-rectification by itself is a very complex process. Consequently, in many cases, it cannot be assumed that ortho-rectified imagery is a natural product of any particular sensing system.
From the most commonly used image processing software, such as the Earth Resources Data Analysis System (ERDAS) and the Environment for Visualizing Images (ENVI), one can conclude that conventional image registration approaches center on manually locating a few control points in the to-be-registered imagery, and then selecting one of the methods provided by the software package to perform image registration. (ENVI User Guide, Research Systems, Inc., 1996 Edition, pp. 10-1 to 10-16).
In its simplest form, image registration begins with matching the selected control points (i.e., pixels). Once the control point pixels are relocated to match the identical control points in another image, all the rest of the pixels must somehow be moved to create one or more new images. The rules by which the pixels in each image to be registered are moved in this context constitute an image registration algorithm. While the computational procedures for image registration are computerized, the steps that the image analyst must follow are usually both time-consuming and tedious. Moreover, the spectral values of the original scenes may become altered after image registration.
A comparative analysis between ortho-rectification and the conventional image registration methods provided by commonly used image processing software reveals that the superiority of ortho-rectification lies in its ability to generalize over a very large geographic region. Conventional approaches using manually-selected control points in a localized region, on the other hand, are usually not generalizable beyond the area covered by the selected control points. This also means that covering a large geographic area requires repetitive control point selection, making the conventional image registration process labor intensive. While ortho-rectification is capable of generalizing over a very large geographic region for image registration, the process of ortho-rectification itself is so complex that it usually requires a thorough understanding of the camera model and, in addition, generally requires digital elevation models of the imaged terrain.
While distinct objects on images from different sources, having different scales, depression angles, and/or orientations, possess identical geospatial (i.e., latitude and longitude, or "lat-long") coordinates, geospatial information is not heavily utilized by conventional image registration approaches. On a flat map, one of the most commonly used counterparts of the lat-long system is the map-based Universal Transverse Mercator (UTM) projection, which denotes a particular location using "easting" (in meters) and "northing" (in meters) plus a zone ID. Each UTM zone spans 6 degrees of longitude, and latitude bands within a zone span 8 degrees. One degree of latitude represents about 110,000 meters.
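The zone arithmetic just described can be made concrete; the following sketch implements only the standard UTM zone convention (zone numbers 1-60, each 6 degrees of longitude wide), not any method of the present invention:

```python
def utm_zone(longitude_deg):
    """Standard UTM zone number (1-60) for a longitude in degrees;
    each zone spans 6 degrees of longitude."""
    return int((longitude_deg + 180.0) // 6.0) + 1

def central_meridian_deg(zone):
    """Central meridian (degrees of longitude) of a given UTM zone."""
    return -183 + 6 * zone
```

For example, longitudes just west and east of 78 degrees W fall into zones 17 and 18 respectively, whose central meridians (81 degrees W and 75 degrees W) differ; that difference is the source of the cross-zone mismatch discussed next.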
This zone ID-based UTM system works well for point locations, since one point location can be easily identified within one particular UTM zone. In an image that covers a large geographic area, however, the image pixels may fall into two different UTM zones. A case in point is the city of Culpepper, Va., which falls into two UTM zones. That is, the East Culpepper DOQQ belongs to UTM zone 18, whereas the West Culpepper DOQQ belongs to UTM zone 17. Since the central meridian for UTM zone 17 is different from the central meridian of zone 18, there is a mismatch between the West Culpepper DOQQ and the East Culpepper DOQQ at the junction of these two DOQQs. This mismatch occurs in the center of the city.
FIG. 1 clearly shows the mismatch pattern at the junction of two UTM zones: one prominent feature appears twice, once in each of the UTM zones. Such an occurrence indicates a severe mismatch, at least in the upper region of the images. FIG. 14 shows that a possible mismatch due to dissimilar image orientations between two neighboring UTM zones may also exist in digital elevation model (DEM) data that represent the elevation readings of the terrain. Such data may also be denoted as "object height" data.
In this cross-zone scenario (FIGS. 1 and 14), the mathematics of the conventional UTM system may be inadequate to deal with the pixels of a single image that spans two UTM zones. Similarly, UTM has difficulty dealing with image pixels that are dispersed on both the north and south sides of the equator.
Therefore, it is advantageous to define a way to measure the degree to which a given scene departs from an ortho-photo setting. In an ortho-rectified scene, it is necessary only to know one control point and to have the resolution of the pixel in order to determine the distance from a particular pixel location to another pixel location.
It is also advantageous to convert a given scene into a geoscene so that a distinct object in multiple images can possess the same geo-locational values, and the distance between two pixels or ground objects can be measured in terms of the earth's coordinate system rather than in pixels. Similar methods may be used to quantify areas in images (i.e., geomasking). In the method of the present invention, a pixel in a geoscene has a quintuple representation:
    (1) (x,y) coordinates in the image domain;
    (2) (z) coordinate in the spectral and/or elevation/height domains;
    (3) UTM representation in the geospatial domain;
    (4) latitude-longitude in the geospatial domain; and
    (5) Virtual Transverse Mercator (VTM) representation in the geospatial domain.
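The quintuple can be carried as a simple per-pixel record; the following is only an illustrative data structure (field names and sample values are hypothetical, not taken from the present invention):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class GeoPixel:
    """Quintuple representation of a geoscene pixel."""
    x: int                              # (1) image-domain column
    y: int                              # (1) image-domain row
    z: float                            # (2) spectral and/or elevation value
    utm: Tuple[int, float, float]       # (3) (zone, easting, northing)
    lat_lon: Tuple[float, float]        # (4) (latitude, longitude)
    vtm: Tuple[float, float]            # (5) VTM (easting, northing)

# Placeholder values, loosely in the New River Inlet, NC area (zone 18)
p = GeoPixel(x=120, y=45, z=87.0,
             utm=(18, 300120.0, 3829955.0),
             lat_lon=(34.6, -77.2),
             vtm=(120.0, 3829955.0))
```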
UTM is a map-based representation of latitude and longitude (lat-long) readings. Conversion between lat-long and the UTM system may be obtained using the methods outlined in Wolf and Ghilani, Elementary Surveying: An Introduction To Geomatics (Prentice Hall, 2002). Using the geoscene in accordance with the present invention, pixel (x,y) readings may be converted to UTM or the lat-long equivalent freely in both directions. This means that the distance between two pixels in the geoscene may be measured in terms of the physical distance in meters, on the ground.
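For an ortho-rectified geoscene, the bidirectional pixel/UTM conversion reduces to one tie point (e.g., the NW corner pixel) plus the per-axis resolution. A minimal sketch, with hypothetical function names:

```python
def pixel_to_utm(col, row, tie_easting, tie_northing, x_res_m, y_res_m):
    """Pixel (col, row) -> UTM (easting, northing), given the UTM location
    of the NW corner pixel; rows run southward, so northing decreases."""
    return (tie_easting + col * x_res_m, tie_northing - row * y_res_m)

def utm_to_pixel(easting, northing, tie_easting, tie_northing, x_res_m, y_res_m):
    """Inverse conversion: UTM coordinates -> fractional pixel (col, row)."""
    return ((easting - tie_easting) / x_res_m,
            (tie_northing - northing) / y_res_m)

e, n = pixel_to_utm(100, 50, 300000.0, 3830000.0, 1.0, 1.0)
```

Because the conversion runs freely in both directions, the distance between two pixels can be reported directly in meters on the ground.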
Since image pixels may be distributed over multiple UTM zones and/or over both the north and south sides of the equator, it is advantageous to develop a modified UTM system by which the central meridian may be adjusted according to the spatial locations of the pixels instead of fixed locations. Also, the northing readings may be calculated across the equator in a continuous fashion. In the present invention, this is referred to as a Virtual Transverse Mercator (VTM) projection.
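The VTM formulation itself is not reproduced here; the sketch below uses the textbook spherical transverse Mercator equations solely to illustrate the two properties named above: a central meridian chosen from the scene rather than from a fixed zone table, and a signed northing that runs continuously across the equator (no false northing).

```python
import math

R_M = 6371000.0   # mean earth radius, spherical approximation
K0 = 0.9996       # conventional transverse Mercator scale factor

def tm_project(lat_deg, lon_deg, central_meridian_deg):
    """Spherical transverse Mercator about a caller-chosen central meridian.
    Southern-hemisphere points simply receive negative northings."""
    lat = math.radians(lat_deg)
    dlon = math.radians(lon_deg - central_meridian_deg)
    b = math.cos(lat) * math.sin(dlon)
    easting = 0.5 * K0 * R_M * math.log((1.0 + b) / (1.0 - b))
    northing = K0 * R_M * math.atan2(math.tan(lat), math.cos(dlon))
    return easting, northing
```

A scene centered at, say, 77.9 degrees W can use that very longitude as its central meridian, so pixels on both sides of a conventional zone boundary project consistently.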
It is also advantageous to measure the resolution of a pixel in an image in terms of both x and y axes, or easting and northing in the context of both the UTM and the VTM systems. In the ortho-photo case, the x-resolution equals the y-resolution. In non-ortho-photo cases, however, the x-resolution does not equal the y-resolution.
It is advantageous to determine whether using only the x-resolution and the y-resolution information of two images is adequate to perform scene registration.
It is also advantageous to measure the degree of mismatch that still exists between two scenes by using two control points, after the condition of rectangular, non-square pixels is accommodated.
Since the use of rectangular pixels cannot account for all the factors affecting mismatch between two scenes, it is advantageous to develop a method by which two general scenes can be registered by using multiple (e.g., four) control points.
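One generic way to register two scenes from multiple control points is a least-squares affine fit, which absorbs translation, per-axis scale, rotation, and shear. The sketch below (plain-Python normal equations) illustrates that conventional technique; it is not the specific method of the present invention:

```python
def _solve3(a, b):
    # Gauss-Jordan elimination for a 3x3 linear system, partial pivoting.
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(3):
            if r != i:
                f = m[r][i] / m[i][i]
                m[r] = [v - f * u for v, u in zip(m[r], m[i])]
    return [m[i][3] / m[i][i] for i in range(3)]

def fit_affine(src, dst):
    """Least-squares affine (x', y') = (a*x + b*y + c, d*x + e*y + f)
    from control-point pairs; with four points the fit is overdetermined."""
    ata = [[0.0] * 3 for _ in range(3)]
    atbx, atby = [0.0] * 3, [0.0] * 3
    for (x, y), (xp, yp) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atbx[i] += row[i] * xp
            atby[i] += row[i] * yp
    return _solve3(ata, atbx), _solve3(ata, atby)
```

With four control points the system is overdetermined, and the residuals at the control points measure whatever mismatch the affine model cannot remove.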
It is equally advantageous to develop an image registration method by which the spectral values of the pixels in the original inputs are not altered after registration. This is important because the spectral signature of a given object is based on the original spectral values of the pixel representing that object. Any deviation from the original spectral value may create errors in subsequent object recognition. Such errors in the DEM domain are intolerable since a change in the terrain elevation due to image processing error can potentially result in severe consequences.
It is also advantageous to develop a new image registration method by which, after image registration, the spectral value of a given pixel at a given image location or geo-location will not be influenced by the spectral value of any neighboring pixel.
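Nearest-neighbor assignment is the classic way to guarantee this property: every output pixel copies the value of exactly one input pixel, so no value is ever interpolated from, or blended with, a neighbor (unlike bilinear or cubic resampling). A minimal sketch:

```python
def resample_nearest(image, out_shape, transform):
    """Nearest-neighbor resampling of a 2-D list-of-lists image.
    `transform` maps an output (col, row) to source (col, row)."""
    rows, cols = out_shape
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            sc, sr = transform(c, r)           # map output pixel to source
            si, sj = int(round(sr)), int(round(sc))
            if 0 <= si < len(image) and 0 <= sj < len(image[0]):
                out_row.append(image[si][sj])  # copy one value, unaltered
            else:
                out_row.append(0)              # fill value outside source
        out.append(out_row)
    return out
```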
It is especially advantageous to perform a near-perfect ortho-rectification for image registration without having to use a camera model of the sensor and/or the digital elevation data of the imaged area.
It is also advantageous to perform both image registration and image mosaicking simultaneously when multiple images align correctly in the geospatial domain.
It is also advantageous to define the region of intersection and union in terms of a geomask supplied by an independent source, such as a user or a detector. This process is called "geomasking". The result of geomasking is a set of geoimages having varying resolutions corresponding to the resolutions of the input geoimages. Geomosaicking can be performed with the output images of the geomasking process.
It is additionally advantageous to project both the intersection and union sections of the geospatially aligned images and/or pixels into a map domain, such as UTM or VTM.
It is also advantageous to overlay a geoimage with geo-features and/or geo-objects. This process is called “geooverlaying”.
It is further advantageous to overlay geoimages with text to identify relative and/or absolute geolocations on the image.
It is also advantageous to perform measurements within geoimages to obtain accurate object measurements in one-, two-, three-, and four-dimensional spaces.
With a newly created image, be it from georegistration or from a geomosaicking process, it is advantageous to apply a set of geospatially based grid lines disposed at predetermined geospatial units, for example, 500 meters, 1000 meters, etc. Text to indicate either relative or absolute geolocations of the image surface may be included. This process is called “geogriding”.
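Computing where such geogrid lines fall is straightforward once the geospatial extent of the image is known; a sketch (function and parameter names are illustrative):

```python
import math

def geogrid_lines(min_e, max_e, min_n, max_n, spacing_m):
    """Easting and northing values, in meters, of geogrid lines at a fixed
    geospatial spacing (e.g., 500 m), snapped to multiples of the spacing."""
    def ticks(lo, hi):
        start = int(math.ceil(lo / spacing_m)) * spacing_m
        return list(range(start, int(hi) + 1, spacing_m))
    return ticks(min_e, max_e), ticks(min_n, max_n)
```

Each returned value is then drawn as a vertical (easting) or horizontal (northing) line over the geoimage, with optional text labels at the margins.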
It is also advantageous to apply a set of reference lines (i.e., geogriding) using rectangular patterns, circular patterns, or lines arranged in other geometric patterns.
It is advantageous to index and visualize the geolocation of the boundary of each spatial component of a geomosaicked image based on the outputs of a geomasking analysis. This process is called "geoindexing". Geoindexing has three complementary representations: (1) a multicolor geoimage, each color representing the areal coverage of a particular geomasked image; (2) a text-based index; and (3) a database file (dbf) summarizing the geospatial information of each geomasked image, a subset of the geomosaic.
It is also advantageous to devise a means for ordering the colorized overlaying layers to generate the color composite image of the geomosaicked image based on the outputs of a geomasking analysis. In general, the largest geomask subset is to be placed first, and the smallest geomask is the topmost color layer as shown in FIG. 13.
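This largest-first ordering is a painter's algorithm: later (smaller) layers overwrite earlier ones, so the smallest geomask ends up as the topmost color. An illustrative sketch with synthetic masks (the mask representation is hypothetical):

```python
def composite_geomasks(masks, width, height):
    """Paint geomask layers largest-first onto a blank canvas so that
    smaller masks overwrite (sit on top of) larger ones."""
    canvas = [[0] * width for _ in range(height)]   # 0 = background
    for mask in sorted(masks, key=lambda m: len(m["pixels"]), reverse=True):
        for r, c in mask["pixels"]:
            canvas[r][c] = mask["color"]
    return canvas

masks = [
    {"color": 2, "pixels": [(0, 0)]},                  # small mask
    {"color": 1, "pixels": [(0, 0), (0, 1), (1, 0)]},  # large mask
]
canvas = composite_geomasks(masks, 2, 2)
```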
It is advantageous to mix geomasked and non-geomasked geoscenes to perform geoindexing, in which the non-geomasked geoimages can serve as a background/reference scene.
It is also advantageous to devise a means for coloring the mixed non-geomasked and geomasked geoscenes by using a mixture of graytone and color systems, in which a graytone geoscene can be treated as the background scene.
In DEM data, it is advantageous to visualize the terrain features from varying depression angles and aspect angles in the geospatial domain, in which UTM is a special case of VTM. Conventional image draping that combines DEM and spectral-based images is used mainly for 3-D visualization. Consequently, image draping typically does not possess geospatial properties. Since a feature mismatch exists in cross-zone UTM-based images, DEM and terrain feature mismatches exist in conventional 3-D visualization methods with cross-UTM-zone DEM data and spectral images. Therefore, it is advantageous to perform 3-D or 4-D visualization in the VTM domain to mitigate both feature and terrain elevation mismatches.
A terrain contour line connects points (i.e., pixels) of equal elevation value. A contour line, therefore, can be represented by the boundary pixels of a region in a raster image of DEM data. Thus, it is advantageous to generate the geolocations of a terrain contour. It is also advantageous to depict the terrain in terms of a geocontour, a geocontour interval, as well as in a 3-D visualization geoscene. Such depiction yields a 4-D visualization system as shown in FIG. 15.
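On a raster DEM, the boundary-pixel definition of a contour can be computed directly: a pixel lies on the contour for a given level if it meets the level while some 4-neighbor does not (or it touches the image edge). A sketch with hypothetical names; the contour-interval and visualization machinery of FIG. 15 is omitted:

```python
def geocontour_pixels(dem, level):
    """Boundary pixels of the region at or above `level` in a 2-D
    list-of-lists DEM; returned as (row, col) tuples in scan order."""
    rows, cols = len(dem), len(dem[0])
    contour = []
    for r in range(rows):
        for c in range(cols):
            if dem[r][c] < level:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols) or dem[nr][nc] < level:
                    contour.append((r, c))   # has a below-level or edge neighbor
                    break
    return contour
```

Each boundary pixel can then be converted to its geolocation to yield the geocontour.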
Physical objects exist on, below, and above a particular contour. Therefore, it is advantageous to visualize objects in the 3-D visualization geoscene by contour intervals, an interval being defined by two contour line values. The resulting geoscene is in reality a 4-D geoscene: x and y define the datum plane, z0 is the DEM dimension, and z1 is one of the spectral/textural dimensions.
It is also advantageous to provide a means to generate a geo-fly-through geoscene viewed from a particular depression angle and a particular aspect angle in the geospatial domain.
It is also advantageous to provide a geo-fly-through geoscene that possesses terrain features in a selected terrain contour interval in the geospatial domain, in which UTM is a special case of VTM.