Commercial satellite imagery is steadily becoming more accessible in terms of ground coverage, distribution, and cost. A key aspect of the latest commercial satellite imagery is the large-area, high-resolution coverage contained in a single image acquisition. This broad area of coverage, when combined with previously unobtainable resolution, allows the imagery to be used for analysis in new and powerful ways, including emergency response, urban planning, and damage assessment. Such imagery can also be used in more traditional applications such as environmental assessment, but at a much finer level of detail than previously achievable. As computer processing and display capabilities improve, these satellite images can be quickly geo-registered or orthorectified, and then mosaicked into even larger areas of coverage.
A similar situation exists for airborne imagery that is captured on film and then digitized, or imagery that is directly captured in digital form using new large format digital cameras. In both cases the imagery is usually rectified and mosaicked to provide larger areas of coverage.
Imagery mosaics provide large area coverage by combining several adjacent, partially overlapping images in a continuous, and often seamless, presentation. As the images may have been taken at different times, the tone of the individual images is adjusted and then balanced across the mosaic. If the output of this process is written to a single image file, the result is referred to as a true mosaic.
An alternative concept, possible only in softcopy applications, is to form an image mosaic for presentation, but never to form the single aggregate image, either in memory or written to a file. This process is referred to as forming a “virtual” mosaic. When viewing a virtual mosaic, the image processing system computes the extent of the image view based on the geographic location and magnification factor selected by the user. Only those pixels required to fill the image view are processed. In the interest of processing efficiency, virtual mosaics are typically not seamless. Image overlap is retained, and in overlap areas one of the images is viewed as the default (i.e., on top of the stack). The user may choose a different image to be viewed on top, or the images can be blended in the overlap area.
The advantage of the virtual mosaic approach is that while the mosaic may include tens or hundreds of images (involving hundreds of megabytes of data), the system only has to address and process those pixels required to fill the selected view. On most image processing systems this is limited to a few megabytes of data. The virtual mosaic approach therefore greatly reduces the processing and memory demands on the image processing system. It also reduces storage requirements because the full mosaic file, which is generally close in size to the sum of all the component images, is never formed; compared with storing both the component images and the mosaic, this yields a storage savings of approximately fifty percent.
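The view-driven selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular system: the `Footprint` record, the function names, and the representation of the magnification factor as ground units per screen pixel are all hypothetical, and each image is assumed to be already geo-registered with a north-up rectangular footprint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """Geographic bounding box of one geo-registered image (hypothetical record)."""
    name: str
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def view_extent(center_x, center_y, width_px, height_px, ground_units_per_px):
    """Return (min_x, min_y, max_x, max_y) of the view in ground units.

    The magnification factor is assumed to be expressed as ground units
    per screen pixel.
    """
    half_w = width_px * ground_units_per_px / 2.0
    half_h = height_px * ground_units_per_px / 2.0
    return (center_x - half_w, center_y - half_h,
            center_x + half_w, center_y + half_h)


def images_in_view(stack, extent):
    """Return the images whose footprints intersect the view.

    `stack` is ordered top first; that order is preserved, so only the
    returned images ever need to be read to fill the view.
    """
    vx0, vy0, vx1, vy1 = extent
    return [f for f in stack
            if f.min_x < vx1 and f.max_x > vx0
            and f.min_y < vy1 and f.max_y > vy0]


def top_image_at(stack, x, y):
    """Return the name of the default (topmost) image covering a point, or None."""
    for f in stack:
        if f.min_x <= x <= f.max_x and f.min_y <= y <= f.max_y:
            return f.name
    return None


if __name__ == "__main__":
    stack = [
        Footprint("a", 0, 0, 1000, 1000),
        Footprint("b", 800, 0, 1800, 1000),   # overlaps "a" between x = 800 and 1000
        Footprint("c", 5000, 5000, 6000, 6000),
    ]
    extent = view_extent(900, 500, 400, 400, 0.5)
    print([f.name for f in images_in_view(stack, extent)])
    print(top_image_at(stack, 900, 500))
```

Letting the user choose a different image to be viewed on top corresponds, in this sketch, to reordering `stack`; no pixel data is rewritten.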
Another advantage of the virtual mosaic concept is improved performance when “roaming” the image view across the coverage area. Roaming is typically used in image review and in area search operations, e.g., looking for specific objects in the mosaic area. While many image processing systems have been optimized to roam well across a single image, they do not perform well when presented with a traditional mosaic. The file is simply too large and overwhelms the system memory and processor capabilities. The virtual mosaic approach alleviates these issues by only accessing the data required to fill the instantaneous view.
A necessary step in forming either a traditional or virtual mosaic is to perform some level of geo-registration on the images involved. This process places the images in their proper geographic position with respect to some projection space, enabling the mosaic process, and allows them to be presented in a desired orientation on the viewing screen, typically north-is-up. Geo-registration may be as simple as using the geographic coordinates of two or more image corners to place the images in the projection space. Images are often provided with metadata that may include polynomial coefficients that allow the imagery to be warped into the desired projection space. A level of accuracy above this is obtained by placing each image pixel using an explicit sensor projective model that accounts for the particular acquisition geometry and assumes a flat terrain surface. The highest level of accuracy is achieved via a process called orthorectification. This process uses a projective model to place each pixel while correcting for all known sources of geometric error, including terrain variation. The correction for terrain variation is accomplished by projecting each pixel to a model of the terrain surface. The terrain model is typically a matrix of regularly spaced elevation points (or posts). Some form of interpolation is used to determine the proper terrain elevation value when the pixel falls between the posts. Terrain models are produced at various sampling densities, e.g., 100 m spacing or 10 m spacing between posts. A terrain model that does not include man-made objects or tall vegetation is referred to as a “bare-earth” model. More accurate, and more expensive, models include elevation information for buildings, bridges, overpasses, etc.
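The interpolation between terrain posts mentioned above can take several forms. The sketch below uses bilinear interpolation, which is one common choice and is assumed here purely for illustration; the function name, the row and column layout of the post matrix, and the placement of the origin at the first post are likewise assumptions.

```python
def interpolate_elevation(posts, spacing, x, y):
    """Bilinearly interpolate terrain elevation between regularly spaced posts.

    posts   -- matrix (list of rows) of elevations, at least 2 x 2;
               posts[0][0] is assumed to sit at ground position (0, 0)
    spacing -- distance between adjacent posts, e.g. 10 for a 10 m model
    x, y    -- ground position; x runs along columns, y along rows
    """
    rows, cols = len(posts), len(posts[0])
    gx = x / spacing          # fractional column index
    gy = y / spacing          # fractional row index
    if not (0 <= gx <= cols - 1 and 0 <= gy <= rows - 1):
        raise ValueError("point lies outside the terrain model")

    # Upper-left post of the enclosing cell, clamped so that points on the
    # last row or column still have a neighbor to interpolate toward.
    c0 = min(int(gx), cols - 2)
    r0 = min(int(gy), rows - 2)
    fx = gx - c0
    fy = gy - r0

    # Interpolate along x on the two bounding rows, then along y between them.
    near = posts[r0][c0] * (1 - fx) + posts[r0][c0 + 1] * fx
    far = posts[r0 + 1][c0] * (1 - fx) + posts[r0 + 1][c0 + 1] * fx
    return near * (1 - fy) + far * fy


if __name__ == "__main__":
    posts = [[100.0, 110.0],
             [120.0, 130.0]]
    print(interpolate_elevation(posts, 10.0, 5.0, 5.0))
```

At a post location the function returns that post's elevation unchanged; between posts the result varies smoothly, which is what allows a pixel to be projected onto the terrain surface at an arbitrary ground position.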
In the interests of cost and computation efficiency, traditional mosaics are usually made with imagery that has been orthorectified to a bare-earth terrain model. This results in errors in areas of varying elevation and in urban areas. The image seams must usually be placed in areas of locally flat terrain. The virtual mosaic process, which does not remove seams, typically uses a polynomial approach to place the images in the desired projection space.
An important issue when searching or roaming through an image mosaic is to maintain geographic context for the user. The geo-registration process described above allows the images to be presented in a consistent projection space, regardless of the specific acquisition conditions for each image. The geo-registration process takes out image differences such as scale and look azimuth.
The task of maintaining geographic context is further supported by simultaneous use of maps and other cartographic data. Raster maps, vector data, cadastral data, and point data are all used to assist the user in understanding the imaged scene. Raster maps are typically standard map sheets that have been scanned and digitized, as an image would be, resulting in a raster file format. Vector maps are used to represent data that is primarily linear in form such as roads, rail lines, and power lines. Vector data stores the vertices of the linear segments and in some cases associated attribute data, rather than a raster “image” of the feature, and therefore requires much less data storage to represent a typical feature. Cadastral data refers to ownership maps and can be in either raster or vector format. Point information such as cell tower location is stored in vector form.
Map data may be presented in the same display window as the imagery, in a separate display window, or on a completely separate display. In the first case, the map data may be used as a background layer for the imagery, blended in with the imagery, or presented in a flicker mode with the imagery. When presented in a separate window or display, it is common to have some form of real time linkage between the map and imagery windows, so that as the user moves in one window, the equivalent position is automatically indicated in the other. In each of these cases, the objective is to provide the user with an easily understood geographic reference that can assist the interpretation task.
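The linkage between map and imagery windows can be illustrated with a small sketch. It assumes that both windows are already geo-registered, north-up, and expressed in the same projection space, so that each can be described by the geographic position of its upper-left corner and a ground distance per pixel; the `NorthUpWindow` class and `linked_cursor` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NorthUpWindow:
    """A north-up display window in a shared projection space (hypothetical)."""
    origin_x: float      # geographic x of the upper-left corner
    origin_y: float      # geographic y of the upper-left corner
    units_per_px: float  # ground distance covered by one screen pixel

    def to_geo(self, col, row):
        """Convert a window pixel position to geographic coordinates."""
        # Rows increase downward on screen while geographic y increases northward.
        return (self.origin_x + col * self.units_per_px,
                self.origin_y - row * self.units_per_px)

    def to_pixel(self, x, y):
        """Convert geographic coordinates to a (col, row) position in this window."""
        return ((x - self.origin_x) / self.units_per_px,
                (self.origin_y - y) / self.units_per_px)


def linked_cursor(source, target, col, row):
    """Return the position in `target` equivalent to (col, row) in `source`."""
    return target.to_pixel(*source.to_geo(col, row))


if __name__ == "__main__":
    imagery = NorthUpWindow(500000.0, 4200000.0, 0.5)    # 0.5 ground units per pixel
    raster_map = NorthUpWindow(499000.0, 4201000.0, 10.0)  # 10 ground units per pixel
    print(linked_cursor(imagery, raster_map, 200, 400))
```

Because the geographic coordinate is the common reference, the same function serves in either direction: moving in the map window can drive the indicator in the imagery window and vice versa.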
The imagery interpretation task can be performed entirely by the user through visual inspection and analysis, or with some level of automation provided by the image processing system. One such process is change detection. Change detection refers to the process of comparing imagery over an area of interest taken at two different times. Images are compared either manually or automatically to determine those places where some change in the scene content has occurred. Imagery based change detection can be performed on a variety of image types including panchromatic, color, IR (infrared) and multi-spectral. Change detection can be performed at a number of “levels”.
The simplest form is performed by a human analyst who compares the before and after images, usually in some form of alternating presentation between the two images. The alternating presentation may be a “flicker” mode, wherein the two images are presented alternately for a few seconds (or some portion of a second) each. Other methods include fading, in which the display moves gradually from a full presentation of the first image to a full presentation of the second (sometimes referred to as a blend), and swiping or wiping, wherein one image is incrementally replaced by the other in a wiping motion across the image format (the motion can be horizontal or vertical).
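The blend and swipe presentations can be sketched as simple per-pixel operations on two registered images of equal size. The function names and the use of nested lists of gray values are assumptions made for illustration; a display system would typically perform the equivalent operation in graphics hardware.

```python
def blend(before, after, alpha):
    """Fade between two registered images.

    alpha = 0 gives a full presentation of `before`, alpha = 1 a full
    presentation of `after`; intermediate values mix the two linearly.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return [[(1 - alpha) * b + alpha * a for b, a in zip(row_b, row_a)]
            for row_b, row_a in zip(before, after)]


def swipe(before, after, edge_col):
    """Horizontal wipe: columns left of `edge_col` show `after`, the rest `before`."""
    return [row_a[:edge_col] + row_b[edge_col:]
            for row_b, row_a in zip(before, after)]


if __name__ == "__main__":
    before = [[0, 0, 0, 0], [0, 0, 0, 0]]
    after = [[100, 100, 100, 100], [100, 100, 100, 100]]
    print(blend(before, after, 0.25))
    print(swipe(before, after, 1))
```

Stepping `alpha` from 0 to 1, or `edge_col` from 0 to the image width, over successive display frames produces the fade or wiping motion. A vertical wipe would select rows rather than columns.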
The most common automated method is gray-scale based change detection wherein the pixel values of the registered before and after images are compared at each location using a simple subtraction method. In U.S. Pat. No. 6,163,620, an improvement over this simple subtraction method is discussed wherein a search is performed for the best “match” at each location before the subtraction is performed. The result is a method that is notably resistant to registration errors between the two images. Preparatory techniques such as histogram equalization can help improve the result of any gray-scale based approach. Overall, this level of change detection provides results indicating that some form of change has likely occurred in a particular spot.
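The simple subtraction method, and the reason a search for the best match makes the result more tolerant of registration error, can be illustrated as follows. The first function is plain per-pixel subtraction. The second is only a loose illustration of searching a small neighborhood before subtracting; it is not the method of U.S. Pat. No. 6,163,620, and its name, the `radius` parameter, and the threshold step are assumptions introduced for this sketch.

```python
def difference_map(before, after):
    """Simple subtraction: absolute gray-level difference at each location."""
    return [[abs(a - b) for b, a in zip(row_b, row_a)]
            for row_b, row_a in zip(before, after)]


def searched_difference_map(before, after, radius=1):
    """Illustrative variant: for each pixel of `after`, search a small
    neighborhood of `before` for the closest gray value and keep the
    smallest difference found."""
    rows, cols = len(before), len(before[0])
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            best = None
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        d = abs(after[r][c] - before[rr][cc])
                        if best is None or d < best:
                            best = d
            out_row.append(best)
        result.append(out_row)
    return result


def changed_pixels(diff, threshold):
    """Return (row, col) locations whose difference exceeds the threshold."""
    return [(r, c) for r, row in enumerate(diff)
            for c, d in enumerate(row) if d > threshold]


if __name__ == "__main__":
    before = [[10, 10, 10, 10], [10, 200, 10, 10], [10, 10, 10, 10]]
    # Same scene, misregistered by one pixel to the right.
    after = [[10, 10, 10, 10], [10, 10, 200, 10], [10, 10, 10, 10]]
    print(changed_pixels(difference_map(before, after), 50))
    print(changed_pixels(searched_difference_map(before, after), 50))
```

In this example the bright object has not changed, only shifted by one pixel. Plain subtraction flags both its old and new positions, whereas the searched variant finds a match within the neighborhood and flags nothing. The trade-off of a neighborhood search is that genuine changes smaller than the search radius can also be suppressed.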
A “higher” level of change detection is based on image features. In this context features may be: entities that can be computed from the image pixels, such as edges and textures (no understanding of the edge or texture is implied); presumed man-made objects such as roads, edges of fields, etc.; or multi-spectral features such as computed band ratios [e.g., (band1−band2)/(band3)], where the feature may or may not have physical meaning. Overall, this level of change detection provides a result indicating that some form of man-made change has likely occurred in a particular spot, or perhaps that the ground cover in a particular spot has changed from one material to another.
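A band-ratio feature of the form given above, and its use in comparing two acquisitions, might be sketched as follows. The function names, the handling of a zero denominator, and the threshold are assumptions made for illustration; the three bands are assumed to be registered single-band images of equal size.

```python
def band_ratio(band1, band2, band3):
    """Compute (band1 - band2) / band3 at each pixel.

    Pixels where band3 is zero are assigned 0.0; this is an arbitrary
    choice made only to keep the sketch free of division errors.
    """
    return [[(p1 - p2) / p3 if p3 != 0 else 0.0
             for p1, p2, p3 in zip(row1, row2, row3)]
            for row1, row2, row3 in zip(band1, band2, band3)]


def ratio_change(before_bands, after_bands, threshold):
    """Return (row, col) locations where the band-ratio feature differs
    between two acquisitions by more than `threshold`."""
    ratio_before = band_ratio(*before_bands)
    ratio_after = band_ratio(*after_bands)
    return [(r, c)
            for r, (row_b, row_a) in enumerate(zip(ratio_before, ratio_after))
            for c, (b, a) in enumerate(zip(row_b, row_a))
            if abs(a - b) > threshold]


if __name__ == "__main__":
    before = ([[80, 80]], [[20, 20]], [[100, 100]])
    after = ([[80, 30]], [[20, 20]], [[100, 100]])
    print(band_ratio(*before))
    print(ratio_change(before, after, 0.2))
```

Comparing the computed feature rather than the raw gray values is what distinguishes this level from gray-scale subtraction: a location is flagged when the relationship between bands changes, which may indicate a change of ground cover material.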
An even “higher” level of change detection is performed using 3-dimensional analysis of the imaged scene to determine change. This approach assumes that a 3-dimensional model of the scene, particularly of man-made objects such as buildings, exists prior to analysis. This model is compared to 3-dimensional data extracted from a recent image, or images in the case of stereo acquisitions. This level of change detection can provide a result indicating for example that a certain building has changed in size.
Example applications of change detection include environmental assessment, ascertaining crop health, determining the presence of certain species of flora or fauna, monitoring encroachment of human activities on utility rights-of-way, and pollution detection (water turbidity, dumping activities).
A frequent goal of such investigations is to determine not only where a problem exists, but also when the problem first became apparent. In this case the analyst is attempting to build a historical record of the “event”, and the challenge is to conduct the search while maintaining context both spatially (“Where is this?”) and temporally (“When is this?”).