1.1 Field of the Invention
The present invention pertains to identifying objects in multi-dimensional imagery data and, more particularly, estimating the ground plane in multi-dimensional imagery data.
2.1 Acquisition of Multi-Dimensional Imagery Data
Multi-dimensional imagery data is an electronic picture, i.e., image, of a scene. Multi-dimensional data may be acquired in numerous ways. Laser Detection And Ranging (“LADAR”) systems are commonly employed for this purpose. Referring to FIG. 2, laser signals are transmitted from a platform 18 onto a scene, e.g., a scanned field of view. Upon encountering object(s) 12 and surrounding environment 14, varying degrees of the transmitted laser signals, characteristic of the particular scene or portion thereof, are reflected back to and detected by a sensor on the platform 18. The platform 18 can then process the reflected signals to obtain multi-dimensional data regarding the object 12 causing the reflection. The multi-dimensional data captures the distance between the object 12 and the platform 18, i.e., range, as well as a number of features of the object 12 such as its height, length, width, average height, etc. The quality and accuracy of the features depend in large part on the conditions prevailing at the time the data is collected, including the orientation of the object relative to the platform (e.g., aspect and depression angles), obscurations, and pixel resolution.
The object 12 may be either airborne or, as shown in FIG. 2, on the ground 16. LADAR data is generally acquired by scanning the field of view to generate rows and columns of discrete units of information known as “pixels.” Pixels are used to generate a two-dimensional “image” of the scanned field of view and are correlated to the third dimension, range information. Data acquisition, and particularly LADAR data acquisition, is well known in the art and any suitable technique may be employed. Suitable techniques are disclosed and claimed in U.S. Pat. Nos. 5,200,606; 5,224,109; 5,285,461; and 5,701,326.
Since platform 18 typically transmits many laser signals across a general area that may contain one or more objects reflecting the laser signals, it is necessary to examine the reflected data to determine if any objects 12 are present and if so, determine which reflecting objects 12 might be of interest. Automatic target recognition (“ATR”) systems are used to identify objects 12 represented in multi-dimensional data to determine whether they are potential targets. ATR systems are often divided into four subsystems: object detection, object segmentation, feature extraction, and object identification.
Object identification is the final process which takes inputs such as the object features discussed above and establishes an identity for the object based on comparison(s) to features of known objects. The accuracy of the identification depends on several factors including the correctness of the object features used in the comparison and the number of known objects constituting potential identifications.
Feature extraction selects one or more features of object 12, such as its height, width, length, average length, etc., from the multi-dimensional imagery data. However, preceding identification and extraction, object 12 must first be detected and segmented from the environment 14 as portrayed in the multi-dimensional imagery data. This means that the accuracy of detection and segmentation directly influences the accuracy of extraction and identification.
Object detection is essentially the first sweep through the imagery data. It searches for the presence of one or more objects by processing the image data. The imagery data includes pixel information having either x, y or x, y, z coordinates in multi-dimensional space. Pixel coordinates x, y, represent vertical and horizontal position while the z coordinate represents the range, or depth, of a particular point or area in the scene relative to the platform 18.
The term “pixel” is derived from the phrase “picture element.” A picture (i.e., an image) is a depiction or representation of a scene. Each pixel in the array of pixels which combine to create a picture depicts a certain amount of space in the scene.
Traditional object detection is generally accomplished by locating pixels with variances in coordinates, relative to other pixels, exceeding predefined thresholds. Common detection methods search for object boundaries, object features, or some combination thereof.
An illustrative method of detection entails the analysis of pixel coordinate data relative to linearly adjacent pixels. This method is disclosed in my commonly assigned U.S. Patent Application by Arthur S. Bornowski entitled “Improved Method and Software-Implemented Apparatus for Detecting Objects in Multi-Dimensional Data,” filed Oct. 22, 1999, Ser. No. 09/426,559, hereby expressly incorporated by reference herein for all purposes as if fully set forth verbatim. The method rejects relatively homogeneously sloped pixels as ground or surroundings. If a nonhomogeneous slope exceeds a specified threshold, then the method analyzes the pixel's range discontinuity relative to each linearly adjacent pixel. If the range discontinuity, i.e., edge, exceeds a specified threshold, the pixel is designated as part of an object. The method continues with each pixel in the multi-dimensional data. This novel method identifies a significant portion of the upper boundary of objects while it rejects relatively homogeneous sloping terrain. However, this method does not sufficiently define the interface between an object and the ground. Thus, there is a need to minimize the errors in segmentation, and therefore in feature extraction and object identification, by better estimating the ground plane.
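The two-threshold test described above can be illustrated along a single scan line. The following Python is only a rough sketch of one possible reading of that description, not the algorithm of the referenced application: the function name `detect_scan_line`, both threshold parameters, the use of a second difference as the "nonhomogeneous slope" measure, and the rule that a marked pixel must be nearer in range than a neighbor are all assumptions made for illustration.

```python
def detect_scan_line(ranges, slope_thresh, edge_thresh):
    """Return indices of pixels marked as object along one scan line.

    ranges: list of range (z) values for linearly adjacent pixels.
    slope_thresh, edge_thresh: hypothetical thresholds, in range units.
    """
    marked = []
    for i in range(1, len(ranges) - 1):
        d_prev = ranges[i] - ranges[i - 1]   # slope toward the previous pixel
        d_next = ranges[i + 1] - ranges[i]   # slope toward the next pixel
        # Homogeneous slope: treat as ground or surroundings and skip.
        if abs(d_next - d_prev) <= slope_thresh:
            continue
        # Range discontinuity (edge) relative to each adjacent pixel.
        # Assumption: an object pixel is nearer than the pixel beside it.
        nearer_by = max(ranges[i - 1] - ranges[i], ranges[i + 1] - ranges[i])
        if nearer_by > edge_thresh:
            marked.append(i)
    return marked


# Gently sloping ground with a nearer object occupying indices 4 to 6.
line = [100, 101, 102, 103, 95, 95, 95, 104, 105, 106]
print(detect_scan_line(line, slope_thresh=2, edge_thresh=3))  # [4, 6]
```

In this toy scan line only the two boundary pixels of the object are marked; the interior pixel at index 5 has a homogeneous slope and is skipped, which is consistent with the observation that detection alone may not fully delineate an object.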
Object segmentation follows the object detection process. The segmentation procedure separates the entirety of the detected object from its surroundings for feature analysis. Detection may not fully delineate the object. Segmentation involves further analysis of the object and surroundings to accurately identify the entire object prior to feature extraction. Ground plane estimation assists in accurate segmentation.
Traditional ground plane estimation relies on both localized and global techniques. These methods typically employ regression techniques of a linear or quadratic fit, using the range as a function of the rows and columns, about the pixels in the approximation. Global techniques assume that the entire scene is flat, and thus all pixels within the scene would be used in the analysis. This technique works well on rather benign scenes but performs poorly on more dynamic scenes. Localized methods, which attempt to estimate a ground plane about an area of interest, perform better on more dynamic scenes.
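As a minimal sketch of the linear regression described above, the following Python fits range as a linear function of row and column, z ≈ a·row + b·col + c, by ordinary least squares. The names `det3` and `fit_plane`, the `(row, col, z)` tuple layout, and the choice to solve the normal equations by Cramer's rule are illustrative assumptions rather than details of any particular prior art method. Passing every pixel in the scene corresponds to the global technique; passing only pixels around an area of interest corresponds to a localized one.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(pixels):
    """Least-squares fit of z = a*row + b*col + c; returns (a, b, c).

    pixels: iterable of (row, col, z) tuples (hypothetical layout).
    """
    s_rr = s_rc = s_cc = s_r = s_c = s_rz = s_cz = s_z = 0.0
    n = 0
    for row, col, z in pixels:
        s_rr += row * row
        s_rc += row * col
        s_cc += col * col
        s_r += row
        s_c += col
        s_rz += row * z
        s_cz += col * z
        s_z += z
        n += 1
    # Normal equations (A^T A) p = A^T z for design rows [row, col, 1].
    m = [[s_rr, s_rc, s_r],
         [s_rc, s_cc, s_c],
         [s_r, s_c, float(n)]]
    rhs = [s_rz, s_cz, s_z]
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("pixels are degenerate (too few or collinear)")
    coeffs = []
    for k in range(3):
        mk = [r[:] for r in m]        # copy, then replace column k with rhs
        for i in range(3):
            mk[i][k] = rhs[i]
        coeffs.append(det3(mk) / d)
    return tuple(coeffs)


# A perfectly flat, tilted 4x4 scene: z = 2*row + 3*col + 5.
scene = [(r, c, 5.0 + 2.0 * r + 3.0 * c) for r in range(4) for c in range(4)]
a, b, c0 = fit_plane(scene)
```

A quadratic fit would add row², col², and row·col terms to the design rows. In either case, any object pixels left in the input pull the fitted surface toward the object.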
A significant problem with prior art ground plane estimation methods is inaccurate determination of the object-to-ground interface. This problem can lead to erroneous object segmentation, erroneous feature extraction, erroneous feature comparisons, and ultimately to incorrect or missed object identifications.
The novel detection method leads to a more accurate definition of the upper boundaries of an object. However, utilization of traditional ground plane estimation methods would lead to intolerable directional errors in the estimated ground plane. The improved method minimizes this problem because it locates more object pixels in the traditional ground plane.
However, several new problems are introduced by the illustrative detector, over and above the problems with traditional detectors. Because the improved detector is able to identify more object pixels than conventional detectors, the extra-identified pixels sometimes cause a directional bias in the existing ground plane estimation process.
Accurate segmentation over complex terrain is highly desirable. Complex terrain exacerbates the inaccurate ground/object interface problem. The novel detector accomplishes this in part while improved ground plane estimation is needed to complete the ability to accurately segment objects from complex terrain. The detection operator identifies pixels that form the upper object/ground interface. This allows for a better understanding of the object and, in turn, a better localization of the interface. In other words, the detection operator supplies data regarding the extent of the target in the image plane. Previous methods employed detection operators which only obtained a subset of the object, where the subset could lie anywhere on the object. Consequently, a relatively large segmentation window was required to segment the entire object. The problems associated with that method were exacerbated for objects in complex terrain where multiple “ground planes” or surfaces may exist. Those multiple surfaces in combination with one, and commonly several, unknown objects make it extremely difficult to extract an optimal ground plane estimate for each and every detected object. Instead, a localization process would allow for a minimal set of ground about each detected object to be operated on during ground estimation.
Yet another problem with prior art detection methods is excessive processor usage. An ATR system must, as a practical matter, quickly establish the best possible detection with available computing resources. Prior art systems attempting to address the aforementioned difficulties expend valuable resources, computing and otherwise.
In light of the chain reaction consequences of inaccurate ground plane estimation, namely inaccurate segmentation, extraction, and identification, there is a need for an improved method for estimating the ground plane in multi-dimensional imagery data.
The present invention in one embodiment is a method for determining the reference plane in multi-dimensional data. The method includes (a) providing multi-dimensional imagery data, referred to as set A, including an array of pixels having object pixels marked; (b) range gating about at least a subset of the marked object pixels, including marking pixels outside the range gate to form an unmarked pixel subset of set A, referred to as subset B; (c) performing a maximal z density analysis on subset B, including marking pixels outside the maximum density to form an unmarked pixel subset of subset B, referred to as subset C; (d) performing a local normal vector estimate on subset C, including marking pixels having a normal vector exceeding specified threshold L from nominal to form an unmarked pixel subset of subset C, referred to as subset D; (e) performing a first ground plane fit on subset D, each pixel producing residual value X, cumulatively known as residual set X; (f) analyzing residual set X, including performing a residual density analysis and marking pixels whose residual value X exceeds specified threshold M to form an unmarked pixel subset of subset D, referred to as subset E; (g) performing a second ground plane fit on subset E, each pixel producing residual value Y, cumulatively known as residual set Y; (h) analyzing residual set Y, including marking pixels whose residual value Y exceeds specified threshold N to form an unmarked pixel subset of subset E, referred to as subset F; and (i) estimating the reference plane for subset F.
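The successive narrowing from set A to subset F can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: all function and parameter names are hypothetical, the maximal z density analysis of step (c) is approximated by a histogram that keeps the densest bin and its two neighbors, the local normal vector test of step (d) is omitted, the residual density analysis of step (f) is reduced to a plain threshold, and each plane fit is an ordinary least-squares fit of z = a·row + b·col + c.

```python
from collections import Counter


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(pixels):
    """Least-squares fit of z = a*row + b*col + c; returns (a, b, c)."""
    rows = [[r, c, 1.0] for r, c, _ in pixels]
    zs = [z for _, _, z in pixels]
    # Normal equations (A^T A) p = A^T z, solved by Cramer's rule.
    m = [[sum(x[i] * x[j] for x in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(x[i] * z for x, z in zip(rows, zs)) for i in range(3)]
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("pixels are degenerate (too few or collinear)")
    coeffs = []
    for k in range(3):
        mk = [r[:] for r in m]
        for i in range(3):
            mk[i][k] = rhs[i]
        coeffs.append(det3(mk) / d)
    return tuple(coeffs)


def residual(pixel, plane):
    """Signed difference between a pixel's z and the fitted plane."""
    row, col, z = pixel
    a, b, c = plane
    return z - (a * row + b * col + c)


def estimate_ground_plane(unmarked, object_range, gate, bin_width,
                          thresh_m, thresh_n):
    """unmarked: (row, col, z) pixels of set A not marked as object."""
    # (b) Range gate about the marked object's range.
    subset_b = [p for p in unmarked if abs(p[2] - object_range) <= gate]
    # (c) Maximal z density, approximated by the densest histogram bin +/- 1.
    bins = Counter(int(p[2] // bin_width) for p in subset_b)
    peak = max(bins, key=bins.get)
    subset_c = [p for p in subset_b
                if abs(int(p[2] // bin_width) - peak) <= 1]
    # (d) Local normal vector test (threshold L) omitted in this sketch.
    subset_d = subset_c
    # (e)-(f) First fit; drop pixels whose residual exceeds threshold M.
    plane_1 = fit_plane(subset_d)
    subset_e = [p for p in subset_d if abs(residual(p, plane_1)) <= thresh_m]
    # (g)-(h) Second fit; drop pixels whose residual exceeds threshold N.
    plane_2 = fit_plane(subset_e)
    subset_f = [p for p in subset_e if abs(residual(p, plane_2)) <= thresh_n]
    # (i) Reference plane estimated from the surviving pixels.
    return fit_plane(subset_f)
```

The point of the two-pass structure is that the first fit, although biased by any undetected object pixels, is usually good enough to expose those pixels as large residuals, so the second and final fits operate on a cleaner set of ground pixels.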