Segmentation, the partitioning of data into related sections or regions, is a key first step in a number of approaches to data analysis and compression. In data analysis, the group of data points contained in each region provides a statistical sampling of data values for more reliable labeling based on data feature values. In data compression, the regions form a basis for compact representation of the data. The quality of the prerequisite data segmentation is a key factor in determining the level of performance of most of these data analysis and compression approaches.
Most segmentation approaches can be placed in one of three categories: (i) characteristic feature thresholding or clustering, (ii) boundary detection, or (iii) region growing. Characteristic feature thresholding or clustering does not exploit spatial information, and thus ignores information that could be used to enhance the segmentation results. While boundary detection does exploit spatial information by examining local edges found throughout the data, it does not necessarily produce closed connected region boundaries. For simple noise-free data, detection of edges usually results in straightforward region boundary delineation. However, edge detection on noisy, complex data often produces missing edges and extra edges, so that the detected boundaries do not necessarily form a set of closed, connected curves that surround connected regions. Region growing approaches to segmentation are preferred because region growing exploits spatial information and guarantees the formation of closed, connected regions.
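As a minimal illustration of why region growing yields closed, connected regions, the following sketch labels a small two-dimensional grid by growing each region outward from a seed point. The function name `grow_regions`, the 4-neighbor connectivity, and the fixed tolerance `tol` around the seed value are assumptions made for this example only; they are not the merging criterion of any particular algorithm discussed in this document.

```python
from collections import deque


def grow_regions(grid, tol):
    """Label 4-connected regions whose values stay within tol of the seed value.

    Every data point receives exactly one label, and each label forms a
    connected set, so region boundaries are closed by construction.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 means "not yet labeled"
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0]:
                continue  # already absorbed into an earlier region
            next_label += 1
            seed = grid[r0][c0]
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not labels[nr][nc]
                            and abs(grid[nr][nc] - seed) <= tol):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
    return labels


example = [[1, 1, 9],
           [1, 2, 9],
           [5, 5, 9]]
label_map = grow_regions(example, tol=1)
```

Because each region is grown only through spatially adjacent points, no post-processing is needed to close its boundary, in contrast to the edge-linking step that boundary detection would require.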
Segmentation is often used in the analysis of imagery data. The techniques described can be applied to image data and to any other data that has spatial characteristics. A data set has spatial characteristics if it can be represented on an n-dimensional grid, and when so represented, data points that are nearer to each other in the grid generally have a higher statistical correlation to each other than data points further away. For remotely sensed images of the earth, an example of a segmentation would be a labeled map that divides the image into areas covered by distinct earth surface covers such as water, snow, types of natural vegetation, types of rock formations, types of agricultural crops and types of other man-made development. In unsupervised image segmentation, the labeled map may consist of generic labels such as region 1, region 2, etc., which may be converted to meaningful labels by a post-segmentation analysis. In image analysis, the group of image points contained in each region provides a good statistical sampling of image values for more reliable labeling based on region mean feature values. In addition, the region shape or texture can be analyzed for additional clues to the appropriate labeling of the region.
A segmentation hierarchy is a set of several segmentations of the same data at different levels of detail in which the segmentations at coarser levels of detail can be produced from simple merges of regions at finer levels of detail. This is useful for applications that require different levels of segmentation detail depending on the particular data objects segmented. A unique feature of a segmentation hierarchy that distinguishes it from most other multilevel representations is that the segment or region boundaries are maintained at the finest data granularity for all levels of the segmentation hierarchy.
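The relationship between hierarchy levels can be sketched in a few lines. In this illustrative example, the finest segmentation is a label map and each coarser level is obtained by applying one region merge to the level before it. The function name `hierarchy_levels` and the representation of a merge as a `(keep, absorb)` pair of labels are assumptions made for the sketch.

```python
def hierarchy_levels(finest, merges):
    """Return a list of label maps: the finest map, then one map per merge.

    Each merge (keep, absorb) relabels every point of region `absorb`
    as region `keep`. No point is moved or resampled, so boundaries stay
    at the finest data granularity at every level.
    """
    levels = [[row[:] for row in finest]]
    for keep, absorb in merges:
        previous = levels[-1]
        levels.append([[keep if value == absorb else value for value in row]
                       for row in previous])
    return levels


finest = [[1, 1, 2],
          [3, 3, 2],
          [3, 4, 4]]
levels = hierarchy_levels(finest, merges=[(1, 2), (3, 4)])
region_counts = [len({v for row in level for v in row}) for level in levels]
```

Here the three levels contain four, three, and two regions respectively, and every boundary present at a coarser level is also a boundary at the finest level.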
In a segmentation hierarchy, an object of interest may be represented by multiple segments in finer levels of detail in the segmentation hierarchy, and may be merged into an encompassing region at coarser levels of detail in the segmentation hierarchy. If the segmentation hierarchy has sufficient resolution, the object of interest will be represented as a single region segment at some intermediate level of segmentation detail. The segmentation hierarchy may be analyzed to identify the hierarchical level at which the object of interest is represented by a single region segment. The object may then be identified through its spectral and region characteristics, such as shape and texture. Additional clues for object identification may be obtained from the behavior of the segmentations at the hierarchical segmentation levels above and below the level at which the object of interest is represented by a single region.
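One possible way to search a hierarchy for the level at which an object is represented by a single region is sketched below. The sketch assumes that the extent of the object is already known as a set of grid coordinates, which in practice would itself have to be estimated; the function name `single_region_level` and the exact-match criterion are likewise assumptions for illustration.

```python
def single_region_level(levels, object_pixels):
    """Return the index of the finest level at which the object is exactly
    one region, or None if no level qualifies.

    levels: list of label maps ordered from finest to coarsest.
    object_pixels: iterable of (row, col) coordinates covering the object.
    """
    target = set(object_pixels)
    for index, label_map in enumerate(levels):
        labels_in_object = {label_map[r][c] for r, c in target}
        if len(labels_in_object) != 1:
            continue  # the object is still split across several segments
        (label,) = labels_in_object
        extent = {(r, c)
                  for r, row in enumerate(label_map)
                  for c, value in enumerate(row)
                  if value == label}
        if extent == target:
            return index  # one region, not yet merged into a larger one
    return None


levels = [
    [[1, 1, 2], [3, 3, 2], [3, 4, 4]],  # finest: object split into 3 and 4
    [[1, 1, 2], [3, 3, 2], [3, 3, 3]],  # object is exactly region 3
    [[1, 1, 1], [3, 3, 1], [3, 3, 3]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],  # coarsest: object absorbed
]
object_pixels = [(1, 0), (1, 1), (2, 0), (2, 1), (2, 2)]
level_index = single_region_level(levels, object_pixels)
```

The levels immediately below and above the returned index (here, levels 0 and 2) are the ones whose behavior may supply the additional identification clues mentioned above.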
In U.S. Pat. No. 6,895,115, which is incorporated herein by reference, a segmentation approach is described that automatically provides hierarchical segmentations for data at several levels of detail. This approach, called HSEG, is a hybrid of region growing and spectral clustering that produces a hierarchical set of segmentations based on detected natural convergence points. Because of the inclusion of spectral clustering, the HSEG algorithm is very computationally intensive, and cannot be performed in less than a day on moderately sized data sets, even with the most powerful single processor computer currently available. The processing time problem was addressed through a recursive formulation of HSEG, called RHSEG. RHSEG can process moderately sized data sets in a reasonable amount of time on currently available PCs and workstations. Larger data sets required the use of a parallel implementation of RHSEG on a parallel computing system.
However, a problem with the RHSEG algorithm, and with certain other data processing algorithms that similarly subdivide and subsequently recombine data during processing, is that processing artifacts can be introduced by the division and recombination of the data. An example of these processing artifacts can be demonstrated on an 896×896 pixel section of Landsat ETM+ (Enhanced Thematic Mapper) data displayed in FIG. 1. This image was obtained on May 28, 1999 over the southwestern coast of the eastern shore of Maryland. The six non-thermal bands were used in the segmentation tests.
FIG. 2 displays the segmentation of this image into 96 regions as produced by the basic version of RHSEG. Straight-line processing window artifacts from the recursive quartering of the image data are quite evident in sub-regions of the Chesapeake Bay 1,2 and on land in the right center part of the image 3. These straight or blocked lines have no relationship to the data in the image itself, but arose from the recursive subdivision and recombination process. Processing artifacts may be very noticeable as lines, or merely a few mislabeled pixels.
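To indicate where such straight-line seams can originate, the following sketch enumerates the processing windows produced by recursively quartering a rectangular data set. It illustrates the subdivision step only and is not the RHSEG implementation; the function name `quarter` and the choice of two recursion levels in the usage lines are assumptions, since the number of levels used for FIG. 2 is not stated here.

```python
def quarter(r0, c0, height, width, levels):
    """Recursively split a window into four sub-windows, `levels` times.

    Returns a list of (row_origin, col_origin, height, width) tuples.
    The shared edges between sibling windows are where window artifacts
    can appear once the sub-results are recombined.
    """
    if levels == 0:
        return [(r0, c0, height, width)]
    half_h, half_w = height // 2, width // 2
    windows = []
    for dr, h in ((0, half_h), (half_h, height - half_h)):
        for dc, w in ((0, half_w), (half_w, width - half_w)):
            windows += quarter(r0 + dr, c0 + dc, h, w, levels - 1)
    return windows


# An 896 x 896 section quartered twice gives 16 windows of 224 x 224.
windows = quarter(0, 0, 896, 896, levels=2)
```

Each window is segmented independently of its neighbors before recombination, which is why a region boundary can end up following a window edge rather than a feature in the data.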
In the prior art, a “contagious clusters” or “contagious regions” concept has been used to attempt to reduce or eliminate the processing window artifacts. The contagious regions concept can be described as follows:
Flag any region that touches a boundary between processing windows and suppress any merging between flagged regions and any other region.
If a non-flagged or “non-contagious” region attempts to merge with a flagged or “contagious” region, the previously non-flagged region becomes flagged or “contagious.”
Thus, the contagious property of the flagged regions is literally contagious. Unfortunately, when the contagious regions concept is applied to the RHSEG algorithm, and when more than two or three levels of recursion are utilized, the RHSEG algorithm is unable to effectively process the data. The RHSEG algorithm effectively stalls: so many regions become contagious, and the number of regions remaining in the processing window becomes so large, that the processing time required is not sufficiently advantageous over a non-segmented image processing approach.
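The two contagious-region rules, and the way the flag spreads, can be sketched as follows. This is an illustrative reading of the rules as stated above, not code from any prior-art implementation; the names `touches_window_border`, `initial_flags`, and `try_merge`, the dictionary-of-pixel-sets region representation, and the treatment of the window's outer edge as the boundary between processing windows are all assumptions.

```python
def touches_window_border(pixels, window):
    """True if any pixel lies on the outer edge of the processing window.

    window is (first_row, first_col, last_row, last_col), bounds inclusive.
    """
    r0, c0, r1, c1 = window
    return any(r in (r0, r1) or c in (c0, c1) for r, c in pixels)


def initial_flags(regions, window):
    """Rule 1: flag every region that touches the window boundary."""
    return {rid for rid, pixels in regions.items()
            if touches_window_border(pixels, window)}


def try_merge(regions, flagged, a, b):
    """Attempt to merge region b into region a; return True if merged.

    Rule 1 suppresses any merge involving a flagged region. Rule 2 then
    flags the other region as well, so the flag spreads.
    """
    if a in flagged or b in flagged:
        flagged.update((a, b))
        return False
    regions[a] |= regions.pop(b)
    return True


window = (0, 0, 3, 3)
regions = {1: {(0, 0), (0, 1)}, 2: {(1, 1)}, 3: {(1, 2)}, 4: {(2, 1), (2, 2)}}
flagged = initial_flags(regions, window)         # only region 1 is on the border
results = [try_merge(regions, flagged, 2, 1),    # blocked, region 2 now flagged
           try_merge(regions, flagged, 3, 4),    # allowed, both are interior
           try_merge(regions, flagged, 3, 2)]    # blocked, region 3 now flagged
```

In this small example a single border region ends up blocking three of the four original regions from further merging, which suggests how, over several levels of recursion, the region count can stay high enough to stall processing.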
Indirect mechanisms, however, also exist. For example, increasing the number of regions at which convergence is achieved at intermediate levels of the recursive processing may indirectly cause a reduction in processing window artifacts. A larger value may delay some region merging decisions that would have involved regions on the borders of processing windows until after those regions are no longer on the borders of processing windows. This indirect method, however, is inefficient because processing time increases as the number of regions needed to achieve convergence increases. Further, processing artifacts are not always eliminated by this method. Other approaches to reducing window artifacts may manipulate other parameters in the recursive hierarchical segmentation processing, but these also increase processing time and resource requirements, such that the approaches become impractical for large sets of data.
Indeed, all previously developed techniques for splitting inappropriately merged pixels, or for processing image data in a fashion that avoids creating window artifacts, unacceptably increase the processing time required. Thus, in prior application Ser. No. 10/845,419, a switch-pixels method of addressing window artifacts was disclosed. However, this technique had no mechanism for giving priority to spatial adjacency in switching pixels from one region to another.