1. Field of the Invention
The present invention relates to Boolean set operations on data sets and, more particularly, to an improved method and software process for Boolean set intersection and set union operations among polygon-represented data sets using a digital computer.
2. Description of Related Art
Boolean set operation generation is a foundational computational capability in a wide range of diverse problem domains, including image processing, spatial data analysis, constraint-based reasoning, earth resource evaluation, crop management, market analysis studies, micro-fabrication, mining, weather forecasting, military planning, and utility management. For example, rain forest shrinkage over time can be studied by performing Boolean set intersections between processed earth resource imagery and historical vector-represented geo-spatial map products depicting vegetation. Regions “within 100 miles of Oklahoma City possessing a slope less than 3 degrees where wheat is grown” can be determined by performing Boolean set intersections between a circular region with radius 100 miles about Oklahoma City, a map product depicting slopes, and a land use product depicting agricultural crops. Ground-based target locations for military applications can often be significantly refined by intersecting sensor-generated error ellipses with domain features that favor the presence of ground vehicles (roads, high mobility regions, and regions that afford cover and concealment), while de-emphasizing regions that do not favor such vehicles (swamps and waterways). Each of these examples relies on Boolean set operation computation among potentially large 2-D spatially-organized data sets. All Geographic Information Systems (GIS) and modern database management systems (DBMS) provide direct support for two-dimensional Boolean set operation generation.
For polygon-represented data sets, the key Boolean set operations of intersection and union have typically been generated using algorithms from a field of mathematics known as computational geometry. Due to the highly combinatorial nature of these methods, computational requirements tend to increase rapidly as a function of the “size” of the data sets (i.e., the number of vertices in the polygon representations). Further dramatic increases in computational requirements occur when the polygons are multiply-connected (e.g., polygons with included holes or regions that consist of a collection of non-intersecting smaller regions). As a direct result of the inherent combinatorial computational requirements, large-scale dynamic applications have tended to be impractical. Even highly parallel implementations, as well as significant improvements in hardware performance, cannot effectively overcome the staggering computational requirements of some prospective applications.
Traditionally, two-dimensional objects could be represented by one of two generally accepted spatial data structures: either the region quadtree or tuple-based polygon representations. The region quadtree spatial data structure provides a regular, recursive decomposition of 2-D space. Quadtrees are used in the construction of some multidimensional databases (e.g., cartography, computer graphics, and image processing). A quadtree is a means of encoding an image as a tree structure. Each node of the tree has up to four children. The root node represents the entire image; its children represent the four quadrants of the entire image; their children represent the sixteen subquadrants, and so on. Because the region quadtree is an areal-based representation, it supports highly efficient top-down spatial search and data manipulation, as well as areal-based set operation generation.
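The recursive decomposition described above can be illustrated with a minimal sketch. It assumes a square binary image whose side is a power of two, and the names QuadNode, build_quadtree, and count_leaves are hypothetical, chosen only for this illustration. A block that is uniformly 0 or 1 becomes a leaf; a mixed block is split into its four quadrants.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QuadNode:
    x: int                       # column of the block's upper-left pixel
    y: int                       # row of the block's upper-left pixel
    size: int                    # side length of the block in pixels
    value: Optional[int] = None  # 0 or 1 for a uniform leaf, None for a mixed node
    children: List["QuadNode"] = field(default_factory=list)


def build_quadtree(image, x=0, y=0, size=None):
    """Recursively decompose a square binary image (side a power of two)."""
    if size is None:
        size = len(image)
    first = image[y][x]
    uniform = all(
        image[r][c] == first
        for r in range(y, y + size)
        for c in range(x, x + size)
    )
    if uniform:
        return QuadNode(x, y, size, value=first)
    half = size // 2
    node = QuadNode(x, y, size)
    # Child order: upper-left, upper-right, lower-left, lower-right.
    for dy in (0, half):
        for dx in (0, half):
            node.children.append(build_quadtree(image, x + dx, y + dy, half))
    return node


def count_leaves(node):
    """Number of leaf blocks needed to represent the image."""
    if not node.children:
        return 1
    return sum(count_leaves(child) for child in node.children)
```

In this sketch, large uniform areas collapse into single leaves, which is what makes top-down spatial search efficient; only blocks that straddle a region edge are subdivided further.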
Despite these benefits, the region quadtree representation exhibits significant shortcomings. Because the “size” of the quadtree representation increases by a factor of four for each level of decomposition, excessively large storage requirements (and concomitant computational requirements) can occur for applications that demand high fidelity data representation. Tuple-based representations, on the other hand, provide memory-efficient, high fidelity representations of point, line, and region features. However, because such representations do not directly support spatial search and spatial data manipulation, data analysis tends to lead to highly combinatorial computational requirements.
Thus, both tuple-based and region quadtree-based spatial data representations possess strengths, as well as obvious weaknesses. Spatial search, reasoning, and set generation tend to be straightforward and efficient for quadtree-based representations, but lead to computationally intensive algorithms for tuple-based representations. Data storage requirements to support high fidelity data sets are readily met with tuple-based representations, while direct use of region quadtree representations leads to excessively large storage and associated data manipulation requirements.
There are few prior efforts at more efficient set operation generation procedures. One example, U.S. Pat. No. 5,649,084, discloses a method for performing Boolean operations on geometric objects in a computer-aided design system. Edges of a first object are intersected with surfaces of a second object to produce intersection points, and surfaces containing the faces of the two objects, respectively, are intersected with each other to produce intersection tracks. If there is an inconsistency between the intersection points and corresponding intersection tracks, a perturbation step is applied to correct the spatial positions of inconsistent intersection points. This maintains geometric consistency and avoids overly complex set operations. However, it also changes the original objects and sacrifices accuracy.
Rather than seeking incremental enhancements over traditional approaches, in the early 1980s the present inventor pioneered an entirely new data representation structure referred to as the region quadtree-indexed vector representation. See, Antony, R. and Emmerman, P., Spatial Reasoning and Knowledge Representation, Geographic Information Systems in the Government Workshop Proceedings, A. Deepak Publishing (1986). This structure helps to dramatically reduce the computational requirements for set operation generation among both simple and multiply-connected polygon-represented data sets. The quadtree-indexed vector representation is, in effect, a hybrid data structure that combines the benefits of a hierarchical, multiple resolution grid-based spatial representation (the region quadtree data structure) with the accuracy and storage efficiency of tuple-based (polygon boundary) representations. In the quadtree-indexed vector representation, the region quadtree serves not only as a spatial indexing mechanism into memory-efficient, high fidelity representations of points, lines, and region boundaries, but as a multiple resolution spatial representation in its own right.
FIG. 1 illustrates the relationship between indexing cells and the underlying tuple-based data. Note that for point and line (commonly referred to as polyline) features, every quadtree cell in the representation “indexes” tuple-based data. For regions, however, there exist two classes of quadtree indexing cells: interior cells and boundary cells. Interior cells are quadtree cells that are fully included within a region; boundary cells are partly within and partly outside a region. Thus, for simply-connected regions, only quadtree boundary cells associated with the region's boundary directly index tuple-based data.
For polygons that possess one or more holes as shown in FIG. 2, tuple lists define not only the outer boundary of the polygon, but also the boundaries of all included holes. The tuple list of a hole serves both as the description of the edge of the hole and as a description of an inner edge of the region. The “direction” associated with the tuple list of a first-order hole is opposite that of the outer boundary list convention. For embedded holes, each subsequent tuple list is order-reversed to maintain logical consistency with respect to the “inside” and “outside” of the region.
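The direction convention can be sketched as follows. The sketch assumes, purely for illustration, that outer boundaries are listed counter-clockwise and that the direction alternates with each level of nesting; the text above specifies only that a hole's direction is opposite that of its enclosing boundary. The function names signed_area and orient_ring are hypothetical.

```python
def signed_area(ring):
    """Shoelace formula: positive for counter-clockwise rings, negative for clockwise."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def orient_ring(ring, depth):
    """Return the ring in the direction assumed for its nesting depth.

    depth 0 is an outer boundary, depth 1 a first-order hole, depth 2 a
    region embedded in that hole, and so on. Even depths are assumed to be
    counter-clockwise and odd depths clockwise.
    """
    want_ccw = depth % 2 == 0
    is_ccw = signed_area(ring) > 0
    return list(ring) if is_ccw == want_ccw else list(reversed(ring))
```

With a consistent direction per nesting level, the interior of the region always lies on the same side of each directed edge, whether the edge belongs to the outer boundary or to a hole.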
FIG. 3 illustrates a tuple-represented piece-wise continuous line feature representing a region boundary, along with the associated quadtree-indexing cells. This figure shows the association between line segment vertex tuples and the quadtree-indexing cell (actual data points are shown as filled circles; pseudo points are shown as open circles). The outer square in the figure represents an indexing cell at some arbitrary level of decomposition. Data points 2, 3, 4, 5, and 10 are seen to lie within the cell. The quadtree-indexed vector spatial data structure requires the addition of pseudo points at all cell entrance and exit boundaries. Referring back to FIG. 3, four pseudo points are associated with this particular cell: points 1, 6, 9, and 11. If represented at the next higher resolution quadtree grid size (indicated by the four smaller squares in FIG. 3), additional pseudo points are required at the boundaries of the smaller indexing cells. For example, the upper right indexing cell consists of the entrance tuple pseudo point 3′, the data tuples 4 and 5, and the exit tuple pseudo point 6. Because one or more data tuples could fall on a cell boundary (a data tuple actually lies at the cell boundary or a line segment follows the cell boundary), the implementation must distinguish pseudo points from the original tuple list. Adjacent cells maintain duplicate shared cell boundary tuples. The lower right and the upper right cells, for example, share tuple 3′.
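The insertion of pseudo points can be sketched as follows, under simplifying assumptions: a uniform grid of square cells of side `cell` aligned with the origin stands in for one level of the quadtree, and each output tuple carries a flag that separates pseudo points from original data tuples. The function name add_pseudo_points and the flag layout are hypothetical.

```python
import math


def add_pseudo_points(polyline, cell):
    """Return (x, y, is_pseudo) tuples, adding a pseudo point wherever a
    segment crosses a grid line strictly between its two endpoints."""
    out = []
    for (x1, y1), (x2, y2) in zip(polyline, polyline[1:]):
        out.append((x1, y1, False))          # original data tuple
        crossings = set()                    # segment parameters t in (0, 1)
        for a, b in ((x1, x2), (y1, y2)):
            if a == b:
                continue                     # segment parallel to these grid lines
            lo, hi = min(a, b), max(a, b)
            k = math.floor(lo / cell) + 1    # first grid line strictly above lo
            while k * cell < hi:
                crossings.add((k * cell - a) / (b - a))
                k += 1
        for t in sorted(crossings):
            out.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1), True))
    out.append((*polyline[-1], False))       # final data tuple
    return out
```

A data tuple that already lies on a grid line keeps its `False` flag and is not duplicated as a pseudo point, which reflects the requirement that pseudo points remain distinguishable from the original tuple list. A segment that runs along a grid line is left unchanged by this sketch.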
In addition to distinguishing between data point tuples and indexing cell boundary pseudo points, the implementation must distinguish between interior (non-boundary) indexing cells and boundary indexing cells.
FIGS. 4(a)-(c) show the interaction between two regions, and FIGS. 4(d)-(f) show the interaction between the regions' respective quadtree grids. Unless two regions share at least one common quadtree indexing cell, no direct interaction between the two data sets occurs. When no common cells occur, the set intersection is null and the set union is the set formed by all the cells in the two disjoint regions. However, when common cells do occur, the set intersection is as seen in FIG. 4(f).
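The common-cell test can be sketched as follows, assuming for simplicity that both regions are indexed at the same quadtree level, so that each cell can be keyed by a hypothetical (column, row) pair mapped to the label "interior" or "boundary". The function names are likewise illustrative.

```python
from typing import Dict, Optional, Set, Tuple

Cell = Tuple[int, int]  # (column, row) of a cell at one fixed quadtree level


def shared_cells(region_a: Dict[Cell, str], region_b: Dict[Cell, str]) -> Set[Cell]:
    """Cells indexed by both regions; an empty set means no direct interaction."""
    return set(region_a) & set(region_b)


def union_if_disjoint(
    region_a: Dict[Cell, str], region_b: Dict[Cell, str]
) -> Optional[Dict[Cell, str]]:
    """Return the union when no cells are shared, otherwise None.

    With no common cells the intersection is null and the union is simply
    every cell of both regions. A None result signals that the shared
    cells need further analysis before either product can be formed.
    """
    if shared_cells(region_a, region_b):
        return None
    merged = dict(region_a)
    merged.update(region_b)
    return merged
```

The set lookup touches only cell keys, never the underlying vertex tuples, so disjoint regions are resolved without any geometric computation.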
FIG. 5 illustrates a categorical approach for analyzing set operations that was developed by the inventor herein and is described in Antony, R., Principles of Data Fusion Automation (Artech House, 1995). When set intersection exists, set operation generation is treated as a three-stage (sub-problem) process involving the following canonical form classes.
Class 1: interactions between two interior cells;
Class 2: interactions between a boundary and an interior cell; and
Class 3: interactions between two boundary cells.
The first two stages entail only relatively trivial computations, and the appropriate methods for generating the products from both Class 1 and Class 2 are described in the Antony treatise, supra at 95. The third stage, however, is considerably more involved and effectively controls the computational complexity of the set operation generation process. Because both polylines and points can be treated as degenerate regions (regions with no “interior cells”), set union and intersection operations for these lower order features can be treated as a special case of the region generation methodology.
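The three-way classification can be sketched as follows. The sketch again assumes that both regions are indexed at the same quadtree level and that each cell is keyed by a hypothetical (column, row) pair labeled "interior" or "boundary"; it only sorts the shared cells into the three classes and does not reproduce the product-generation methods of the cited treatise.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (column, row) of a cell at one fixed quadtree level


def interaction_class(kind_a: str, kind_b: str) -> int:
    """Map a pair of cell labels to Class 1, 2, or 3."""
    if kind_a == "interior" and kind_b == "interior":
        return 1
    if kind_a == "boundary" and kind_b == "boundary":
        return 3
    return 2  # exactly one of the two cells is a boundary cell


def partition_interactions(
    region_a: Dict[Cell, str], region_b: Dict[Cell, str]
) -> Dict[int, List[Cell]]:
    """Group the cells shared by two regions by interaction class."""
    classes: Dict[int, List[Cell]] = {1: [], 2: [], 3: []}
    for cell in sorted(set(region_a) & set(region_b)):
        classes[interaction_class(region_a[cell], region_b[cell])].append(cell)
    return classes
```

Only the cells collected under Class 3 require work on the underlying boundary tuples; the other two groups can be resolved from the cell labels and the boundary data that already exists.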
Consequently, it would be greatly advantageous to provide an optimal method for generating the product of the third and key stage of the procedure: Class 3 interactions between two boundary cells.