This invention relates to geographic information systems.
Various geographic information systems are known in the art. These systems generally store and manage spatial data such as points, lines, poly-lines, polygons, and surfaces, and hence are often referred to as spatial databases. Several commercial database systems that manage spatial data are now available, including ESRI's (Environmental Systems Research Institute) ARC/INFO (trademarked), InterGraph's MGE, MapInfo, and Informix. Query size estimation in spatial databases has been identified as an important problem. An example of a spatial query may be to determine how many rectangles in a spatial database are contained within a rectangular spatial query of a certain size. For example, a query may be to determine how many lakes are within a state. In that case, the lakes are the data rectangles in the spatial database and the rectangular query is the particular state. Similarly, one may wish to know how many houses are in a county or how many restaurants are in an area. It may be beneficial to estimate the results of such a query to determine the most efficient way to execute queries generally or to give users estimates of the running times of their queries before the queries are actually executed.
Some query result estimation techniques have been applied to relational databases. A relational database contains non-spatial data such as, for example, numbers, points (points are a special case and may in some cases be classified as spatial data), strings, and dates. These techniques, which use histograms, samples, or parametric methods, are disclosed in "Balancing Histogram Optimality and Practicality for Query Result Size Estimation", Yannis E. Ioannidis and Viswanath Poosala, Proceedings of the ACM SIGMOD (Special Interest Group on Management of Data) conference, 1995. However, relational selectivity estimation solutions focus on approximating single numerical attributes, not two-dimensional spatial data.
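By way of illustration only, the exact form of such a query (counting the data rectangles contained within a rectangular query) can be sketched as follows. The names `Rect`, `contains`, and `exact_count` are hypothetical and are not taken from any of the systems named above; the sketch shows the brute-force answer that an estimation technique would approximate without examining every data rectangle.

```python
from typing import Iterable, NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle given by its left bottom and right top corners."""
    x1: float
    y1: float
    x2: float
    y2: float


def contains(query: Rect, r: Rect) -> bool:
    """True if rectangle r lies entirely inside the query rectangle."""
    return (query.x1 <= r.x1 and query.y1 <= r.y1
            and r.x2 <= query.x2 and r.y2 <= query.y2)


def exact_count(query: Rect, data: Iterable[Rect]) -> int:
    """Count the data rectangles (e.g. lakes) contained in the query (e.g. a state)."""
    return sum(1 for r in data if contains(query, r))


lakes = [Rect(0, 0, 1, 1), Rect(2, 2, 3, 3), Rect(5, 5, 7, 7)]
state = Rect(0, 0, 4, 4)
print(exact_count(state, lakes))  # prints 2
```

The cost of this exact computation grows with the number of data rectangles, which is one reason an estimate may be preferred when planning query execution.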
Generally a bucket is defined as any subset of input spatial data. A spatial input generally can be defined as an input of spatial entities such as rectangles and triangles. Points can be both spatial data and relational data.
The present invention provides various methods and apparatus for providing accurate estimates for point and range queries over two-dimensional spatial data. The present invention provides several grouping techniques for approximating spatial data.
In one embodiment of the present invention a method is disclosed for grouping a plurality of spatial inputs into a plurality of buckets, also called grouping polygons. These buckets may be stored in a memory by storing their left bottom corner coordinates and their right top corner coordinates (for a rectangular bucket). This provides both the shape of a rectangular bucket and its location. In one form of the present invention the plurality of spatial inputs is grouped based on an equi-area partitioning technique. The equi-area partitioning technique can use the longest dimension of a bucket or bounding polygon as the criterion for splitting into further buckets or bounding polygons. An equi-count technique can also be used, wherein the buckets are split using the highest projected spatial input count along a dimension as the splitting criterion. Each bounding polygon may be a minimum bounding rectangle.
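The equi-area grouping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed method itself: the function names are hypothetical, each bucket is represented by its minimum bounding rectangle, the bucket with the longest side is split at the midpoint of that side, and each spatial input is assigned to a half according to its center. The text above does not fix these details.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2): left bottom, right top


def mbr(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def longest_side(box: Rect) -> Tuple[float, int]:
    """Return (length, axis) of the longer side; axis 0 is x, axis 1 is y."""
    w, h = box[2] - box[0], box[3] - box[1]
    return (w, 0) if w >= h else (h, 1)


def equi_area_partition(rects: List[Rect], max_buckets: int) -> List[Tuple[Rect, int]]:
    """Group a non-empty list of rectangles into at most max_buckets buckets.

    Each bucket is returned as (bounding rectangle, number of inputs).
    """
    groups = [list(rects)]
    frozen: List[List[Rect]] = []  # groups that cannot be split any further
    while groups and len(groups) + len(frozen) < max_buckets:
        # Assumption: split the group whose bounding rectangle has the longest side.
        i = max(range(len(groups)), key=lambda k: longest_side(mbr(groups[k]))[0])
        g = groups.pop(i)
        box = mbr(g)
        _, axis = longest_side(box)
        mid = (box[axis] + box[axis + 2]) / 2
        # Assumption: an input belongs to the half that holds its center.
        low = [r for r in g if (r[axis] + r[axis + 2]) / 2 <= mid]
        high = [r for r in g if (r[axis] + r[axis + 2]) / 2 > mid]
        if low and high:
            groups += [low, high]
        else:
            frozen.append(g)
    return [(mbr(g), len(g)) for g in groups + frozen]


inputs = [(0, 0, 1, 1), (1, 0, 2, 1), (8, 0, 9, 1), (9, 0, 10, 1)]
print(equi_area_partition(inputs, 2))
# prints [((0, 0, 2, 1), 2), ((8, 0, 10, 1), 2)]
```

Each returned bucket carries only two corner coordinates and a count, which matches the storage scheme described for rectangular buckets.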
In one form of the present invention a method is provided which uses a grid of regions superimposed over a plurality of spatial inputs. The processor may achieve superimposition by storing in memory the left bottom corner coordinates and the right top corner coordinates of each region of the grid of regions, and by storing in memory the left bottom corner and right top corner coordinates of each spatial input. Superimposition occurs because the coordinates of a spatial input and a region of the grid of regions may be the same. The method preferably determines a measure of the density of the spatial inputs within each region of the grid of regions and uses this measure of density to determine how to group the spatial inputs into buckets.
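One possible density measure can be sketched as follows. The sketch assumes, purely for illustration, that the density of a region is the number of spatial inputs that overlap or touch it; the text above does not commit to this particular measure, and the name `grid_density` and its parameters are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2): left bottom, right top


def grid_density(rects: List[Rect], bounds: Rect, nx: int, ny: int) -> List[List[int]]:
    """Count, for each cell of an nx-by-ny grid over bounds, the inputs touching it.

    density[j][i] refers to column i (x direction) and row j (y direction).
    """
    x0, y0, x1, y1 = bounds
    cell_w = (x1 - x0) / nx
    cell_h = (y1 - y0) / ny
    density = [[0] * nx for _ in range(ny)]

    def clamp(index: int, n: int) -> int:
        # Keep indices inside the grid, including inputs lying on the outer edge.
        return max(0, min(n - 1, index))

    for ax, ay, bx, by in rects:
        i_lo = clamp(int((ax - x0) // cell_w), nx)
        i_hi = clamp(int((bx - x0) // cell_w), nx)
        j_lo = clamp(int((ay - y0) // cell_h), ny)
        j_hi = clamp(int((by - y0) // cell_h), ny)
        for j in range(j_lo, j_hi + 1):
            for i in range(i_lo, i_hi + 1):
                density[j][i] += 1
    return density


inputs = [(0.5, 0.5, 1.5, 1.5), (0.5, 0.5, 3.5, 1.5), (2.5, 2.5, 3.5, 3.5)]
print(grid_density(inputs, (0, 0, 4, 4), 2, 2))
# prints [[2, 1], [0, 1]]
```

A grouping step could then use the resulting counts, for example by placing bucket boundaries where the density changes sharply, although that choice is likewise an assumption of this sketch.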
When a query is received, the present invention applies the query to the buckets created and gives an estimate of the number of spatial inputs contained within the query, preferably by assuming that spatial inputs are uniformly distributed within each bucket.
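The uniformity assumption can be illustrated with the following simplified sketch. It assumes each bucket is a rectangle with a stored input count and, as a further simplification not stated in the text, treats each spatial input as a point, so that a bucket contributes its count scaled by the fraction of its area covered by the query. The names `overlap_area` and `estimate_count` are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2): left bottom, right top


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles, or 0.0 if they do not overlap."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def estimate_count(query: Rect, buckets: List[Tuple[Rect, int]]) -> float:
    """Estimate how many inputs fall in the query from (bucket box, count) pairs."""
    total = 0.0
    for box, count in buckets:
        area = (box[2] - box[0]) * (box[3] - box[1])
        if area <= 0:
            continue  # degenerate buckets are skipped in this sketch
        # Uniformity: the covered fraction of the bucket holds the same
        # fraction of the bucket's inputs.
        total += count * overlap_area(query, box) / area
    return total


buckets = [((0, 0, 10, 10), 100), ((10, 0, 20, 10), 50)]
print(estimate_count((5, 0, 15, 10), buckets))  # prints 75.0
```

In this example the query covers half of each bucket, so the estimate is half of 100 plus half of 50. A fuller treatment would also account for the extent of the inputs themselves, for instance through an average width and height per bucket.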