1. Field of the Invention
The present invention is directed to the field of indexing computer-based multidimensional data. It is more particularly directed to reducing data collection used in the determination of the grid cell size when grid-indexing techniques are applied to multidimensional data on a computer system.
2. Description of the Background Art
Indexing techniques are used to quickly access data that has been sorted. Spatial data is typically information associated with geometric shapes such as lines, points, poly-lines, polygons, and surfaces. Spatial data sets are often very large and may have two, three, or more dimensions. Indexing such data by traditional techniques, such as a B-tree, may not be feasible due to the large amount of computer resources required. Further, B-tree indexing is typically associated with single-dimensional data, not multidimensional data. Therefore, the sorting capabilities associated with B-tree indexing typically cannot be applied efficiently to multidimensional data. To reduce data processing time, various spatial indexing techniques have been studied and developed. Grid indexing is one such technique for searching spatial multidimensional data, and is used by the product marketed under the trademark IBM DB2® Spatial Extender.
The grid cell size used in grid indexing strongly affects the efficiency of accessing spatial data by techniques that employ grid indexing. A problem has been to refine the determination of particular grid cell sizes and thereby reduce, relative to techniques of the past, the overhead associated with searching a spatial data set via grid indexing. More particularly, a problem has been to reduce the amount of data that results from the sampling that occurs during statistics collection. Such data is used to determine the proper grid cell size.
An optimal relationship between a geometric shape and a grid cell is a one-to-one relationship in which each geometric shape overlaps only one grid cell, and each grid cell includes at most one geometric shape. This optimal relationship simplifies searching for a particular geometric shape by simplifying the process of sorting and accessing spatial data via grid indexing. By way of example, if the grid cell size is too large, many geometric shapes may overlap with one grid cell, and identification of a particular geometric shape is difficult due to the lack of a one-to-one association between a grid cell and a geometric shape. On the other hand, if the grid cell size is too small, then a geometric shape overlaps many grid cells, and it becomes difficult to quickly access the geometric shape by spatial indexing. Those skilled in the art will appreciate the technique of accessing spatial data by determining overlap of a geometric shape with a grid cell.
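By way of illustration only, the effect of grid cell size on overlap may be sketched as follows. The sketch assumes a two-dimensional grid of square cells anchored at the origin, represents the extent of a geometric shape as a hypothetical tuple (xmin, ymin, xmax, ymax), and treats an extent that touches a cell boundary as overlapping the cell on both sides. The function name cells_overlapped is an assumption of this sketch and is not taken from any product.

```python
import math


def cells_overlapped(extent, cell_size):
    """Return the (column, row) grid cells overlapped by a rectangular extent.

    Assumptions of this sketch: square cells anchored at the origin, and an
    extent given as (xmin, ymin, xmax, ymax).
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    xmin, ymin, xmax, ymax = extent
    col_lo, col_hi = math.floor(xmin / cell_size), math.floor(xmax / cell_size)
    row_lo, row_hi = math.floor(ymin / cell_size), math.floor(ymax / cell_size)
    return [
        (col, row)
        for col in range(col_lo, col_hi + 1)
        for row in range(row_lo, row_hi + 1)
    ]


extent = (1.5, 1.5, 3.5, 3.5)

# A large cell holds the whole shape, but may also hold many other shapes.
print(len(cells_overlapped(extent, 10.0)))  # 1

# A small cell forces the same shape to be recorded in many cells.
print(len(cells_overlapped(extent, 1.0)))   # 9
```

In this sketch the same shape occupies one cell at a cell size of 10 and nine cells at a cell size of 1, which illustrates why neither extreme approaches the one-to-one relationship described above.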
A geometric shape that is typically the subject of spatial data may be approximated by a rectangle. When a rectangle bounds the geometric shape with a minimum enclosure, it is referred to as a “minimum-bounding rectangle.” When a minimum-bounding rectangle has been defined and approximates a geometric shape that is located in space, coordinates located on a grid that represent the location of the minimum-bounding rectangle may be used to reference the minimum-bounding rectangle and the approximated geometric shape. For example, the coordinates on a grid that correspond to the corners of the minimum-bounding rectangle may be stored and used to reference the minimum-bounding rectangle.
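For illustration, a minimum-bounding rectangle for a two-dimensional shape may be derived from the vertices of the shape as sketched below. The sketch assumes an axis-aligned rectangle and a shape supplied as a list of (x, y) vertices; the names minimum_bounding_rectangle and corners are hypothetical and are introduced only for this example.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def minimum_bounding_rectangle(vertices: Iterable[Point]) -> Rect:
    """Return the smallest axis-aligned rectangle enclosing all vertices."""
    pts = list(vertices)
    if not pts:
        raise ValueError("a shape needs at least one vertex")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def corners(rect: Rect) -> List[Point]:
    """Return the four corner coordinates that may be stored as a reference."""
    xmin, ymin, xmax, ymax = rect
    return [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]


triangle = [(1.0, 2.0), (4.0, 0.0), (3.0, 5.0)]
mbr = minimum_bounding_rectangle(triangle)
print(mbr)           # (1.0, 0.0, 4.0, 5.0)
print(corners(mbr))  # [(1.0, 0.0), (4.0, 0.0), (4.0, 5.0), (1.0, 5.0)]
```

Storing only two opposite corners would suffice to reconstruct an axis-aligned rectangle; all four are returned here to mirror the corner coordinates mentioned above.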
An index enables fast access to a certain subset of data contained in a larger set of data. The index comprises a data structure and the techniques used to build, maintain, and search the data structure for the purpose of accessing a subset of data. For example, an index may define a data structure that is used to access a specific geometric shape included in a set of spatial data. The particular index of the present example may define a data structure that contains references to the minimum-bounding rectangles associated with various geometric shapes in a spatial data set. By accessing locator references associated with the minimum-bounding rectangles, the process of accessing particular geometric shapes in a spatial data set is simplified.
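One possible form of such an index is sketched below for illustration only. It is a minimal in-memory sketch and is not the data structure of any particular product. It assumes square grid cells anchored at the origin, minimum-bounding rectangles given as (xmin, ymin, xmax, ymax), and shape identifiers that serve as the locator references. The class name GridIndex and its methods are hypothetical.

```python
import math
from collections import defaultdict


def _intersects(a, b):
    """True when two (xmin, ymin, xmax, ymax) rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class GridIndex:
    """Maps grid cells to the identifiers of shapes whose MBRs overlap them."""

    def __init__(self, cell_size):
        if cell_size <= 0:
            raise ValueError("cell_size must be positive")
        self.cell_size = cell_size
        self._cells = defaultdict(set)  # (col, row) -> shape identifiers
        self._mbrs = {}                 # shape identifier -> MBR

    def _cells_for(self, rect):
        xmin, ymin, xmax, ymax = rect
        s = self.cell_size
        for col in range(math.floor(xmin / s), math.floor(xmax / s) + 1):
            for row in range(math.floor(ymin / s), math.floor(ymax / s) + 1):
                yield (col, row)

    def insert(self, shape_id, mbr):
        """Build step: record the shape in every cell its MBR overlaps."""
        self._mbrs[shape_id] = mbr
        for cell in self._cells_for(mbr):
            self._cells[cell].add(shape_id)

    def query(self, window):
        """Search step: gather candidates by cell, then filter by MBR."""
        candidates = set()
        for cell in self._cells_for(window):
            candidates |= self._cells.get(cell, set())
        return sorted(s for s in candidates if _intersects(self._mbrs[s], window))


index = GridIndex(cell_size=10.0)
index.insert("road", (1.0, 1.0, 4.0, 4.0))
index.insert("park", (8.0, 8.0, 12.0, 12.0))
index.insert("lake", (12.0, 12.0, 18.0, 19.0))

print(index.query((0.0, 0.0, 5.0, 5.0)))      # ['road']
print(index.query((11.0, 11.0, 13.0, 13.0)))  # ['lake', 'park']
```

In the first query, "park" shares a grid cell with the search window and is therefore a candidate, but it is discarded when its minimum-bounding rectangle is compared against the window. This two-step search is a common design for grid indexes and is shown here as an assumption, not as the method of the present invention.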
Techniques of the past have typically required significant resources to locate a geometric shape in a spatial database. The lack of an efficient process for determining an index that facilitates streamlined location of minimum-bounding rectangles, and the associated geometric shapes, has contributed to inefficient access of information in spatial databases with grid indexing. More particularly, a problem has been to minimize the amount of data that is processed to determine an efficient grid cell size. That is, there exists a need to reduce the amount of data that results from sampling during the statistics collection that is used to determine an efficient grid cell size, so that the technique of grid indexing that locates a particular minimum-bounding rectangle is sufficiently efficient. From the foregoing it will be apparent that there is still a need to improve the determination of the grid cell size when grid-indexing techniques are applied to spatial data on a computer system.