There is a virtually unlimited number of contexts in which it is useful to determine which items reside within a certain area. For example, one may wish to determine which heavenly bodies in the known universe are within a certain distance of a particular black hole, which dwellings in the entire world reside within five miles of a certain school, or which circuit items in a large integrated circuit reside in a certain region of a chip. Queries used to search for items, within a particular domain, that satisfy specified location criteria are referred to herein as “spatial” queries.
Spatial queries may be performed by (a) storing location information for each item in the domain, and (b) comparing the location criteria of the query against the location information of each item. Unfortunately, when search domains contain large numbers of items, comparing the location information of each item in the domain to the location criteria of the spatial query may be impractical. For example, in the domains mentioned above (heavenly bodies in the known universe, dwellings in the world, circuit items in a microprocessor), the number of comparison operations required to compare the location information of every item in the domain to the location criteria of the query may be in the millions, billions, or more.
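As a minimal illustration of the exhaustive approach just described, the sketch below stores a location for every item and compares each one against the query's location criteria. The names `Item` and `brute_force_query`, the point-only item model, and the radius-style criterion are hypothetical choices made for illustration. The returned comparison count makes the O(n)-per-query cost explicit: it equals the number of items no matter how few of them match.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Item:
    """Hypothetical item abstracted as a single 2-D point."""
    name: str
    x: float
    y: float


def brute_force_query(items: List[Item], cx: float, cy: float,
                      radius: float) -> Tuple[List[Item], int]:
    """Return items within `radius` of (cx, cy) and the number of comparisons.

    Every item is tested against the location criteria, so the cost is
    O(n) per query regardless of how many items actually match.
    """
    hits: List[Item] = []
    comparisons = 0
    r2 = radius * radius
    for item in items:
        comparisons += 1
        dx, dy = item.x - cx, item.y - cy
        if dx * dx + dy * dy <= r2:  # compare squared distances, no sqrt needed
            hits.append(item)
    return hits, comparisons


if __name__ == "__main__":
    houses = [Item(f"house{i}", float(i), 0.0) for i in range(1000)]
    found, cost = brute_force_query(houses, cx=10.0, cy=0.0, radius=5.0)
    print(len(found), cost)  # 11 1000
```

Only 11 of the 1,000 items match, yet all 1,000 are examined; with billions of items, this per-query cost is what makes the exhaustive approach impractical.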
As a specific example, consider the context of VLSI design automation. In VLSI design automation, many design data items, such as devices, parasitic resistors, and capacitors, must be stored and processed. As technology advances, more and more components are packed onto a single chip design. The number of data items to be handled in a design often exceeds hundreds of millions and can surpass a billion.
A design consists of data items, typically represented as physical objects such as wire segments, vias, components, and pins. The locations of these objects lie within the chip boundary. These physical objects can sometimes be abstracted as points. More often, however, objects must be represented with their geometric size. In the latter case, each data item is associated with intervals (for example, one interval per dimension) instead of a point.
CAD (Computer Aided Design) tools often need to perform several kinds of operations on data items, such as adding, deleting, modifying, and querying. One of the typical queries performed by a CAD tool is a spatial query to find all items whose physical coordinates fall within the region specified by the query's location criteria (i.e., the “query window”).
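To make the interval representation concrete, the sketch below models a 2-D item as one closed interval per axis and tests it against a rectangular query window. The class names `Interval` and `Box` and the use of integer coordinates are assumptions for illustration. Because the text does not specify whether an item must lie entirely inside the window or merely touch it, both predicates are shown; which semantics a given CAD tool uses is an assumption here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed 1-D interval [lo, hi] in integer database units (assumed)."""
    lo: int
    hi: int

    def overlaps(self, other: "Interval") -> bool:
        return self.lo <= other.hi and other.lo <= self.hi

    def contained_in(self, other: "Interval") -> bool:
        return other.lo <= self.lo and self.hi <= other.hi


@dataclass(frozen=True)
class Box:
    """Hypothetical 2-D item (e.g. a wire segment): one interval per axis."""
    x: Interval
    y: Interval

    def overlaps(self, window: "Box") -> bool:
        # Two boxes overlap only if their intervals overlap on every axis.
        return self.x.overlaps(window.x) and self.y.overlaps(window.y)

    def inside(self, window: "Box") -> bool:
        # Fully contained: each interval lies within the window's interval.
        return self.x.contained_in(window.x) and self.y.contained_in(window.y)


if __name__ == "__main__":
    window = Box(Interval(0, 100), Interval(0, 100))
    wire = Box(Interval(90, 150), Interval(10, 12))  # crosses the right edge
    via = Box(Interval(20, 22), Interval(20, 22))
    print(wire.overlaps(window), wire.inside(window))  # True False
    print(via.overlaps(window), via.inside(window))    # True True
```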
Over the years, a number of algorithms and data structures have been designed to speed up spatial queries. These range from space-driven algorithms to data-driven algorithms, as well as hybrid space/data-driven algorithms. Space-driven algorithms either subdivide space into a contiguous grid (simple grid indexing) or recursively divide it into a hierarchy of grids, as in the KD-tree, quadtree, and octree. Because space-driven algorithms are typically data-agnostic, the index is built first and data is added afterward. In data-driven algorithms, such as the R-tree and its variants like the R*-tree, the index changes dynamically as data is added or removed.
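The simplest space-driven scheme mentioned above, simple grid indexing, can be sketched as follows, assuming point data and a hypothetical class `UniformGrid`. The cell size is fixed when the index is created, independent of the data, and points are added afterward, which is the data-agnostic property described. A query visits only the cells that intersect the window and then filters exactly. Hierarchical variants (KD-tree, quadtree, octree) and data-driven structures such as the R-tree are not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]


class UniformGrid:
    """Hypothetical simple grid index over 2-D points.

    The grid is defined before any data arrives (space-driven); points are
    then bucketed into fixed-size square cells.
    """

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: Dict[Tuple[int, int], List[Point]] = defaultdict(list)

    def _cell(self, x: float, y: float) -> Tuple[int, int]:
        # Floor division maps every coordinate (including negatives) to a cell.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, x: float, y: float) -> None:
        self.cells[self._cell(x, y)].append((x, y))

    def query(self, x0: float, y0: float, x1: float, y1: float) -> List[Point]:
        """Return points with x0 <= x <= x1 and y0 <= y <= y1."""
        cx0, cy0 = self._cell(x0, y0)
        cx1, cy1 = self._cell(x1, y1)
        hits: List[Point] = []
        # Visit only cells that intersect the window, then filter exactly.
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for (x, y) in self.cells.get((cx, cy), ()):
                    if x0 <= x <= x1 and y0 <= y <= y1:
                        hits.append((x, y))
        return hits


if __name__ == "__main__":
    grid = UniformGrid(cell_size=10.0)
    for i in range(100):
        grid.insert(float(i), float(i))
    print(len(grid.query(15.0, 15.0, 25.0, 25.0)))  # 11
```

The query above inspects only 4 of the grid's cells rather than all 100 points. The scheme's weakness, as the following paragraph notes, is that a fixed cell size suits uniformly distributed data far better than clustered data.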
There are numerous variations on these traditional algorithms that aim to achieve better clustering, minimize physical memory, minimize query time, or satisfy other real-world constraints, such as limited development effort. Unfortunately, because data in many scenarios is not randomly distributed, the efficacy of existing approaches is highly dependent on the domains to which they are applied. In some problem domains, data tends to form patterns on which a general-purpose algorithm, or even a cleverly devised algorithm specialized for another domain, may perform poorly.
In terms of implementation, the traditional algorithms and their data structures require extra pointers to organize data for spatial queries. For 2-D data, each interval data item requires 4-8 pointers. These pointers are extra overhead for each data item. On a 64-bit operating system, where each pointer occupies 8 bytes, this amounts to an extra 32-64 bytes of pointer memory per item. The ratio of pointer overhead to useful data size worsens as the data size of an item shrinks, and small items are common and typical when the number of items in the database is huge, for example, more than 10 million. The approaches described hereafter can substantially eliminate such memory overhead for databases with large numbers of elements.
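The overhead figures above follow from simple arithmetic, worked below. The 16-byte payload per item (for example, four 32-bit coordinates describing a 2-D interval item) is an assumption chosen for illustration; the 4-8 pointers per item, 8-byte pointers, and 10 million items come from the text. The function name `pointer_overhead` is hypothetical.

```python
POINTER_BYTES = 8  # one pointer on a 64-bit system


def pointer_overhead(pointers_per_item: int, payload_bytes: int, n_items: int):
    """Return (overhead bytes per item, overhead/payload ratio, total overhead bytes)."""
    per_item = pointers_per_item * POINTER_BYTES
    return per_item, per_item / payload_bytes, per_item * n_items


if __name__ == "__main__":
    # Assumed payload: 16 bytes per item (four 32-bit coordinates).
    for ptrs in (4, 8):
        per_item, ratio, total = pointer_overhead(ptrs, payload_bytes=16,
                                                  n_items=10_000_000)
        print(f"{ptrs} pointers: {per_item} B/item, {ratio:.0f}x payload, "
              f"{total / 1e6:.0f} MB total")
    # 4 pointers: 32 B/item, 2x payload, 320 MB total
    # 8 pointers: 64 B/item, 4x payload, 640 MB total
```

Under this assumption the pointers alone occupy two to four times as much memory as the data they index, which is the imbalance the paragraph describes.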
Another advantage of the approaches described hereafter is that database (data structure) construction time is linear, O(n), where n is the number of data items. This is much faster than traditional spatial query algorithms, whose database construction time is O(n ln n).
In theory, the query time of the approaches described hereafter is similar to that of traditional spatial query algorithms. In practice, however, when the query window is large, queries using the approaches described hereafter run faster than those using traditional algorithms. This speedup is obtained, at least in part, because data items that are spatially close are naturally placed at adjacent memory addresses in the implementation. As a result, the approaches described hereafter minimize memory paging and disk seek time.
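The three properties claimed in the preceding paragraphs (no per-item pointers, linear-time construction, and spatially close items at adjacent addresses) can be illustrated with a textbook technique: counting-sorting points by grid cell into flat arrays. The sketch below is a hypothetical illustration of those properties only and is not presented as the approach described hereafter. It assumes integer point coordinates within a known bounding box, and the function names `build_packed_grid` and `query` are invented. Construction makes one counting pass, one prefix-sum pass over the cells, and one placement pass, so it runs in O(n + cells). The only index is one offset per cell, and each cell's points occupy a contiguous slice that a query scans sequentially.

```python
from array import array
from typing import List, Tuple


def build_packed_grid(points: List[Tuple[int, int]], cell: int, nx: int, ny: int):
    """Counting-sort points into flat coordinate arrays ordered by grid cell.

    Assumes integer coordinates in [0, nx*cell) x [0, ny*cell).
    Points of cell c occupy xs[start[c]:start[c + 1]] (and likewise ys).
    """
    ncells = nx * ny
    start = array("q", [0]) * (ncells + 1)
    for x, y in points:                      # pass 1: count points per cell
        start[(y // cell) * nx + x // cell + 1] += 1
    for c in range(ncells):                  # prefix sums -> slice offsets
        start[c + 1] += start[c]
    xs = array("i", [0]) * len(points)
    ys = array("i", [0]) * len(points)
    cursor = start[:ncells]                  # next free slot in each cell
    for x, y in points:                      # pass 2: place each point
        c = (y // cell) * nx + x // cell
        xs[cursor[c]], ys[cursor[c]] = x, y
        cursor[c] += 1
    return start, xs, ys


def query(start, xs, ys, cell, nx, ny, x0, y0, x1, y1):
    """Return (x, y) points inside the closed window [x0, x1] x [y0, y1]."""
    hits = []
    # Clamp the window's cell range to the grid bounds.
    for cy in range(max(0, y0 // cell), min(ny - 1, y1 // cell) + 1):
        for cx in range(max(0, x0 // cell), min(nx - 1, x1 // cell) + 1):
            c = cy * nx + cx
            for i in range(start[c], start[c + 1]):  # contiguous scan
                if x0 <= xs[i] <= x1 and y0 <= ys[i] <= y1:
                    hits.append((xs[i], ys[i]))
    return hits
```

Because points in the same cell are stored next to one another, a large query window reads long runs of consecutive memory instead of chasing pointers, which is the locality effect the paragraph attributes to the approaches described hereafter.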
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.