Many applications today analyze multidimensional data records. A multidimensional data record contains a number of data values, which are defined along a number of dimensions (also called attributes or keys) in a multidimensional space. Such records are typically stored in data files or databases.
A spatial data record is one type of multidimensional data record. Spatial data records typically describe the attributes (e.g., the position, size, shape, etc.) of geometric objects, such as points, lines, polygons, regions, surfaces, volumes, etc. Spatial records are used in many fields, including computer-aided design, computer graphics, data management systems, robotics, image processing, geographic information systems, pattern recognition, and computational geometry.
Effective data structures are needed to organize multidimensional and spatial data records efficiently, in order to optimize the time for querying these records. For instance, a sequential list of the multidimensional data records provides the simplest way of storing the records. However, the time needed for performing a query on such a list is intolerably high in most cases since each record in the list needs to be examined for each query.
Numerous multidimensional data structures have been proposed for organizing multidimensional and spatial data records. Hanan Samet, The Design and Analysis of Spatial Data Structures, Addison-Wesley Publishing, 1990, includes a review of many of these data structures.
Multidimensional data structures include hierarchical data structures. Hierarchical structures are based on the principle of recursive decomposition of the data space (i.e., the object space or the image space). In other words, these data structures are created by recursively dividing the data space, and storing the data records according to their location in the divided space.
Quadtrees and k-d trees are two types of hierarchical data structures. FIGS. 3-5 illustrate one example of a quadtree, while FIG. 6 illustrates one example of a k-d tree. These examples are described by reference to a use of spatial data records to represent interconnect lines on integrated-circuit layouts. Before describing the examples illustrated in FIGS. 3-6, interconnect lines are first described below.
A. Interconnect Lines
Electronic design automation ("EDA") applications assist engineers in designing integrated circuits ("IC's"). Specifically, these applications provide sets of computer-based tools for creating, editing, and analyzing IC design layouts. These layouts are formed by geometric shapes that represent layers of different materials and devices on IC's. Spatial data records define the spatial attributes of many of these geometric shapes. For instance, spatial data records are used to define geometric shapes that represent conductive interconnect lines. Interconnect lines route signals on the IC's. These lines are sometimes referred to as wire segments or segs.
EDA applications typically characterize interconnect lines as rectangles. FIG. 1 illustrates one such characterization. As shown in this figure, the rectangular line 105 can be represented by different sets of spatial attributes. For instance, the rectangle 105 can be represented by (1) the x- and y-coordinates of a pair of its opposing corners (such as the corners defined by the minimum and maximum x- and y-coordinates), or (2) the coordinates of one of its corners along with its width and height.
FIG. 2 illustrates a spatial data record 205 that represents the rectangle 105 by its minimum x-coordinate (XMIN), the minimum y-coordinate (YMIN), width along the x-coordinate (ΔX), and height along the y-coordinate (ΔY). The data record 205 also identifies the layer that the interconnect line 105 traverses. This data record further designates the line as a white or gray line. A line is specified as a white line when that line is deemed critical for a particular operation. Alternatively, a line is specified as a gray line when it is not critical for a particular operation. However, a gray line might need to be taken into account for the analysis of a white line.
The six fields of the data record 205 can be viewed as six dimensions. These dimensions define a six-dimensional data space. The data record for each interconnect line can thus be viewed as a data point in the six-dimensional data space.
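As a minimal sketch, the six-field record can be modeled as follows. The class name `SegRecord`, the field names, and the sample values are assumptions made for illustration; they do not come from FIG. 2 or Table 1.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class SegRecord:
    """Hypothetical record mirroring the six fields of data record 205."""
    xmin: int    # minimum x-coordinate (XMIN)
    ymin: int    # minimum y-coordinate (YMIN)
    dx: int      # width along the x-coordinate
    dy: int      # height along the y-coordinate
    layer: int   # layer that the interconnect line traverses
    color: str   # "white" (critical) or "gray" (not critical)

    def corners(self):
        """Return the equivalent opposing-corner form (xmin, ymin, xmax, ymax)."""
        return (self.xmin, self.ymin, self.xmin + self.dx, self.ymin + self.dy)


# Illustrative values only.
seg = SegRecord(xmin=10, ymin=20, dx=50, dy=5, layer=2, color="white")

# Viewing the record as a point in the six-dimensional data space.
point = astuple(seg)
```

The `corners` method shows that the corner-plus-size form and the opposing-corner form carry the same information.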
An interconnect line capacitively couples to other interconnect lines that are within a certain distance of it. This distance is typically the maximum distance of influence between two conductive interconnect lines. This distance is referred to as the halo distance. Capacitive coupling can exist between interconnect lines in the same plane (i.e., intra-layer coupling) or in different planes (i.e., inter-layer coupling).
Calculating such interconnect capacitances has become a critical step in the design of IC's. The decreasing size of processing geometries has increased the concentration and proximity of the interconnect lines, which, in turn, has increased the parasitic effect of interconnect capacitances. Such parasitic capacitances increase signal delay and cause crosstalk, which can prevent the IC's from functioning properly.
Hence, in designing an IC, an engineer uses an EDA application to extract and analyze the interconnect capacitances that certain critical interconnect lines experience. An EDA application typically performs two steps to extract the capacitances experienced by a critical interconnect line. First, it identifies all interconnect lines within a certain distance of the critical interconnect line. Second, it calculates the capacitance between the critical interconnect line and each retrieved interconnect line.
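The first of these two steps can be sketched with a plain sequential scan, which is the baseline that the data structures discussed below try to improve on. The sketch assumes rectangles in (xmin, ymin, xmax, ymax) form and models the halo region as the critical rectangle grown by the halo distance on every side; the function names and sample coordinates are illustrative, and the capacitance calculation of the second step is not modeled.

```python
def expand(rect, halo):
    """Grow a rectangle by the halo distance on every side (assumed halo shape)."""
    xmin, ymin, xmax, ymax = rect
    return (xmin - halo, ymin - halo, xmax + halo, ymax + halo)


def overlaps(a, b):
    """True when two axis-aligned rectangles touch or overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def neighbors_within_halo(critical, others, halo):
    """Step 1: scan every line and keep those that reach into the halo region."""
    window = expand(critical, halo)
    return [r for r in others if overlaps(window, r)]


# Illustrative coordinates only.
critical = (100, 100, 150, 105)
others = [(100, 110, 150, 115), (400, 400, 450, 405), (155, 100, 160, 140)]
near = neighbors_within_halo(critical, others, halo=10)
```

Each call examines every record once, so identifying the neighbors of N critical lines this way costs on the order of N² comparisons.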
To quickly identify the interconnect lines that are near a critical interconnect line, an EDA application needs to use data structures that efficiently organize the data relating to the interconnect lines. Two commonly used data structures are quadtrees and k-d trees.
B. Quadtrees.
Quadtrees are hierarchical tree data structures with the common property that they recursively decompose the data space into quadrants. One type of quadtree is a region quadtree, which successively subdivides the image space into equal-sized quadrants.
FIGS. 3-5 illustrate one manner for constructing a region quadtree. FIG. 3 presents an IC layout 305 that contains forty interconnect lines. FIG. 4 presents one manner of partitioning the IC layout 305 into equal-sized quadrants along the x- and y-axes. FIG. 5 illustrates the quadtree resulting from this subdivision.
In this example, each interconnect line is characterized as a rectangle that is defined by its minimum x- and y-coordinates and its width and height. The layer information for each rectangle is ignored as the IC layout is divided only along the x- and y-axes. Table 1 lists the four dimension values for each rectangular interconnect line.
As shown in FIG. 4, the IC layout 305 is initially partitioned along the x- and y-axes into four equal-sized quadrants. The resulting quadrants are further subdivided into smaller quadrants. The subdivision process continues for all quadrants that would wholly contain at least two interconnect lines and would have at least one non-empty child node. It should be noted that some quadtrees also stop the subdivision process when a quadrant reaches a predetermined threshold size.
FIG. 5 illustrates a quadtree 500 that results from the subdivision that FIG. 4 illustrates. The quadtree 500 contains a root node 505 and a number of non-root nodes. The root node represents the entire IC layout, while the non-root nodes correspond to quadrants that divide the IC layout. Each node has four child nodes if that node's region is subdivided into four quadrants. The nodes that correspond to those quadrants for which no further subdivision is necessary are leaf nodes (i.e., are nodes with no child nodes).
As shown in FIG. 5, the root node has four child nodes, which are a northwest ("NW") node 510, a northeast ("NE") node 515, a southwest ("SW") node 520, and a southeast ("SE") node 525. These child nodes of the root node correspond to the four quadrants that initially divide the IC layout. Each of these child nodes, in turn, has four child nodes corresponding to the quadrants that subdivide its quadrant. However, at the third level of the tree, only some of the nodes (i.e., nodes 530, 535, 540, and 545) contain child nodes of their own, while at the fourth level of the tree, only one node (i.e., node 550) has child nodes.
As shown in FIG. 5, the quadtree 500 associates the rectangles in the IC layout 305 with both leaf and non-leaf nodes. Specifically, the quadtree associates each rectangle with the node that corresponds to the smallest quadrant that contains the rectangle in its entirety.
To identify all interconnect lines that might capacitively couple to a particular interconnect line, a range query can be performed on the quadtree 500 for all records within a halo region about the particular interconnect line. A range query is a search for all records in a data structure that fall within a particular range-query window.
Once the range-query window is determined, the range-query process starts at the root node and determines whether any rectangles' records associated with that node fall within the range-query window. All records that fall within this query window are returned. The search process continues by traversing the tree, examining the records at each child node whose quadrant the query window intersects, and returning all records that fall within the query window.
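The storage rule and the range query described above can be sketched as follows. This is a simplified illustration, not the construction of FIGS. 4 and 5: it subdivides lazily along each insertion path and stops at an assumed depth cutoff (`MAX_DEPTH`) rather than applying the subdivision criteria described earlier, it treats a record as falling within the window when the two rectangles overlap, and all names and coordinates are hypothetical.

```python
MAX_DEPTH = 4  # assumed cutoff, standing in for a threshold quadrant size


class QuadNode:
    def __init__(self, x0, y0, x1, y1, depth=0):
        self.bounds = (x0, y0, x1, y1)
        self.depth = depth
        self.records = []     # rectangles that fit in no single child quadrant
        self.children = None  # four child nodes once this quadrant is subdivided


def contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def insert(node, rect):
    """Store rect at the smallest quadrant that contains it in its entirety."""
    if node.depth < MAX_DEPTH:
        if node.children is None:
            x0, y0, x1, y1 = node.bounds
            xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
            d = node.depth + 1
            node.children = [QuadNode(x0, ym, xm, y1, d),   # NW
                             QuadNode(xm, ym, x1, y1, d),   # NE
                             QuadNode(x0, y0, xm, ym, d),   # SW
                             QuadNode(xm, y0, x1, ym, d)]   # SE
        for child in node.children:
            if contains(child.bounds, rect):
                insert(child, rect)
                return
    node.records.append(rect)  # crosses a quadrant boundary, or depth cutoff reached


def range_query(node, window, out=None):
    """Return every stored rectangle that overlaps the range-query window."""
    if out is None:
        out = []
    out.extend(r for r in node.records if intersects(r, window))
    for child in node.children or []:
        if intersects(child.bounds, window):
            range_query(child, window, out)
    return out


root = QuadNode(0, 0, 100, 100)
for rect in [(10, 10, 20, 20), (40, 40, 60, 60), (80, 80, 90, 90)]:
    insert(root, rect)
hits = range_query(root, (15, 15, 45, 45))
```

In this example the rectangle (40, 40, 60, 60) straddles the center of the layout, so it is stored at the root and is examined by every query regardless of where the window lies.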
One disadvantage of a quadtree is that its search runtime does not increase linearly with the number of records in the data space. Instead, the runtime increases log-linearly with this number. For instance, the run time for performing N range queries for N records in a quadtree is proportional to $N \log_4 N$. So, as the number of rectangles increases from N to KN, the run time increases by a factor of $K\left(1 + \frac{\log_4 K}{\log_4 N}\right)$.
Equation (1) below explains this mathematically.

$$\text{Run time increase} = \frac{KN \log_4 KN}{N \log_4 N} = K\left(1 + \frac{\log_4 K}{\log_4 N}\right) \qquad (1)$$
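The growth factor in Equation (1) can be checked numerically. The sketch below is only a sanity check on the algebra; the function names and the sample values of N and K are chosen for illustration, and the logarithm base is left as a parameter.

```python
import math


def growth_ratio(n, k, base):
    """Directly evaluate (KN log KN) / (N log N)."""
    return (k * n * math.log(k * n, base)) / (n * math.log(n, base))


def closed_form(n, k, base):
    """Evaluate K * (1 + log K / log N), the right-hand side of Equation (1)."""
    return k * (1 + math.log(k, base) / math.log(n, base))


# Illustrative values: N = 4**10 records, grown by a factor of K = 4.
n, k = 4 ** 10, 4
ratio = growth_ratio(n, k, base=4)
```

With these values the run time grows by a factor of about 4.4 rather than 4, which is the super-linear penalty the text describes.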
Quadtrees also do not work well when the data size is not uniform. This is because the smaller records require smaller quadrants, while the larger records cross quadrant boundaries and therefore need to be stored in the higher levels of the quadtree. For instance, in FIG. 5, the root node contains ten rectangles (i.e., segs 1, 2, 5, 6, 10, 16, 19, 32, 33, 40) because these rectangles cross the boundary of the quadrants that initially divide the data space.
The query time suffers when there are many records at the higher-level nodes of the quadtree. This is because, during each query, the search process has to determine whether the records associated with each node in its traversal path fall within its range-query window. For instance, each time a range query is performed on the quadtree 500 of FIG. 5, the search process needs to examine whether the ten segs at the root node lie within the range-query window. Such an examination is wasteful when these ten segs are far away from the range-query window, as would be the case when the range-query window is at location 405 around seg 31, as shown in FIG. 4.
Quadtrees also do not perform well when the data distribution is highly non-uniform. In such situations, the quadtree has many more quadrants than data records. Quadtrees are also memory intensive because all their levels have to be stored in memory to run queries. Otherwise, the query time might be even slower.
C. K-D Trees.
K-d trees are another class of hierarchical tree data structures. There are several different types of k-d trees but, like quadtrees, all k-d trees are constructed by recursively decomposing the data space. Unlike quadtrees, k-d trees recursively decompose the data space (at each level of the tree) into two regions as opposed to four regions.
Hence, a k-d tree is a binary tree (i.e., a tree where each parent node has at most two child nodes). However, unlike a traditional binary tree that divides the data along one dimension (i.e., along one key), a k-d tree divides the data along k dimensions (i.e., k-keys). In other words, k-d trees use values along k-dimensions to determine branching as opposed to traditional binary trees that use values along one dimension to determine branching (i.e., to select between the left and right subtrees at each level). Thus, a k-d tree is a k-dimensional binary tree.
The search key at each level L of a k-d tree is called the discriminator at that level. Typically, the discriminator changes between each successive level of the tree. One commonly used approach is to define the discriminator at a level L by an L-mod-k operation. Hence, under this approach, the discriminator cycles through the k-dimensions as the tree expands from the root node.
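The L-mod-k rule can be sketched in a few lines. The key names below are assumptions, and the root is treated as level 0 so that the first key is used at the root.

```python
# Assumed key names for a k = 4 example; the root is taken to be level 0.
KEYS = ("xmin", "ymin", "dx", "dy")


def discriminator(level, keys=KEYS):
    """Return the key used for branching at the given level (L mod k)."""
    return keys[level % len(keys)]


cycle = [discriminator(level) for level in range(6)]
```

After the fourth level the sequence wraps around, so the fifth level branches on the first key again.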
FIG. 6 illustrates a simple k-d tree structure for the rectangles in the layout of FIG. 3. As shown in FIG. 6, the discriminator key cycles through four dimensions as the tree expands from the root node. These four dimensions are: the minimum x-coordinate (XMIN), the minimum y-coordinate (YMIN), the width along the x-coordinate (ΔX), and the height along the y-coordinate (ΔY).
This k-d tree associates one data record with each node in the k-d tree. Each node's discriminator key is then set as the value along that key of the data record stored at that node. For instance, seg 10 is stored at node 630. This node appears on the third level of the tree. Hence, its discriminator is along the ΔX dimension. The discriminator value is seg 10's ΔX dimension value, which is 50.
The k-d tree 605 is constructed by inserting the records of the segs in the order that they appear in Table 1. In essence, for each record to be inserted, the tree is traversed based on the record's XMIN, YMIN, ΔX, and ΔY values. At each node, a left branch is taken when the key value of the record is less than the discriminator value at the node, and a right branch is taken when the record's key value is greater than or equal to the discriminator value at the node. When the bottom of the tree is reached (i.e., when a nil pointer is encountered), a node is inserted and the record is inserted into that node.
For instance, as shown in FIG. 6, the first record inserted in k-d tree 605 is seg 1's record, as this record appears first in Table 1. As the tree contains no other nodes at this point, seg 1's data is inserted in the root node 610 of the tree. At this level of the tree, the discriminator is XMIN, and hence the XMIN value of seg 1 is used as the discriminator value at this level.
Seg 2's record is the next record to be inserted into the tree. This record's XMIN value is greater than the XMIN value for seg 1. Thus, seg 2 is added as the right child node 615 of the root node. Seg 3's record is then inserted into the tree. This record's XMIN value is less than the XMIN value for seg 1. Hence, seg 3 is added as the left child node 620 of the root node. The child nodes 615 and 620 are both on the second level of the tree, where the discriminator is along the YMIN dimension. Thus, the discriminator values for nodes 615 and 620 are the YMIN values of segs 2 and 3, respectively.
Seg 4 is the next record to be inserted into the tree. This record's XMIN is greater than that of seg 1 in the root node. Thus, a right branch is taken to node 615. Since seg 4's YMIN is less than seg 2's YMIN value, the left pointer of node 615 is examined. Since this pointer is a NIL pointer, a new node 625 is created, seg 4's data is inserted into this node, and the left pointer of node 615 is connected to the new node 625. Since the new node 625 is at the third level of the tree, the discriminator value for the node 625 is seg 4's ΔX value.
The record insertion process continues in a similar fashion until all the records in Table 1 are inserted in the k-d tree. Under this process, the shape of the resulting k-d tree depends on the order in which the records are inserted into it. Hence, this approach typically results in an unbalanced k-d tree. Numerous techniques have been proposed for constructing balanced k-d trees. Hanan Samet, The Design and Analysis of Spatial Data Structures, Addison-Wesley Publishing, 1990, discloses several of these techniques.
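The insertion rule, and the way insertion order shapes the tree, can be sketched as follows. Records are assumed to be (XMIN, YMIN, ΔX, ΔY) tuples; the class and function names are hypothetical, and the sample values are made up for illustration rather than taken from Table 1.

```python
class KDNode:
    def __init__(self, record):
        self.record = record  # tuple (xmin, ymin, dx, dy)
        self.left = None
        self.right = None


def kd_insert(root, record, k=4):
    """Insert a record, branching left on < and right on >= the discriminator value."""
    if root is None:
        return KDNode(record)
    node, level = root, 0
    while True:
        key = level % k  # the discriminator cycles through the k keys
        side = "left" if record[key] < node.record[key] else "right"
        child = getattr(node, side)
        if child is None:  # nil pointer reached: attach a new node here
            setattr(node, side, KDNode(record))
            return root
        node, level = child, level + 1


def build(records):
    root = None
    for record in records:
        root = kd_insert(root, record)
    return root


def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


# Made-up records, not the values of Table 1.
r1, r2, r3, r4 = (10, 40, 5, 5), (20, 30, 6, 5), (30, 20, 7, 5), (40, 10, 8, 5)
chain = build([r1, r2, r3, r4])    # every record lands one level deeper than the last
shorter = build([r2, r1, r3, r4])  # same records, different insertion order
```

Here the first order produces a four-level chain while the second produces a three-level tree, which illustrates why order-dependent insertion tends to leave the tree unbalanced.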
K-d trees alleviate many of the deficiencies of quadtrees. For instance, at each node of a k-d tree, only one key needs to be compared to determine which branch to take. K-d trees also function better than quadtrees when the data distribution is highly non-uniform.
On the other hand, like quadtrees, k-d trees are memory intensive because all their levels have to be stored in memory to run queries, in order to minimize their query times. Also, the time for either constructing a k-d tree or querying all its records increases log-linearly with the number of records in the data space as opposed to linearly increasing with this number. In particular, the run time for constructing a k-d tree with N records, or for performing N queries for the N records, is proportional to $N \log_2 N$. So, as the number of records increases from N to KN, the construction and query run times increase by a factor of $K\left(1 + \frac{\log_2 K}{\log_2 N}\right)$.
Equation (2) below mathematically explains this increase in runtime.

$$\text{Run time increase} = \frac{KN \log_2 KN}{N \log_2 N} = K\left(1 + \frac{\log_2 K}{\log_2 N}\right) \qquad (2)$$
Therefore, there is a need in the art for a data structure that efficiently organizes multidimensional data in memory, so that the time for querying all the data in this data structure only linearly increases with the number of data items. Ideally, this data structure should take a minimal amount of system memory for each query operation.
The invention is directed towards a method and apparatus for representing multidimensional data. Some embodiments of the invention provide a two-layered data structure to store multidimensional data tuples that are defined in a multidimensional data space. These embodiments initially divide the multidimensional data space into a number of data regions, and create a data structure to represent this division. For each data region, these embodiments then create a hierarchical data structure to store the data tuples within that region.
In some of these embodiments, the multidimensional data tuples are spatial data tuples that represent spatial or geometric objects, such as points, lines, polygons, regions, surfaces, volumes, etc. For instance, some embodiments use the two-layered data structure of the invention to store data relating to geometric objects (such as rectangles) that represent interconnect lines of an IC in an IC design layout. In this document, the phrase "spatial object" or "geometric object" does not necessarily refer to an instantiation of a class in an object-oriented program, even though spatial or geometric objects are represented in such a fashion (i.e., are represented as data objects) in some embodiments of the invention.