1. Technical Field
The present invention relates to a method of indexing for use with entities stored in a database.
2. Related Art
It can readily be seen that, when a database holds vast numbers of entities, identifying the entities that satisfy a query on the data within a reasonable period of time is a non-trivial exercise. To ease retrieval, data in a database is generally indexed in some way, and queries are then performed on the index. The way in which the entities are indexed can be expected to have a significant bearing on the quality and speed of retrieval, and, as information is increasingly stored in databases, there is significant interest in finding improved ways of indexing data.
It is known to index location data based on place names. It is also known to retrieve a set of geographic coordinates from place names and to build an index based on topological information extracted from the coordinates (e.g. “GIPSY”, developed at U.C. Berkeley as part of a joint NSF/NASA/ARPA initiative (Wilensky et al., 1994)). Furthermore, it is known to build an index based on the geographical coordinates themselves: database vendors such as Oracle™ have developed systems for storing and indexing geometrical data, e.g. the Oracle spatial data cartridge, which allows spatial querying to be carried out using an extended (non-standard) form of SQL. Other vendors, such as MapInfo™, SpatialWare™, Innogistic™ and Informix™, have similar proprietary ways of dealing with spatial data. In particular, Innogistic™ have developed a product known as Cartology DSI, which stores geometrical vector data as blobs (binary large objects, which are not intrinsically recognisable by the underlying database). It also creates indexes outside the database based on the well-known ‘quad tree’ idea. The index data is stored in binary-tree structures and is accessed by DCOM middleware services.
Both the Oracle™ and Innogistic™ systems make use of the quad-tree method, in which the entire area of a layer is divided and subdivided into a series of nested squares, four at each level. The entire area is first divided into four squares designated 0, 1, 2 and 3. Each of these squares is subdivided into four smaller squares; square 1, for example, becomes squares 10, 11, 12 and 13. Each of these is further subdivided, so that, for example, the subdivisions of square 11 are assigned index values 110, 111, 112 and 113. As a result, any location in the map can be referred to by a single index number. The disadvantage of the quad-tree method is that squares are subdivided regardless of whether they contain any points, so processing time is wasted on empty squares; if indexing is performed over a large area, this wasted processing time is non-trivial and costly.
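To make the quad-tree addressing concrete, the following minimal Python sketch computes the index string for a point by repeatedly halving the area, appending one digit per level. The text does not say which quadrant receives which digit, so the convention used here (0 lower-left, 1 lower-right, 2 upper-left, 3 upper-right) is an assumption, as are the function names `quadtree_key` and `empty_cells`. The second function illustrates the disadvantage described above: a full subdivision to a given depth always produces 4**depth squares, whether or not they contain points.

```python
def quadtree_key(x, y, xmin, ymin, xmax, ymax, depth):
    """Return the quad-tree index string for point (x, y) at the given depth.

    Assumed digit convention (not specified in the text):
    0 = lower-left, 1 = lower-right, 2 = upper-left, 3 = upper-right.
    """
    key = ""
    for _ in range(depth):
        xmid = (xmin + xmax) / 2
        ymid = (ymin + ymax) / 2
        digit = 0
        if x >= xmid:          # right half
            digit += 1
            xmin = xmid
        else:                  # left half
            xmax = xmid
        if y >= ymid:          # upper half
            digit += 2
            ymin = ymid
        else:                  # lower half
            ymax = ymid
        key += str(digit)
    return key


def empty_cells(points, bounds, depth):
    """Count the squares at this depth that contain none of the points."""
    occupied = {quadtree_key(x, y, *bounds, depth) for x, y in points}
    return 4 ** depth - len(occupied)
```

For example, `quadtree_key(0.6, 0.1, 0, 0, 1, 1, 2)` returns `"10"`, and `empty_cells([(0.1, 0.1), (0.2, 0.2)], (0, 0, 1, 1), 2)` returns `15`: both points fall in square 00, leaving 15 of the 16 squares empty.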
According to a first aspect of the present invention there is provided a method of building an index to a plurality of entities, wherein each entity is represented by a point defined in a space. The method comprises the steps of:
i) identifying entities whose points are furthest apart;
ii) creating a first area, the extremities of which first area are given by the points representing the identified entities;
iii) assigning entities falling within the first area to a storage area corresponding to the first area;
iv) dividing the first area into a plurality of second areas;
v) for each of the plurality of second areas:
    a. linking each of the second areas to the first area; and
    b. repeating steps (i) to (v) until the first area includes a single point; and
vi) writing the storage areas corresponding to each of the first areas to the index.
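The steps above can be sketched in Python as follows. This is a minimal illustration, not the claimed implementation: it adopts the preferred options described below (two perpendicular dimensions, step (i) realised by taking the extreme coordinates in each dimension, and four second areas per split), holds the storage areas in memory, and uses hypothetical names (`Area`, `build`, `write_index`). Splitting each first area at its midpoints is also an assumption; the text does not say where the division lines fall.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Area:
    bounds: Tuple[float, float, float, float]      # (xmin, ymin, xmax, ymax)
    entities: List[str]                             # storage area for this area
    parent: Optional["Area"] = field(default=None, repr=False, compare=False)
    children: List["Area"] = field(default_factory=list)


def build(entities: Dict[str, Point], parent: Optional[Area] = None) -> Area:
    """Recursively build the area tree for a non-empty dict name -> (x, y)."""
    xs = [p[0] for p in entities.values()]
    ys = [p[1] for p in entities.values()]
    # Steps (i)-(ii): the extreme points in each dimension bound the first area.
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    # Step (iii): entities inside the first area go to its storage area.
    area = Area((xmin, ymin, xmax, ymax), sorted(entities), parent)
    # Stop once the area has shrunk to a single (possibly shared) point.
    if len(set(entities.values())) == 1:
        return area
    # Step (iv): split at the midpoints into four second areas (assumption).
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    quadrants: List[Dict[str, Point]] = [{}, {}, {}, {}]
    for name, (x, y) in entities.items():
        quadrants[int(x >= xmid) + 2 * int(y >= ymid)][name] = (x, y)
    # Step (v): link each non-empty second area to this area and recurse.
    for sub in quadrants:
        if sub:  # empty quadrants are never created, unlike a fixed quad tree
            area.children.append(build(sub, area))
    return area


def write_index(area: Area, index: Optional[list] = None,
                parent_id: Optional[int] = None) -> list:
    """Step (vi): flatten the storage areas depth-first into index records."""
    if index is None:
        index = []
    area_id = len(index)
    index.append({"id": area_id, "parent": parent_id,
                  "bounds": area.bounds, "entities": area.entities})
    for child in area.children:
        write_index(child, index, area_id)
    return index
```

With `pts = {"a": (0.0, 0.0), "b": (4.0, 0.0), "c": (4.0, 3.0), "d": (1.0, 1.0)}`, `build(pts)` produces a root area with bounds `(0.0, 0.0, 4.0, 3.0)` and three children (`a`+`d`, `b`, `c`); the upper-left quadrant holds no points and is never created. `write_index` then yields six records. Because every recursion shrinks the area to the bounding rectangle of its own points, no work is spent on regions that contain nothing.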
By identifying the points that are spaced furthest apart, the indexing method essentially “shrinks” the indexing space so that only areas containing points are indexed. The storage areas are therefore correspondingly compact, which minimises the storage space used.
The second areas can be linked to first areas via a so-called “linked list” of points, which is an efficient linking mechanism known in the art.
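The text does not specify the layout of the linked list. One common arrangement, sketched here as an assumption with a hypothetical `AreaNode` class, gives each first area a pointer to its first second area and chains the second areas through `next_sibling` pointers, so that linking a new second area is a constant-time push onto the head of the list.

```python
from typing import Iterator, Optional


class AreaNode:
    """Hypothetical area node whose child areas form a singly linked list."""

    def __init__(self, label: str):
        self.label = label
        self.parent: Optional["AreaNode"] = None
        self.first_child: Optional["AreaNode"] = None
        self.next_sibling: Optional["AreaNode"] = None

    def link_child(self, child: "AreaNode") -> None:
        """Link a second area to this (first) area by pushing it onto the list head."""
        child.parent = self
        child.next_sibling = self.first_child
        self.first_child = child

    def children(self) -> Iterator["AreaNode"]:
        """Walk the linked list of second areas from its head."""
        node = self.first_child
        while node is not None:
            yield node
            node = node.next_sibling
```

Linking second areas `"0"`, `"1"` and `"3"` to a root in that order and then walking `root.children()` visits them as `"3"`, `"1"`, `"0"`, since each new area is pushed onto the head; appending at a tail pointer instead would preserve insertion order at the cost of one extra field.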
Preferably the first area is divided into four second areas in step (iv), and step (v) is repeated recursively for each of the second areas. Advantageously the identifying step (i) comprises the steps of:
calculating distances between entities in each of two dimensions; and
for each of the dimensions, identifying which entities have the greatest distance between them, such that, when the two dimensions are perpendicular to one another, the first area created in step (ii) is a rectangle defined by at least two points.
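Along any single axis, the pair of entities furthest apart is simply the entity with the smallest coordinate and the entity with the largest, so the identifying step needs only one linear scan per dimension rather than a comparison of every pair. A minimal sketch, assuming entities are supplied as a dict of name to (x, y); the function name `furthest_apart` is hypothetical:

```python
def furthest_apart(entities):
    """Return the extreme pair per dimension and the rectangle they define.

    entities: dict mapping entity name -> (x, y).
    """
    pairs = []
    for dim in (0, 1):  # 0 = x, 1 = y (perpendicular dimensions)
        lo = min(entities, key=lambda n: entities[n][dim])
        hi = max(entities, key=lambda n: entities[n][dim])
        pairs.append((lo, hi))
    (xlo, xhi), (ylo, yhi) = pairs
    rect = (entities[xlo][0], entities[ylo][1],   # lower-left corner
            entities[xhi][0], entities[yhi][1])   # upper-right corner
    return pairs, rect
```

For `{"a": (0, 0), "b": (4, 0), "c": (4, 3), "d": (1, 1)}` this returns the pairs `[("a", "b"), ("a", "c")]` and the rectangle `(0, 0, 4, 3)`. Here entity `a` is extreme in both dimensions, so the rectangle is fixed by three distinct points; in the limiting case two points suffice, which is why the text says "at least two". Ties are resolved by `min`/`max` returning the first extreme entity encountered.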
Conveniently the method further includes the steps of writing the or each entity within each of the first areas to a database, maintaining a register of the number of entities written to the database, and, for each of the first areas, writing the current register number to the storage area corresponding to the first area.
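One way to use the register, sketched here under stated assumptions, is to write entities only at the leaf areas in depth-first order. Every entity is then written exactly once, and the register values recorded when an area is reached and when it is finished bound a contiguous run of database records holding exactly that area's entities. The text mentions a single register number; recording the value both on entry (`start`) and on exit (`end`) is an assumption, as are the `Area` fields and the in-memory list standing in for the database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Area:
    entities: List[str]
    children: List["Area"] = field(default_factory=list)
    start: int = -1  # hypothetical: register value when the area is reached
    end: int = -1    # hypothetical: register value after its entities are written


def write_entities(area: Area, database: List[str], register: int = 0) -> int:
    """Write entities depth-first, stamping each area with register values.

    Returns the updated register (the number of entities written so far).
    """
    area.start = register
    if area.children:
        for child in area.children:
            register = write_entities(child, database, register)
    else:
        for name in area.entities:  # leaf: write its entities to the database
            database.append(name)
            register += 1
    area.end = register
    return register
```

For a root with children `ad` (itself split into `a` and `d`), `b` and `c`, the database receives `["a", "d", "b", "c"]`, and `ad` is stamped with `start=0`, `end=2`, so `database[ad.start:ad.end]` retrieves its two entities without consulting the tree.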