The invention relates generally to data access operations and, more particularly, to data access techniques using a tiered access structure.
Since their introduction, computers have increasingly been used as mechanisms for the storage and retrieval of information. Computers and computer systems (e.g., two or more computers coupled through a communication medium) are now used extensively to store and manipulate large collections of data. One task associated with large data collections is providing fast and efficient access to each stored object. The term "object," as used herein, implies only a stored quantity. For example, a stored object may be a text file, an image file, a spreadsheet file, a table, a number, a text string, or any other quantum of information. In particular, the term "object" is not limited to that generally used in the field of object-oriented database design.
For relatively small data collections, each stored object may be associated with a name or label. To view and/or manipulate a specific object, a user need only select the appropriate label from a list of similar labels and retrieve the associated data object. As the number of stored data objects increases, however, more sophisticated techniques (generally referred to as indexing) need to be employed.
One indexing technique commonly used in large database management systems employs an index configured in a B-tree structure (a balanced m-way tree structure). B-trees provide a means to search, insert and delete objects from a collection of objects in logarithmic time. One drawback to a B-tree index is that the balanced nature of the B-tree must be maintained as index entries are added or removed (corresponding to the storage and deletion of data objects). The computational resources needed to maintain a B-tree structure (especially for large data collections) may be significant.
Another indexing technique that may be used when data object key values are of varying size is the trie. A trie is a tree structure in which the branching at any level is determined not by the entire key value but by only a portion of it. For efficient use, a trie must be kept to as few levels as possible. The computational resources needed to accomplish this (especially as more and more data objects are stored) may, as with the B-tree index, be significant.
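By way of illustration only, the branching behavior of a trie can be sketched as follows. This is a minimal sketch, assuming string keys and one character of the key consumed per level; the names TrieNode, trie_insert, and trie_lookup are hypothetical and do not appear in the description above.

```python
class TrieNode:
    """One trie level; each child is selected by a single key character."""

    def __init__(self):
        self.children = {}      # maps one character of the key to the next node
        self.value = None       # payload stored when a key ends at this node
        self.has_value = False  # distinguishes "no key ends here" from value None


def trie_insert(root, key, value):
    """Walk or create one node per character of the key, then store the value."""
    node = root
    for ch in key:
        node = node.children.setdefault(ch, TrieNode())
    node.value = value
    node.has_value = True


def trie_lookup(root, key):
    """Follow one branch per character; return None if the key is absent."""
    node = root
    for ch in key:
        node = node.children.get(ch)
        if node is None:
            return None
    return node.value if node.has_value else None


root = TrieNode()
trie_insert(root, "cat", 1)
trie_insert(root, "car", 2)
```

In this sketch the number of levels visited equals the key length, which illustrates why long keys lead to deep tries unless levels are compressed.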
Yet another indexing technique is known as hashing. Unlike tree indexing techniques that search for an object identifier (e.g., a key) via a sequence of comparison operations, hashing determines the address or location of an object's identifier (along with the location of the data object itself) in a hash table by computing some arithmetic function ƒ( ) on the object's identifier X. Design of a hash index or table is based in part on a knowledge or assumption of the number of entries to be stored in the hash table. In practice, the number of table entries is significantly smaller than the number of possible identifiers. This implies that as more identifiers are stored (corresponding to more stored data objects), the probability increases that a new identifier will map to a table entry that is already full. Such an event is known as a collision. Identifiers that result in a collision are processed in accordance with one of a variety of standard overflow techniques such as, for example, rehashing, open addressing (e.g., random, quadratic, and linear), and chaining. As more data objects are stored, the number of collisions typically increases and the ability of a hash index to quickly locate a specific identifier decreases. At some point, the performance of a hash index may be so degraded that it must be reconstituted to allow for an increased number of entries. The computational resources needed to rebuild a hash index may be significant.
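The collision behavior described above can be illustrated with a small fixed-size table that resolves collisions by chaining. This is a minimal sketch under stated assumptions: the class name ChainedHashTable, the slot count, and the byte-sum hash function are all hypothetical choices made for illustration, not functions taken from any particular system.

```python
class ChainedHashTable:
    """Fixed-size hash table; colliding identifiers share a per-slot chain."""

    def __init__(self, num_slots=8):
        self.slots = [[] for _ in range(num_slots)]  # table size fixed at creation
        self.collisions = 0                          # new identifiers that hit a used slot

    def _slot(self, identifier):
        # Assumed arithmetic function f(X): sum of the identifier's bytes,
        # reduced modulo the table size.
        return sum(identifier.encode("utf-8")) % len(self.slots)

    def put(self, identifier, location):
        chain = self.slots[self._slot(identifier)]
        for i, (existing, _) in enumerate(chain):
            if existing == identifier:
                chain[i] = (identifier, location)  # update in place, not a collision
                return
        if chain:
            self.collisions += 1  # slot already holds a different identifier
        chain.append((identifier, location))

    def get(self, identifier):
        # Lookup cost grows with chain length, i.e. with the number of collisions.
        for existing, location in self.slots[self._slot(identifier)]:
            if existing == identifier:
                return location
        return None


table = ChainedHashTable(num_slots=4)
table.put("ab", 1)
table.put("ba", 2)  # same byte sum as "ab", so it lands in the same slot
```

Because the slot count never changes in this sketch, chains lengthen as identifiers are added, which is the degradation that eventually forces a rebuild.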
Thus, it would be beneficial to provide indexing techniques that dynamically accommodate (with reduced computational effort over prior art techniques) arbitrarily large data collections.
In one embodiment the invention provides a memory for access by a program being executed by a programmable control device. The memory includes a data access structure stored in said memory, the data access structure including a first and a second index structure together forming a tiered index. The first structure includes a plurality of entries, at least one of which indicates an entry in the second structure. The second structure also has a plurality of entries, the number of such entries being dynamically changeable.
In another embodiment, the invention provides a method for building a tiered index structure. The method includes building a first-level index structure having a predetermined number of entries, building a second-level index structure having a dynamic number of entries, and establishing a link between an entry in the first-level index structure and an entry in the second-level index structure. Methods in accordance with the invention may be stored in any medium that is readable and executable by a computer system.
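One possible reading of the three steps recited above (a first level with a predetermined number of entries, a second level with a dynamic number of entries, and a link between the two) can be sketched as follows. This is an illustrative assumption only: the class name TieredIndex, the byte-sum slot selection, and the chaining of second-level entries are hypothetical and are not stated in the summary above.

```python
class TieredIndex:
    """Two-level index: a fixed first level linking into a growable second level."""

    def __init__(self, first_level_size=4):
        # First-level structure: predetermined number of entries.
        # Each entry is None or the position of an entry in the second level.
        self.first_level = [None] * first_level_size
        # Second-level structure: the number of entries grows as keys are added.
        # Each entry is (key, location, position of next entry or None).
        self.second_level = []

    def _first_level_slot(self, key):
        # Assumed selection rule: byte sum of the key modulo the first-level size.
        return sum(key.encode("utf-8")) % len(self.first_level)

    def insert(self, key, location):
        slot = self._first_level_slot(key)
        # The new second-level entry points at the entry previously linked from
        # this slot; the first-level entry is then re-linked to the new entry.
        self.second_level.append((key, location, self.first_level[slot]))
        self.first_level[slot] = len(self.second_level) - 1

    def lookup(self, key):
        # Follow the link from the first level, then walk second-level entries.
        pos = self.first_level[self._first_level_slot(key)]
        while pos is not None:
            entry_key, location, next_pos = self.second_level[pos]
            if entry_key == key:
                return location
            pos = next_pos
        return None


index = TieredIndex(first_level_size=4)
index.insert("ab", 10)
index.insert("ba", 20)
index.insert("c", 30)
```

In this sketch the first level never changes size while the second level gains one entry per insertion; re-inserting an existing key simply shadows the older entry rather than replacing it.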