The present invention relates to indexing and managing data using fragmentation in relational databases.
A database is a collection of information. A relational database is a database that is perceived by its users as a collection of tables. Each table arranges items and attributes of the items in rows and columns, respectively. Each table row corresponds to an item (also referred to as a record or tuple), and each table column corresponds to an attribute of the item (referred to as a field or, more correctly, as an attribute type or field type).
Fragmentation is a technique used to increase database performance. A system supports data fragmentation if a given relation can be divided up into pieces or fragments. Data can be stored at the location where it is used most frequently. Moreover, two types of fragmentation can be used: horizontal fragmentation and vertical fragmentation, corresponding to relational operations of restriction and projection. The rules assigning a row to a fragment are defined by a database user or administrator and are part of a xe2x80x9cfragmentation schemexe2x80x9d. It is possible for a fragment of a given table to be empty if none of the rows of the table satisfy the fragmentation scheme""s assignment rules for that fragment. xe2x80x9cFragment eliminationxe2x80x9d is a process by which a database system identifies fragments from a table that cannot participate in the result of a query and removes those fragments from consideration in processing the query, thus improving the performance of the database system.
Also, fragments may be stored independently on separate disks or on separate nodes in a computer cluster or network architecture. Logically, all fragments may be scanned simultaneously, thereby increasing the overall rate at which the complete table can be read, subject to the limitations of the physical storage of the data.
As more and more businesses are run from mission-critical systems that store information on database systems, increasingly higher demands are being placed on these database systems to provide enterprise-wide decision support and to provide timely on-line access to critical business information through database queries. Accordingly, the performance of such systems needs to be continually enhanced. One way to enhance the performance of database systems is to improve fragment-related operations.
A relational database system manages data fragments in a database by converting a query or fragmentation expression to an intermediate range representation; mapping the intermediate range representation to an integer range representation; building an index tree (SKD tree) data structure to represent a search space associated with the data fragments; and using the index tree data structure to locate a desired data fragment.
Implementations of the system may include one or more of the following. The integer range representation can be independent of Structured Query Language (SQL) data types. The tree data structure can update the data fragment. The tree data structure can insert data into a data fragment. The tree data structure can select data in a data set. The tree data structure can be used during internal database operation. The data can be selected using an SQL select statement. The data tree structure can be used to locate the data fragment. The data set can partition into even segments to balance the tree data structure which can be used to populate the tree data structure. The tree data structure can also map all data types into an integer space. Collecting data points can be used in one or more fragmentation expressions; and the data points stored in a multi-dimensional array. The first index can also be used into the array as an index point for a NULL value and also an upper bound of the array can be used as an index point of positive infinity. The intermediate range representation is sorted. The mapping step also may include using a binary search in convert the intermediate range representation into the integer range representation. The index tree data structure represents a multi-dimensional search space. The index tree data structure can be a binary tree, and can be searched with an O(log(N)) search complexity. The data fragments can overlap.
In another aspect, a system contains means for converting a query expression to an intermediate range representation and means for mapping the intermediate range representation to an integer range representation. The system also contains means for building an index tree data structure to represent a search space associated with the data fragments and means for using the index tree data structure to locate a desired data fragment.
Among the advantages of the invention are one or more of the following. The invention provides high performance for managing data fragments in a database. The invention can manage fragmentation schema of arbitrary number of columns. This property allows the invention to manage large databases where the number of columns used for fragmentation can become quite large.
During the construction of the tree, an integer array is used to represent the range structure associated with the fragments. When the tree is used to locate fragments, the index of the array is used to do the search. This is both simple and efficient, because integer comparisons are computationally xe2x80x9ccheaperxe2x80x9d than SQL type comparisons. In addition, the modeling and mapping from SQL data-type to integer contribute to the simplicity and efficiency of the invention in performing operations with data fragmentation. Overlapping fragments are pruned from the tree, thus improving search performance. Moreover, the height of the tree generated by the invention is minimized. Thus, searching the tree will be fast and efficient. This will make insertion into the database and query optimization (eliminating unnecessary fragments) operations more efficient. Also, the system has a low data storage requirement. Due to these advantages, the invention can efficiently manage data fragments in a database.
Other features and advantages of the invention will become apparent from the following description and from the claims.