In typical database systems, users store, update, and retrieve information by interacting with user applications (“clients”). The clients respond to the user's interaction by submitting commands to a database application (a database management system, or “database server”) responsible for maintaining the database. The database server responds to the commands by performing the specified actions on the database. To be correctly processed, the commands must comply with the database language that is supported by the database server. One popular database language is known as Structured Query Language (SQL).
One common database configuration is made up of various tables, with each table being formed of rows and columns of information. The information stored across one row in the table makes up one record, and the fields of the record are the columns in the table. In other words, the table contains rows of individual records and columns of record fields. Other database configurations are known in the art. Database management programs support multiple users, thereby enabling each user to access the same table concurrently.
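The row-and-column arrangement described above, together with the submission of SQL commands to a database, can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module with an in-memory database; the table name, column names, and sample values are hypothetical and are not taken from any particular system.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table: each column is a record field.
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, dept TEXT)")

# Each inserted tuple is one row, i.e. one record.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Adams", "Sales"), (2, "Baker", "Legal"), (3, "Clark", "Sales")],
)

# A SQL command submitted by a client: retrieve one field of the matching records.
rows = conn.execute(
    "SELECT name FROM employees WHERE dept = ? ORDER BY id", ("Sales",)
).fetchall()
print(rows)
```

Each tuple returned by the query corresponds to one matching record, restricted to the selected field.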
An index is commonly used by database management programs to provide quick and efficient associative access to a table's records. These indexes are commonly configured in a B-Tree structure, which includes a root node with many levels of nodes branching from the root node. The information contained in these nodes may include pointers which point to the nodes at the next level of the tree, or it may include pointers which point to one or more records stored in the database. These pointers include additional key information which may reference the records stored in the database. The record keys are stored in an ordered form throughout the nodes at the various branches of the tree. For example, an index tree may exist for an alphabetic listing of employee names.
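The role of ordered keys and record pointers can be illustrated with a simplified sketch. The example below is not a B-Tree: it uses a single flat sorted list as a stand-in for the ordered keys that a B-Tree distributes across its nodes, and a binary search in place of the descent from the root node. The record contents and the names `records`, `index`, and `lookup` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical record storage; list positions act as record pointers.
records = [
    {"name": "Clark", "dept": "Sales"},
    {"name": "Adams", "dept": "Sales"},
    {"name": "Baker", "dept": "Legal"},
]

# Index entries are (key, pointer) pairs kept in key order,
# as in an alphabetic listing of employee names.
index = sorted((rec["name"], pos) for pos, rec in enumerate(records))
keys = [key for key, _ in index]


def lookup(name):
    """Return the record whose key equals name, or None if absent."""
    i = bisect_left(keys, name)  # binary search over the ordered keys
    if i < len(keys) and keys[i] == name:
        pointer = index[i][1]
        return records[pointer]
    return None


print(lookup("Baker"))
print(lookup("Davis"))
```

Because the keys are ordered, the search inspects only a logarithmic number of entries rather than scanning every record; a B-Tree obtains the same benefit while keeping the keys in multi-level nodes.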
Various access methods may be used to retrieve data from a database. The access methods used to retrieve data may significantly affect the speed of the retrieval and the amount of resources consumed during the retrieval process. Many access methods use indices to increase the speed of the data retrieval process. Typical database management systems have built-in support for a few standard types of access methods, such as access methods that use B+Trees and Hash Tables, that may be used when the key values belong to standard sets of data types, such as numbers, strings, etc. This type of data is referred to as structured data.
In recent years, databases have been used to store different types of data, such as text, spatial, image, video, and audio data. For many of these complex data types, the standard indexing techniques and access methods cannot readily be applied. Text data or image data cannot be readily used in a B-tree index because B-trees are based on equality conditions that can be computed against a “value.” Text data, such as a sentence, does not have a “value” that can be used in a B-tree that is being searched for individual words. This type of data is referred to as unstructured data, as opposed to structured data. Unstructured data can be searched efficiently by using an inverted index, such as the index used by Oracle Text.
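The general idea of an inverted index can be sketched as follows. This is a generic, minimal illustration and does not represent the internal design of Oracle Text or of any other product; the sample documents and the names `build_inverted_index` and `search` are hypothetical, and tokenization is reduced to lowercasing and splitting on whitespace.

```python
from collections import defaultdict

# Hypothetical text records keyed by record identifier.
documents = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick dog",
}


def build_inverted_index(docs):
    """Map each word to the set of record identifiers that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


inverted = build_inverted_index(documents)


def search(word):
    """Return the identifiers of records containing the given word."""
    return inverted.get(word.lower(), set())


print(sorted(search("quick")))
print(sorted(search("dog")))
```

The index is keyed by individual words rather than by the whole sentence, which is why a word search does not require a single comparable “value” for each text record.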
Thus, queries that include conditions for both unstructured data and structured data have not been efficiently processed. Typically, each condition is evaluated separately, the results of both conditions are combined based on the query operator, and a final set of matched records is obtained.
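The combining step described above can be sketched as a set operation over two independently produced result sets. The record identifiers and the function name `combine` are hypothetical, and only the AND and OR operators are assumed for illustration.

```python
def combine(structured_ids, text_ids, operator):
    """Combine two sets of matching record identifiers per the query operator."""
    if operator == "AND":
        return structured_ids & text_ids  # records satisfying both conditions
    if operator == "OR":
        return structured_ids | text_ids  # records satisfying either condition
    raise ValueError(f"unsupported operator: {operator}")


# Hypothetical results of evaluating each condition on its own.
structured_ids = {2, 3, 5}   # e.g. from a condition on a structured column
text_ids = {1, 3, 5, 8}      # e.g. from a word search on a text column

print(sorted(combine(structured_ids, text_ids, "AND")))
print(sorted(combine(structured_ids, text_ids, "OR")))
```

Both full result sets must be produced before they can be combined, even when the final set of matched records is small, which suggests why this approach may be inefficient.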
The present invention provides a method and system for generating a database index that cures the above-referenced problems and others.