1. Field of the Invention
The present invention relates to performing searches in a relational database. More specifically, the present invention relates to a method and an apparatus for facilitating an efficient wildcard search within a relational database, wherein the wildcard search includes a leading wildcard.
2. Related Art
A relational database system allows a user to search through and manipulate data within relational tables. One very common operation is to search through a target column of a table within the relational database system looking for fields that match a desired xe2x80x9csearch string.xe2x80x9d In performing this type of string search, it is often useful to be able to include xe2x80x9cwildcardxe2x80x9d characters in the search string.
A search string that contains a wildcard character matches a given string if the given string can be produced by substituting a sub-string for the wildcard character. For example, if the xe2x80x9c%xe2x80x9d symbol is the wildcard character, the search string SAT% matches the following strings {SAT, SATE, SATIN, SATURDAY}. Because of this flexibility, wildcard characters can be very useful in searching for strings that have a specific pattern.
Unfortunately, there is presently no easy way to search for occurrences of a search string that contains both a leading wildcard character and a trailing wildcard character. In existing relational database systems, a search with a constraint of the form {column_name like xe2x80x98%string%xe2x80x99} requires a search through all of the rows of the relational table. Relational database searches in this form are generally considered to be degenerate and poorly formed because existing relational database systems are not able to efficiently perform such searches.
Performing a full table search on a table containing a terabyte of data can consume almost all of the resources of a computer system for an extended period of time in order to find matching records. Not only must each record be fetched, but most if not all of the data in the target column of each record must be searched.
What is needed is a method and an apparatus that facilitates efficiently performing a wildcard search on a target column of a relational database using a search, string that contains both a leading wildcard character and a trailing wildcard character.
One embodiment of the present invention provides a system that facilitates performing an efficient wildcard search on a field within a relational database table. The system operates by receiving a query with a search string that includes a wildcard and a sub-string. In order to process this query, the system identifies rows in a table in which a target column contains a string that matches search string. This is accomplished by looking up rows in the table that match the search string by looking up the sub-string in a sub-string index for the target column, wherein this sub-string index allows a given string in the target column to be rapidly identified based upon a sub-string of the given string. Next, the system accesses the matching rows in order to process the query.
In one embodiment of the present invention, the wildcard is a leading wildcard that precedes the sub-string. In a variation on this embodiment, the search string additionally includes a trailing wildcard that follows the sub-string.
In one embodiment of the present invention, the system additionally creates the sub-string index for the target column prior to receiving the query.
In one embodiment of the present invention, the sub-string index includes a second table in the relational database. This second table includes a first column containing strings from the target column and second column containing corresponding sub-strings of the strings from the target column.
In one embodiment of the present invention, the sub-string index includes an internal index within the relational database that facilitates using a given sub-string to lookup at least one row in the table having a matching string in the target column.
In one embodiment of the present invention, the sub-string index is structured to lookup all sub-strings of the given string that are equal to or less than a maximum sub-string size.
In one embodiment of the present invention, the sub-string index is structured to lookup all sub-strings of the given string that include the last character of the given string.
One embodiment of the present invention provides a system that produces an index to facilitate an efficient wildcard search on a target column within a table in a relational database. The system operates by receiving a row for the table in the relational database. The system retrieves a string from the target column of the row, and then produces a plurality of sub-strings for the string. Next, the system uses the plurality of sub-strings to construct a sub-string index that facilitates looking up the row containing the string based upon one of the plurality of sub-strings.
In one embodiment of the present invention, the system additionally inserts the row into the table in the relational database.
In one embodiment of the present invention, the system additionally integrates the sub-string index into a global sub-string index for all strings in the target column in the table, wherein the global sub-string index facilitates looking up all strings in the target column that contain a given sub-string.
In one embodiment of the present invention, the global sub-string index includes a second table in the relational database, wherein the second table includes a first column containing strings from the target column and second column containing corresponding sub-strings of the strings from the target column.
In one embodiment of the present invention, the global sub-string index includes an internal index within the relational database that facilitates using the given sub-string to lookup at least one row in the table having a matching string in the target column which contains the given sub-string.