A large database application will typically support a feature whereby a user may enter a search key that only approximates the actual value for which the user is searching. Frequently, this is done using a "wildcard" character such as "*". The "*" character represents all possible character strings. Accordingly, a search for "jon*" will retrieve any values beginning with the string "jon" followed by any other string of characters. "Jon", "Jones", and "Jonathan" would all be retrieved as a result of using "jon*" as a search key.
The advantages of this search method are that it is flexible, easy to explain to users, and easy to implement. However, the method is somewhat limited. For example, if the values returned from a search do not contain the value sought, there is no obvious way to expand or modify the query to find "nearby" values. In addition, this method allows users to unwittingly or maliciously submit overly broad queries--queries so broad that an unacceptably large portion of the database is returned. For example, a user might type a query such as "a*", which would return all values starting with "a".
Now that databases are increasingly being queried over the Internet, it is becoming even more important to return search results to users in relatively small portions. Some service providers indeed provide search results in measured lengths. For example, only the first 20 hits of a search might be returned, and the user given the option of then requesting the next 20 hits. However, such service providers often initiate a completely new search when the next 20 hits are requested, and then select the second group of 20 hits from the results. This is because of the impracticality of maintaining state information at the server for the thousands of users that might be accessing a database at any given time. Hence a query that returns many sorted rows would be executed completely for each request to the server. This consumes a significant amount of server resources, for searching, sorting, and transmitting data to the client.
The prior art includes a database lookup scheme that addresses these problems to some degree. In accordance with this prior art scheme, two tables were maintained in addition to the actual database. A first table had a plurality of rows, each corresponding to an individual set of data records. Each row contained fields for three values: a first value, a last value, and an ID. The first and last values defined sets or pages of data, similarly to the index entries at the top of a telephone directory or dictionary. A data page corresponding to a particular table entry would include data values in the range between the first value and the last value, inclusive. This data page was placed in another table that was pre-generated.
Consider, as an example, a database of last names as follows:
Ableson Ahlquist Applegate Ashley Bachrnann Bailey Bangsberg Barker Barnes Baxter Beaver Becker Beckett Bergquist Beutel Bezanson Blackwell Boldman Borgman Bowman Bratsanos Braum Brown Burghardt Butler Cantwell Carlstrom Carolan Caudill Chen Cheng Chung Conway Crawford Cunneely Cutler Davidson Davies Davis Day De Donato DeBragga DeBroeck DeNike Dosch Douglas Doyle Drage Driscoll Dugan Dulfer Duncan Dunkelberger Dunn Durham Edwards Elder Ellis Evans Ewing Fang Fay Ferguson Ferris Ferry Fine Fitzgibbons Flippin Flowers Fort Francoeur Freeman Funaro Galasyn Garrity Gilroy Goodman Gordon Gower Grady Gui Guo Gwertzman Harbin Harkness Hart Hartill Haygood Heflin Helfrich Hersey Hogan Honeycutt Hornstein Hunter Huse Jackson Jewiss Jones Joyner Kearney Kelly Kelnhofer King Kinsella Kramer Kreider Labyak Lalonde Laurie Lee Leptich Lim Lindell Livingston Loggins Lorenzana Lowe Ludden Lutz Lyons Maccalman Maes Mah Mahoney Malugen Martin Mastan Maumas Maxey Mc Mahon McCann McCrory McDaniel McGuire Mercure Meyers Miao Milford Miller Moffitt Monberg Moncure Monroe Mooney Moore Morrison Murray Naroski Newton O'Connor O'Sullivan Oker Packham Perabo Peters Phan Pipkins Plamondon Prekeges Prets Quach Raines Ranch Rather Reid Reynolds Rice Richardson Rivera Roberts Robinson Rodrigues Rutledge Saunders Shamblin Simmons Simpson Slonsky Spahn St. Clair Stulz Sturm Sturms Teply Theivagt Thomson Tierney Timoney Titus Troupe Tucker Tutt Twyman Tyner Utzschneider Vaile Visintainer Walker Ward Watson Webb Webb Whitaker White Wilcher Williams Wilson Wolff Yenne Younkin
A corresponding table (referred to as a page definition table) might appear as shown in Table 1 below:
TABLE 1 ______________________________________ FIRST LAST ID ______________________________________ Ableson Baxter 0 Beaver Bowman 1 Bratsanos Chen 2 Cheng Day 3 De Donato Dugan 4 Dulfer Ewing 5 Fang Fort 6 Francoeur Grady 7 Gui Helfrich 8 Hersey Joyner 9 Kearney Laurie 10 Lee Lutz 11 Lyons Maxey 12 McMahon Miller 13 Moffit Newton 14 O'Conner Prekeges 15 Prets Rivers 16 Roberts Spahn 17 St. Clair Titus 18 Troupe Ward 19 Watson Younkin 20 ______________________________________
In accordance with this table, an information page having ID 0 (page 0) would include 10 names, beginning with "Ableson" and ending with "Baxter." Page 1 would have the next 10 names, beginning with "Beaver" and ending with "Bowman." The page definition table was created so that each page had an approximately equal number of names.
A second table was also maintained. This table had actual data from the database, arranged by ID number. In the example, such a table (referred to as a page data table) would have appeared as follows:
TABLE 2 ______________________________________ ID PAGEDATA ______________________________________ 0 Ableson . . . Baxter 1 Beaver . . . Bowman 2 Bratsanos . . . Chen 3 Cheng . . . Day 4 De Donato . . . Dugan 5 Dulfer . . . Ewing 6 Fang . . . Fort 7 Francoeur . . . Grady 8 Gui . . . Helfrich 9 Hersey . . . Joyner 10 Kearney . . . Laurie 11 Lee . . . Lutz 12 Lyons . . . Maxey 13 McMahon . . . Miller 14 Moffit . . . Newton 15 O'Conner . . . Prekeges 16 Prets . . . Rivers 17 Roberts . . . Spahn 18 St. Clair. . . Titus 19 Troupe . . . Ward 20 Watson . . . Younkin ______________________________________
The ID value of this table corresponded to the ID value of the page definition table (Table 1). The PAGEDATA values comprised the actual data that was to be returned to a user for particular page, taken from the database. In the example here, the data consists merely of the names themselves. In actual applications, other data might also have been included, such as telephone numbers or addresses.
To perform a search under this scheme, a user would enter the beginning portion of a name, or some string that the user believed approximated the name. Suppose, for example, that the user entered "fle" as a search string. This string would be used to query Table 1, to find the ID for a row spanning the search string. Using SQL format, the search might appear as follows: EQU select ID from page.sub.-- definition.sub.-- table where EQU FIRST.ltoreq.`fle` and LAST.gtoreq.`fle`
where "page.sub.-- definition.sub.-- table" refers to Table 1.
In this case, 6 would have been the appropriate ID, indicating that page 6 should be returned to the user. To find the appropriate data, a second query would have been performed against Table 2 using the ID number identified in the first search. An SQL query to accomplish this might have appeared as follows: EQU select PAGEDATA from page.sub.-- data.sub.-- table where ID==X
where where "page.sub.-- data.sub.-- table" refers to Table 2 and X is the ID value returned from the first query. The PAGEDATA value would have been returned to the user, along with the ID value. Using the ID value, the user would have been able to browse to previous and subsequent pages by submitting a query against Table 2 using a decremented or incremented ID.
There were several serious problems with this scheme. One problem occurred when a user submitted a search string falling between the first and last values of adjacent rows of the page definition table. A search string of "grant," for instance, would fall between rows 7 and 8 of the page definition table shown in Table 1. This would cause the search to fail, and no data would be returned to the user.
Another problem was that any insertion of new data required regenerating both of the tables. Furthermore, because of the structure of Table 2, PAGEDATA fields had to be defined before generation of the table. Changing the fields returned from a search required regeneration of the entire PAGEDATA table. A related problem was that such generation required very significant amounts of time when dealing with a database of any size. In a sample application, regeneration required over six hours of computing time.
Another problem with the prior art scheme was that the "closest" match to the user-specified search string (which was selected and highlighted for the user) would show up in search results at unpredictable offsets from the beginning of the returned data. To enhance usability, it would be more desirable to locate this data in a more predictable location, such as at the beginning of the search results or at approximately the middle of the search results.
The inventors have found an improvement over the prior art methods described above, allowing users to submit queries and to receive small groups of data in a sorted format, where the closest match is approximately centered within the returned results. The invention also allows the user to easily browse from one group of values to adjacent groups. Even with these desirable features, data can be added to the database without requiring the regeneration of any tables. Furthermore, searching is very efficient, and the server is not required to store state information for users.