This invention relates generally to methods and systems for developing search parameters to retrieve information from, for example, relational databases and other information sources.
An increasingly vast amount of data is being stored in digital electronic formats. The value of this data is often dependent upon how effectively it can be retrieved to provide useful information. For this reason, a variety of database structures and database search engines have been developed over the years.
A large body of data has been stored in proprietary databases, which are accessed via custom-crafted software (“code”). In such proprietary databases, there is a tight coupling between the data organization (i.e. the actual data structure) and the access and query code. While the advantages of such proprietary databases include speed of access, compactness, and simplicity, they are typically not well suited for general-purpose data storage and retrieval applications. This is because, with proprietary databases, modifications to the data structures require the rewriting of the access and query code, and because the queries tend to be fixed by being implemented in a programming language and then compiled into the query code.
With the ever-increasing amount of electronic data available and with the growing demand for specialized information derived from such data, search engine techniques have become increasingly sophisticated and generalized. At the present time, the two main approaches to information retrieval are relational database search engines and text-based searching technologies as used, for example, for Internet searching.
Relational databases have been increasingly utilized over the past two decades in order to overcome the limitations of previous database architectures. One of the great strengths of relational databases is that they offer a flexible way to access the data along different dimensions and based on a set of criteria. The industry standard language, Structured Query Language (SQL), is used to define and execute such queries. SQL was initially designed by IBM Corporation and was later popularized by its inclusion in relational database engines from such companies as IBM and Oracle Corporation, amongst others.
By using a relational database search engine such as SQL or the like, information can be obtained from the relational database based upon a multiplicity of factors. For example, a SQL inquiry can search a personnel database of a company for all employees who are making more than $20,000 a year and who have been employed with the company for less than twenty years.
Relational database search engines, such as the aforementioned SQL search language, suffer from the disadvantage of creating “yes or no” or “black and white” results. Using the previous example, a user searching for company employees who make a salary of greater than $20,000 and who have spent less than twenty years with the company would miss all of the employees making exactly $20,000 or a few dollars less than $20,000, as well as those who have worked at the company for exactly twenty years or just over twenty years, e.g. twenty years and one day. As such, there is no “fuzziness” in such a relational database search request, and no indication of the importance of exactly fitting within the search criteria.
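By way of illustration only, the hard cutoff of such a query can be sketched in Python using the standard sqlite3 module. The `employees` table, its column names, and the sample rows are hypothetical and were invented for this sketch; they are not taken from any actual personnel database.

```python
import sqlite3

# Hypothetical personnel rows (name, salary, years employed), for illustration only.
ROWS = [
    ("Avery", 20000, 5.0),   # exactly at the salary threshold
    ("Blake", 19995, 5.0),   # a few dollars under the threshold
    ("Casey", 45000, 20.0),  # exactly twenty years of service
    ("Devon", 45000, 12.0),  # clearly satisfies both criteria
]

def strict_matches(rows):
    """Return names where salary > 20000 AND years_employed < 20."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (name TEXT, salary INTEGER, years_employed REAL)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT name FROM employees "
        "WHERE salary > 20000 AND years_employed < 20 "
        "ORDER BY name"
    )
    names = [row[0] for row in cursor]
    conn.close()
    return names

# Only the row that satisfies both strict inequalities is returned;
# the three near misses are dropped without any indication of how close they were.
print(strict_matches(ROWS))
```

In this sketch the rows sitting exactly on, or marginally outside, a threshold are simply absent from the result, which is the “black and white” behavior described above.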
Additionally, relational database search engines, such as the aforementioned SQL search language, suffer from the disadvantage of being unable to derive the original query parameters based on the results of a query. In other words, it is typically impossible to infer the original query's selection clauses based solely on the collection of records returned by a query.
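One way to see why the selection clauses cannot generally be recovered from the returned records is that different clauses can return identical records. The following is a minimal sketch in Python using the standard sqlite3 module; the `employees` table, its columns, and its rows are hypothetical and serve only as an illustration.

```python
import sqlite3

# Hypothetical rows (name, salary, years employed), invented for illustration.
ROWS = [
    ("Avery", 30000, 3),
    ("Blake", 52000, 8),
    ("Casey", 18000, 25),
]

def run_query(where_clause):
    """Run a SELECT with the given WHERE clause against a fresh in-memory table.

    The clause is concatenated into the SQL text, which is acceptable here only
    because it is a trusted literal written for this illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (name TEXT, salary INTEGER, years_employed INTEGER)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", ROWS)
    cursor = conn.execute(
        "SELECT name FROM employees WHERE " + where_clause + " ORDER BY name"
    )
    names = [row[0] for row in cursor]
    conn.close()
    return names

by_salary = run_query("salary > 20000")
by_tenure = run_query("years_employed < 10")

# Two unrelated selection clauses produce the same records, so the records
# alone do not reveal which clause was used.
print(by_salary == by_tenure)
```

Given only the returned names, nothing distinguishes the salary-based query from the tenure-based one.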
Sophisticated search engines using text-searching technologies approach the problem from a different direction. These text-searching technologies are used by Internet-based search engines such as Yahoo!, Alta Vista, etc. With text-based searching technologies, the search engine creates indexes based upon the words found in searched documents. When a user specifies one or more words or phrases to the search engine, the search engine checks these indexes and then uses some algorithm to produce a ranking of all the documents that contain the search words or phrases. The algorithm varies depending upon the search engine, but may be as simple as a word count.
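The indexing and word-count ranking described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the algorithm of any particular search engine: the documents, the whitespace tokenization, and the function names `build_index` and `search` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical document collection, invented for illustration.
DOCS = {
    "doc1": "fast cars and fast engines",
    "doc2": "cars for sale",
    "doc3": "gardening tips",
}

def build_index(docs):
    """Map each word to a per-document count of its occurrences."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word][doc_id] += 1
    return index

def search(index, query):
    """Rank documents by the total number of occurrences of the query words."""
    scores = Counter()
    for word in query.lower().split():
        # .get avoids creating empty entries for words absent from the index.
        for doc_id, count in index.get(word, {}).items():
            scores[doc_id] += count
    # Highest count first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

index = build_index(DOCS)
print(search(index, "fast cars"))
```

Note that every query word contributes equally to the score in this sketch; there is no place for the user to say that one word matters more than another.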
Text-based search engines suffer from several limitations. For one, they cannot perform trade-off analysis between various criteria, such as searching for information concerning cars which cost less than $30,000 and which have engines with more than 500 horsepower. For another, they are limited to text-based documents as their search domain. Finally, they do not provide any effective means for a user to specify how important a particular word is to that user.
The prior art therefore suffers from the inability of users or automated clients of a database search engine to specify preferences or “weights” with respect to various search criteria. Such weights would introduce a degree of “fuzziness” into the search request and thereby provide a better retrieval of information from the database or other data source.
Another drawback of the prior art is that it does not aid the user in refining his searches or search techniques if search results are deemed poor. If a user conducts a search and the results are not appropriate, the system provides little, if any, hint or feedback as to more appropriate search criteria. That is, only slight modifications of search techniques or terms may be necessary to arrive at the desired results, but prior art systems are not helpful in suggesting how to refine or adjust the search parameters to make those slight modifications.
Another drawback of the prior art is that conventional search engines do not “infer” what the search criteria were from chosen results. Still further, in the prior art, if a user is aware of pertinent items in a relational database, there is no way to ascertain what characteristics (or search criteria) of the items distinguish them from other items in the database.