This invention relates generally to data searching methods and systems including, for example, relational database searching methods and systems.
An increasingly vast amount of data is being stored in digital electronic formats. The value of this data is often dependent upon how effectively it can be retrieved to provide useful information. For this reason, a variety of database structures and database search engines have been developed over the years.
A large body of data has been stored in proprietary databases, which are accessed via custom-crafted software (“code”). In such proprietary databases, there is a tight coupling between the data organization (i.e. the actual data structure) and the access and query code. While the advantages of such proprietary databases include speed of access, compactness, and simplicity, they are typically not well suited for general-purpose data storage and retrieval applications. This is because, with proprietary databases, modifications to the data structures require rewriting the access and query code, and because the queries tend to be fixed, being implemented in a programming language and then compiled into the query code.
With the ever-increasing amount of electronic data available and with the increasingly sophisticated demands for specialized information derived from such data, search engine techniques have become increasingly sophisticated and generalized. At the present time, the two main approaches for information retrieval are relational database search engines and text-based searching technologies as used, for example, for Internet searching.
Relational databases have been increasingly utilized over the past two decades in order to overcome the limitations of previous database architectures. One of the great strengths of relational databases is that they offer a flexible way to access the data along different dimensions and based on a set of criteria. The industry standard language, Structured Query Language (SQL), is used to define and execute such queries. SQL was initially designed by IBM Corporation and was later popularized through its inclusion in relational database engines from such companies as IBM and Oracle Corporation, amongst others.
By using a relational database query language such as SQL or the like, information can be obtained from the relational database based upon a multiplicity of factors. For example, an SQL query can search a personnel database of a company for all employees who are making more than $20,000.00 a year and who have been employed with the company for less than twenty years.
Relational database search engines, such as those using the aforementioned SQL language, suffer from the disadvantage of producing “yes or no” or “black and white” results. Using the previous example, a user searching for company employees making more than $20,000.00 a year who have been with the company for less than twenty years would miss all of the employees making exactly $20,000.00 or a few dollars less, as well as those who have worked at the company for exactly twenty years or just over twenty years, e.g. twenty years and one day. As such, there is no “fuzziness” in such a relational database search request, and no indication of how important it is to fit exactly within the search criteria.
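The “black and white” behavior described above can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for any relational engine; the table name, column names, and employee records are hypothetical and chosen only to exercise the boundary cases from the example.

```python
import sqlite3

# Hypothetical personnel table; names and figures are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary REAL, years REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Avery", 45000.00, 5.0),     # clearly satisfies both criteria
        ("Blake", 20000.00, 5.0),     # exactly at the salary threshold
        ("Casey", 19998.00, 5.0),     # a few dollars below the threshold
        ("Devon", 45000.00, 20.0),    # exactly twenty years
        ("Ellis", 45000.00, 20.003),  # roughly twenty years and one day
    ],
)

# Strict comparisons: a row either satisfies both predicates or is dropped.
rows = conn.execute(
    "SELECT name FROM employees "
    "WHERE salary > 20000.00 AND years < 20 "
    "ORDER BY name"
).fetchall()
matched = [name for (name,) in rows]
print(matched)  # ['Avery']
```

Only one record is returned. The four near misses are excluded outright, and the result carries no indication of how close each of them came to satisfying the criteria.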
Sophisticated search engines using text-searching technologies approach the problem from a different direction. These text-searching technologies are used by Internet-based search engines such as Yahoo!, Alta Vista, etc. With text-based searching technologies, the search engine creates indexes based upon the words found in searched documents. When a user specifies one or more words or phrases to the search engine, the search engine checks these indexes and then uses an algorithm to produce a ranking of all the documents that contain the search words or phrases. The algorithm varies depending upon the search engine, but may be as simple as a count of matched words.
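The indexing and ranking steps described above can be sketched as follows. This is a simplified illustration of the simplest case mentioned, a count of matched words, and not the algorithm of any particular search engine; the function names `build_index` and `rank` and the sample documents are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def rank(index, query):
    """Score each document by how many distinct query words it contains."""
    scores = defaultdict(int)
    for word in set(query.lower().split()):
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical documents used only to exercise the sketch.
docs = {
    "a": "used sports car with a powerful engine",
    "b": "family car for sale",
    "c": "garden tools for sale",
}
index = build_index(docs)
results = rank(index, "powerful sports car")
print(results)  # [('a', 3), ('b', 1)]
```

Every query word contributes equally to the score in this sketch, and a document that contains none of the query words does not appear in the ranking at all.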
Text-based search engines suffer from several common limitations. For one, they cannot perform trade-off analysis between various criteria, such as searching for information concerning cars which cost less than $30,000 and which have engines with more than 500 horsepower. For another, they are limited to text-based documents as their search domain.
Finally, they do not provide any effective means for a user to specify how important a particular word is to that user.
The prior art therefore does not allow users or automated clients of a database search engine to specify preferences or “weights” with respect to various search criteria, which would introduce a degree of “fuzziness” into the search request and thereby provide better retrieval of information from the database or other data source.