This invention relates generally to data searching methods and systems including, for example, relational database searching methods and systems.
An increasingly vast amount of data is being stored in digital electronic formats. The value of this data is often dependent upon how effectively it can be retrieved to provide useful information. For this reason, a variety of database structures and database search engines have been developed over the years.
A large body of data has been stored in proprietary databases, which are accessed via custom-crafted software ("code"). In such proprietary databases, there is a tight coupling between the data organization (i.e. the actual data structure) and the access and query code. While the advantages of such proprietary databases include speed of access, compactness, and simplicity, they are typically not well suited for general-purpose data storage and retrieval applications. This is because, with proprietary databases, modifications to the data structures require the rewriting of the access and query code, and because the queries tend to be fixed by being implemented in a programming language and then compiled into the query code.
With the ever-increasing amount of electronic data available and with the increasingly sophisticated demands for specialized information derived from such data, search engine techniques have become increasingly sophisticated and generalized. At the present time, the two main approaches for information retrieval are relational database search engines and text-based searching technologies as used, for example, for Internet searching.
Relational databases have been increasingly utilized over the past two decades in order to overcome the limitations of previous database architectures. One of the great strengths of relational databases is that they offer a flexible way to access the data along different dimensions and based on a set of criteria. The industry standard language, Structured Query Language (SQL), is used to define and execute such queries. SQL was initially designed by IBM Corporation and was later popularized by including it in relational database engines from such companies as IBM and Oracle Corporation, amongst others.
By using a relational database search engine that supports SQL or the like, information can be obtained from the relational database based upon a multiplicity of factors. For example, an SQL query can search a personnel database of a company for all employees who are making more than $20,000.00 a year and who have been employed with the company for less than twenty years.
Relational database search engines, such as those based on the aforementioned SQL, suffer from the disadvantage of producing "yes or no" or "black and white" results. Using the previous example, a user searching for company employees making a salary of greater than $20,000.00 and having less than twenty years with the company would miss all of the employees who were making exactly $20,000.00 or a few dollars less than $20,000.00, and those who had worked at the company for exactly twenty years or just over twenty years, e.g. twenty years and one day. As such, there is no "fuzziness" in such a relational database search request, and no indication of the importance of exactly fitting within the search criteria.
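By way of illustration only, the hard cutoff described above can be reproduced with a small in-memory table. The following is a minimal sketch using Python's standard sqlite3 module; the table name, column names, and employee records are hypothetical and are not taken from any actual personnel database.

```python
import sqlite3

# Hypothetical personnel table; names and figures are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary REAL, years REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Avery", 35000.00, 5.0),    # clearly inside both limits
        ("Blake", 20000.00, 5.0),    # salary exactly at the limit
        ("Casey", 19995.00, 5.0),    # salary a few dollars under the limit
        ("Devon", 35000.00, 20.01),  # tenure just over twenty years
    ],
)

# The strict comparisons admit a row or reject it; there is no "near miss".
rows = conn.execute(
    "SELECT name FROM employees "
    "WHERE salary > 20000 AND years < 20 ORDER BY name"
).fetchall()
matched = [name for (name,) in rows]
print(matched)
```

Only the first record satisfies both comparisons. The three near misses are dropped without any indication of how close they came to the stated limits.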
Sophisticated search engines using text-searching technologies approach the problem from a different direction. These text-searching technologies are used by Internet-based search engines such as Yahoo!, Alta Vista, etc. With the text based searching technologies, the search engine creates indexes based upon the words found in searched documents. When a user specifies one or more phrases to the search engine, the search engine checks these indexes and then uses some algorithm to produce a ranking of all the documents that contain the search words or phrases. The algorithm varies depending upon the search engine, but may be as simple as a count of matched words.
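The simplest ranking mentioned above, a count of matched words, might be sketched as follows. This is an illustrative assumption about how such an index could be organized and does not represent the algorithm of any particular search engine; the function names and sample documents are hypothetical.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def rank_by_matched_words(index, query):
    """Rank documents by how many distinct query words they contain."""
    counts = defaultdict(int)
    for word in set(query.lower().split()):
        for doc_id in index.get(word, ()):
            counts[doc_id] += 1
    # Highest count first; ties are broken by document id for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical documents for illustration.
documents = {
    "doc1": "red sports car with a large engine",
    "doc2": "used car for sale",
    "doc3": "garden tools and supplies",
}
index = build_index(documents)
ranking = rank_by_matched_words(index, "red car engine")
print(ranking)
```

In this sketch every query word contributes equally to the count, so the user has no way to indicate that one word matters more than another.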
Text-based search engines suffer from several common limitations. For one, they cannot perform trade-off analysis between various criteria, such as searching for information concerning cars which cost less than $30,000 and which have engines with more than 500 horsepower. For another, they are limited to text-based documents as their search domain. Finally, they do not provide any effective means for a user to specify how important a particular word is to that user.
The prior art therefore suffers from the inability of users or automated clients of a database search engine to specify preferences or "weights" with respect to various search criteria. Such weights would introduce a degree of "fuzziness" into the search request and thereby provide better retrieval of information from the database or other data source.
The present invention allows users and automated clients of a database search engine to specify the importance of various search criteria when making data searches. This permits a ranking of search results to present data in a more relevant fashion to the user or other client (e.g. an automated process).
A weighted preference information search system in accordance with the present invention includes a weighted preference generator and a weighted preference data search engine. The weighted preference generator develops weighted preference information including weights corresponding to search criteria. The weighted preference data search engine uses the weighted preference information to search an information source and to provide an ordered result list based upon the weighted preference information.
The weighted preference data search system is often used to search a relational database. However, the search system can also be used to search a number of other data sources including flat databases, text-based databases, and data streams. In the case of data streams, the search system can search in real time or it can search the data stream after it has been buffered or stored in a computer readable medium.
The weighted preference generator is preferably a client to the weighted preference data search engine. Alternatively, the weighted preference generator and weighted preference data search engine can be integrated processes. As a client, the weighted preference generator can include a user interface which allows a human user to input preferences into the generator. These preferences can include one or more of the selection of search criteria, the adjustment of weights with respect to the search criteria, and an indication of subjective ordering of at least one of the search criteria. Alternatively or additionally, the weighted preference generator can provide weighted preference information based upon at least one of default values, automated heuristics, user input, and other sources of input, for example devices such as machine sensors and temperature gauges.
Preferably, the weighted preference data search system includes a data store and an algorithm processor. The data store stores data for the use of the algorithm processor such as client preferences, historical search data, and intermediate search results. The algorithm processor includes a data source reader, a normalizing alternative distance calculator, and an alternative scorer which creates a ranking for the alternatives based upon the normalized alternative distances and weighted preference information.
A method for weighted preference data searching in accordance with the present invention includes determining weighted preference information including a plurality of search criteria and a corresponding plurality of weights signifying the relative importance of the search criteria, and querying an information source and ranking the results based upon the weighted preference information. The information source is often a database, e.g. a relational database. Alternatively, the information source can be a data stream which is arriving in real time or which has been buffered in a computer readable medium.
In an embodiment of the present invention, in addition to providing a plurality of weights, a subjective ordering may be provided for at least one search criterion. As an example, the color of a car might be very important to a user and therefore given a relatively high numerical weight. However, there is a subjective aspect to color. For example, for one user the color red for a car might be very important, while for another user having a black car is very important. Subjective ordering permits a criterion such as "color" to be associated with an ordering of which colors are desired by the user and in what order.
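One way a subjective ordering might be turned into a number that can be weighted alongside other criteria is sketched below. The mapping from list position to a distance between 0 and 1, and the treatment of values missing from the ordering, are assumptions made for illustration; the function name is hypothetical.

```python
def ordering_distance(value, preferred_order):
    """Return a distance in [0, 1]: 0 for the most preferred value and 1 for
    the least preferred. Values absent from the ordering are treated as
    least preferred, which is an assumption of this sketch."""
    if value not in preferred_order:
        return 1.0
    if len(preferred_order) == 1:
        return 0.0
    return preferred_order.index(value) / (len(preferred_order) - 1)

# Two users with opposite subjective orderings for the "color" criterion.
user_a = ["red", "blue", "black"]
user_b = ["black", "blue", "red"]

print(ordering_distance("red", user_a))   # most preferred by user A
print(ordering_distance("red", user_b))   # least preferred by user B
print(ordering_distance("blue", user_a))  # middle of user A's ordering
```

The same red car is thus an ideal match for one user and the poorest match for the other, even if both users assign the same weight to color.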
Preferably, the method for determining weighted preference data includes determining whether or not there should be user input. If there is no user input, the system or "client" provides at least one of default and automatically heuristically determined weights to the search engine. If user input is allowed, it is determined whether the user should be allowed to select criteria. If not, at least one of default and automatically heuristically determined criteria selections is made for the user. If the user is allowed to select criteria, the user selection is input into the system. Additionally, it is determined whether the user should be able to adjust weights. If not, at least one of default and automatically heuristically determined weights is provided by the system. If the user is allowed to adjust weights, the weights are input into the system by the user. It is also preferably determined whether the user should be able to input subjective ordering into the system. If the user cannot specify subjective ordering, the system provides the orderings that are needed by the engine. If the user is able to input subjective ordering, the subjective ordering is input by the user into the system.
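The sequence of determinations described above might be sketched as follows. All names (ClientPolicy, determine_preferences, and the default values) are hypothetical, and fixed defaults stand in for whatever default or heuristically determined values a real client would supply.

```python
from dataclasses import dataclass

@dataclass
class ClientPolicy:
    """Which parts of the preference information the user may supply."""
    allow_user_input: bool = True
    allow_criteria_selection: bool = True
    allow_weight_adjustment: bool = True
    allow_subjective_ordering: bool = True

# Stand-ins for default or heuristically determined values (assumed).
DEFAULT_CRITERIA = ["price", "horsepower", "color"]
DEFAULT_WEIGHTS = {"price": 0.5, "horsepower": 0.3, "color": 0.2}
DEFAULT_ORDERINGS = {"color": ["red", "blue", "black"]}

def determine_preferences(policy, user_input=None):
    """Combine user input and defaults according to the client's policy."""
    user_input = user_input or {}
    if not policy.allow_user_input:
        user_input = {}  # ignore anything supplied; use defaults throughout

    def choose(allowed, key, default):
        # Take the user's value only if that kind of input is permitted.
        if allowed and key in user_input:
            return user_input[key]
        return default

    criteria = choose(policy.allow_criteria_selection, "criteria", DEFAULT_CRITERIA)
    weights = choose(policy.allow_weight_adjustment, "weights", DEFAULT_WEIGHTS)
    orderings = choose(policy.allow_subjective_ordering, "orderings", DEFAULT_ORDERINGS)
    # Keep only the weights and orderings that apply to the selected criteria.
    return {
        "criteria": list(criteria),
        "weights": {c: weights[c] for c in criteria if c in weights},
        "orderings": {c: orderings[c] for c in criteria if c in orderings},
    }

# The user may select criteria but may not adjust weights.
policy = ClientPolicy(allow_weight_adjustment=False)
prefs = determine_preferences(
    policy,
    {"criteria": ["price", "color"], "weights": {"price": 0.1, "color": 0.9}},
)
print(prefs)
```

In this usage the user's criteria selection is honored, while the supplied weights are disregarded in favor of the system's defaults because weight adjustment is not permitted by the policy.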
A method for weighted preference data searching includes reading information from a data source including a set of alternatives, each alternative containing values for a number of criteria. Next, for each criterion of each alternative, the distance to an ideal value is measured and then normalized and stored. Then, for each alternative, the normalized distance for each criterion is multiplied by its corresponding weight and accumulated to obtain a score for the alternative. The alternatives are then ranked by their scores.
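The steps of this method might be sketched as follows. The sketch assumes numeric criteria, normalizes each distance by the largest distance observed for that criterion, and ranks the lowest accumulated score (closest to the ideal) first; these choices, the function name, and the sample data are assumptions made for illustration.

```python
def rank_alternatives(alternatives, ideals, weights):
    """Rank alternatives by weighted, normalized distance to the ideal values."""
    criteria = list(weights)

    # Measure the raw distance to the ideal value for every criterion.
    raw = {
        name: {c: abs(values[c] - ideals[c]) for c in criteria}
        for name, values in alternatives.items()
    }

    # Normalize by the largest distance seen per criterion (assumed scheme)
    # and store the result before scoring.
    largest = {c: max(d[c] for d in raw.values()) for c in criteria}
    normalized = {
        name: {c: (d[c] / largest[c] if largest[c] else 0.0) for c in criteria}
        for name, d in raw.items()
    }

    # Multiply each normalized distance by its weight and accumulate.
    scores = {
        name: sum(weights[c] * d[c] for c in criteria)
        for name, d in normalized.items()
    }

    # Lowest score first; ties are broken by name for stable output.
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))

# Hypothetical cars, ideal values, and weights.
cars = {
    "A": {"price": 30000, "horsepower": 500},
    "B": {"price": 22000, "horsepower": 300},
    "C": {"price": 25000, "horsepower": 400},
}
ideals = {"price": 20000, "horsepower": 500}
weights = {"price": 0.7, "horsepower": 0.3}

ranking = rank_alternatives(cars, ideals, weights)
print([name for name, _ in ranking])
```

With price weighted more heavily than horsepower, car B ranks first despite having the weakest engine, and car A ranks last despite matching the ideal horsepower. No alternative is excluded outright, which is the trade-off behavior that a hard cutoff cannot express.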
An alternative method for weighted preference data searching includes reading information from a data source including data for a plurality of alternatives. Next, for each criterion of each alternative, the distance to an ideal value is measured, normalized distance data is created, and the normalized distance is multiplied by its corresponding weight and accumulated to obtain a score for the alternative. The alternatives are then ranked based upon their scores.
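Because this alternative method scores each alternative as it is read, without first storing the normalized distances, it might suit a data stream. The sketch below assumes that a fixed scale per criterion is known in advance so that normalization does not require a second pass; that assumption, the function name, and the sample data are illustrative only.

```python
def score_stream(stream, ideals, scales, weights):
    """Score each alternative as it arrives and return names ranked by score.

    `scales` gives an assumed maximum meaningful distance per criterion,
    so each distance can be normalized without seeing the other alternatives.
    """
    scored = []
    for name, values in stream:
        score = 0.0
        for criterion, weight in weights.items():
            distance = abs(values[criterion] - ideals[criterion])
            normalized = min(distance / scales[criterion], 1.0)  # clamp to [0, 1]
            score += weight * normalized
        scored.append((score, name))
    # Lowest score (closest to the ideal) first.
    return [name for _, name in sorted(scored)]

# Hypothetical stream of alternatives, consumed one item at a time.
stream = iter([
    ("A", {"price": 30000, "horsepower": 500}),
    ("B", {"price": 22000, "horsepower": 300}),
    ("C", {"price": 25000, "horsepower": 400}),
])
ideals = {"price": 20000, "horsepower": 500}
scales = {"price": 10000, "horsepower": 200}
weights = {"price": 0.7, "horsepower": 0.3}

ranked = score_stream(stream, ideals, scales, weights)
print(ranked)
```

The design choice here is to trade the data-dependent normalization of a two-pass approach for a fixed scale, which allows each score to be computed as soon as its alternative is read.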
An advantage of the present invention is that complex database queries can be made with a degree of "fuzziness" that is based upon user or other client input as to the importance or "weight" of particular search criteria. By providing this functionality, the search engine can provide results that are ranked by factoring a number of weighted search criteria to obtain results that best match the client's specifications.
Another advantage of the present invention is its ability to enhance searches through an embedded trade-off analysis capability, such that the client does not need to perform trade-off analysis by a time-consuming iterative approach through a repetitive set of queries.
These and other advantages of the present invention will become apparent to those skilled in the art upon a reading of the following detailed descriptions and a study of the various figures of the drawings.