1. Field of the Invention
The present invention relates to information retrieval in a data processing system. The present invention further relates to a method for searching a document database such as the Internet and ranking the results obtained from such a search.
2. Description of Related Art
A computer's logic is both its strength and its weakness; it can only perform what it is told to do. If an alarm clock is set to go off at 6:00 PM, it will go off at exactly that time, even if it was obviously meant to go off at 6:00 AM. People in the real world can solve problems and make decisions relatively easily, but even the simplest decisions are often too difficult to be handled by a computer. Fuzzy logic query processing helps to bridge the gap.
Databases are strategic tools because they support business processes. In order for a database to be useful, data must be compiled into information using tools such as queries. Queries allow a user to specify what data to retrieve from a database, and in what form. Fuzzy querying provides a way to retrieve data that was intended to be retrieved, without requiring exact parameters to be defined.
Non-fuzzy query processing relies on Boolean logic, which limits results to true or false (1 or 0). Fuzzy query processing relies on fuzzy logic, a superset of Boolean logic that can handle partial truths. Instead of search results being limited to a value of true or false, the query returns values as x% true or x% a member of a subset.
Fuzzy queries rely on the use of fuzzy quantifiers. Dr. Lotfi A. Zadeh, the founder of fuzzy logic theory, defined two kinds of quantifiers: absolute and relative. Absolute quantifiers are represented as fuzzy subsets of the non-negative numbers and use words such as "at least three" or "about five." Relative quantifiers are represented as fuzzy subsets of the unit interval and use words such as "most," "at least half," or "almost all."
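The two kinds of quantifier can be illustrated with a minimal sketch. The function names, the triangular shape chosen for "about five," and the breakpoints 0.3 and 0.8 chosen for "most" are illustrative assumptions, not definitions taken from this description.

```python
def about_five(count: float, spread: float = 2.0) -> float:
    """Absolute quantifier: a fuzzy subset of the non-negative numbers.

    Assumed shape: triangular, peaking at 5 and reaching 0 at 5 +/- spread.
    """
    return max(0.0, 1.0 - abs(count - 5.0) / spread)


def most(proportion: float) -> float:
    """Relative quantifier: a fuzzy subset of the unit interval [0, 1].

    Assumed shape: 0 up to 0.3, 1 from 0.8 upward, linear in between.
    """
    if proportion <= 0.3:
        return 0.0
    if proportion >= 0.8:
        return 1.0
    return (proportion - 0.3) / 0.5
```

With these assumed shapes, a count of 4 is "about five" to degree 0.5, and a proportion of 0.9 fully satisfies "most."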
Fuzzy queries do not take the place of the more structured queries, but expand the alternatives available. Boolean systems use selection and then ordering as a mechanism, whereas a fuzzy system relies on a single mechanism of overall membership degree. A fuzzy system allows for compromise between the various criteria, whereas a Boolean system can produce a subset of previously selected elements. There are times when Boolean logic is too rigid to be meaningful to a user. A fuzzy query allows a user to find elements that satisfy a criterion to some degree and ranks the results accordingly.
FIG. 1 illustrates the typical flow of a fuzzy database query. After identifying the need for a report 100, the user queries a database 110. The database returns a record, which is matched against predefined criteria 120 to determine the degree to which a match has occurred. The degree is then compared to a threshold value 125 to determine whether the record satisfies the user's query 130 or whether the record should be discarded 135. In general, a fuzzy database query differs from a non-fuzzy query by adding steps to match the data to predefined criteria and compare the resulting degree of match to a threshold specified in the query.
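The match-then-threshold flow described above can be sketched as follows. This is an illustrative outline only: the function names, the in-memory list standing in for the database, and the "price near 100" criterion are all assumptions made for the example.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

R = TypeVar("R")


def fuzzy_query(
    records: Iterable[R],
    degree_of_match: Callable[[R], float],
    threshold: float,
) -> List[Tuple[R, float]]:
    """Keep each record whose degree of match meets the threshold."""
    kept = []
    for record in records:
        degree = degree_of_match(record)   # match against predefined criteria
        if degree >= threshold:            # compare the degree to the threshold
            kept.append((record, degree))  # record satisfies the query
        # otherwise the record is discarded
    return kept


def near_100(price: float) -> float:
    """Hypothetical criterion: full match at 100, no match beyond +/- 50."""
    return max(0.0, 1.0 - abs(price - 100.0) / 50.0)


results = fuzzy_query([100.0, 120.0, 160.0], near_100, threshold=0.5)
```

Here the records 100.0 and 120.0 are kept and 160.0 is discarded, because its degree of match falls below the threshold.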
An object can be a member of multiple sets, with a different degree of membership in each. The degree of membership is measured on a scale from zero to one. Complete membership has a value of one, and no membership has a value of zero. When running a fuzzy query in a control system, the output is calculated based on the degree of membership a given input has in the configured fuzzy sets. Each combination of sets is configured to have a specified output. The output is based on the weighted sum of the amount of membership in each set. The fuzzy models may be used in conjunction with probabilistic models to find a solution.
FIG. 2 shows the three transformations of the system inputs 200 to outputs 205 in a fuzzy system. The process of “fuzzification” 210 is a methodology to generalize any specific theory from a precise form to a continuous form. It decomposes a system input or output into one or more fuzzy sets. After the decomposition into fuzzy sets, fuzzy rule association 215 applies a set of rules to a combination of inputs. The rules determine the action and map each variable to a numeric value. Once the numeric value is determined, de-fuzzification 220 converts the fuzzy result into an exact output value.
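The three transformations can be sketched in a few lines. The example below is a hypothetical illustration: the set names "near" and "far," the distance breakpoints, the rule outputs, and the use of a membership-weighted average for de-fuzzification are assumptions, not values taken from FIG. 2.

```python
from typing import Dict

# Hypothetical rule table: each fuzzy set maps to a brake pressure in [0, 1].
RULES: Dict[str, float] = {"near": 0.9, "far": 0.1}


def fuzzify(distance_ft: float) -> Dict[str, float]:
    """Decompose a crisp distance into memberships in 'near' and 'far'.

    Assumed shape: fully 'near' at 50 ft or less, fully 'far' at 150 ft or more.
    """
    near = min(1.0, max(0.0, (150.0 - distance_ft) / 100.0))
    return {"near": near, "far": 1.0 - near}


def apply_rules(memberships: Dict[str, float]) -> Dict[str, float]:
    """Rule association: weight each rule's output by its set's membership."""
    return {name: degree * RULES[name] for name, degree in memberships.items()}


def defuzzify(memberships: Dict[str, float], weighted: Dict[str, float]) -> float:
    """Convert the fuzzy result into one exact value (weighted average)."""
    return sum(weighted.values()) / sum(memberships.values())


def brake_pressure(distance_ft: float) -> float:
    memberships = fuzzify(distance_ft)
    return defuzzify(memberships, apply_rules(memberships))
```

Under these assumptions, a distance of 100 ft is half "near" and half "far," so the de-fuzzified pressure lands midway between the two rule outputs.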
For example, telling a driving student to apply the brakes 74 feet from the crosswalk is too precise to be followed. Vague wording like “apply the brakes soon,” however, can be interpreted and acted upon. The instruction is received in a fuzzy form, the person associates the message using past experiences, then defuzzifies the message in order to actually apply the brakes at the appropriate time. Fuzzy queries expand query capabilities by allowing for ambiguity and partial membership.
The invention relates to database searching and the ranking of a set of numerical data according to a set of user specified preferences, including target range, fuzziness and bias.
In many database query applications, data records are returned when certain field data falls into a user specified target range. The introduction of fuzziness in the present invention extends the returned data set by including records that are “close” to the target range. The addition of a bias also increases the usefulness of the database query by providing a means to rank the results of the query in a specified order.
The present invention adapts the Lorentzian function to include variables for fuzziness and bias in order to calculate fuzzy scores, which are used to rank the results of database searches. In one embodiment, only a single input target range is used in the database query. In another embodiment, however, multiple query fields are used. According to another embodiment, when multiple query fields are used, a fuzzy score is calculated for each query field in each record. The fuzzy scores of the query fields in each record are then aggregated into a composite fuzzy score that is used to rank the results of the database query.
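The general idea of Lorentzian-based scoring can be illustrated with a rough sketch. The adapted function itself is not given in this summary, so everything below is an assumption: the score is taken to be 1 inside the target range and to decay as a standard Lorentzian outside it, the fuzziness parameter is taken to be the distance at which the score falls to 0.5, the composite score is taken to be a simple mean, and bias is not modeled at all.

```python
from typing import Dict, List, Tuple

# Hypothetical query format: field name -> (low, high, fuzziness).
Targets = Dict[str, Tuple[float, float, float]]


def lorentzian_score(value: float, low: float, high: float, fuzziness: float) -> float:
    """Score 1.0 inside [low, high]; Lorentzian decay outside (assumed form)."""
    if low <= value <= high:
        return 1.0
    if fuzziness <= 0:
        return 0.0  # no fuzziness: behaves like an exact range query
    distance = low - value if value < low else value - high
    return 1.0 / (1.0 + (distance / fuzziness) ** 2)


def composite_score(record: Dict[str, float], targets: Targets) -> float:
    """Aggregate per-field scores; a plain mean is assumed here."""
    scores = [
        lorentzian_score(record[field], low, high, fuzziness)
        for field, (low, high, fuzziness) in targets.items()
    ]
    return sum(scores) / len(scores)


def rank(records: List[Dict[str, float]], targets: Targets) -> List[Dict[str, float]]:
    """Order records from highest to lowest composite score."""
    return sorted(records, key=lambda r: composite_score(r, targets), reverse=True)
```

For a target range of 5 to 15 with a fuzziness of 2, a value of 17 lies one fuzziness-width outside the range and scores 0.5 under this assumed form, while a value of 10 scores 1.0.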
Other features, advantages, and embodiments of the invention are set forth in part in the description that follows, and in part, will be obvious from this description, or may be learned from the practice of the invention.