1. Field of the Invention
This invention relates generally to computer-based evaluation of data and, more particularly, to incorporation of weights into rules for obtaining data combinational scores.
2. Description of the Related Art
There are many situations in which an organized collection of data is examined and certain combinational rules are applied to determine if one or more data items are appropriate for retrieval. For example, a database management system receives a user query that specifies data values and matches columns in data tables to retrieve the data rows that match the specified data values. A search engine for the Internet "world wide web" searches web page data to return those web pages that best match a user search query. In other situations, data that represents scores must be combined according to scoring rules to determine relative ranking of competitors.
In general, a relational database contains data that is organized into tables having tuples whose elements specify various attributes of the information in the database. That is, a database may contain data tables whose rows represent individual data records and whose columns represent data characteristics. The columns may represent different attributes or may represent the same attribute as perceived by different evaluators. For example, a financial database might contain tuples of company data in which each row of data represents an individual employee and the columns represent attributes such as employee name, address, job description, length of service, and the like. A multimedia database might contain tuples that represent characteristics of the scenes in a multimedia presentation, wherein each row of the table contains columns that represent attributes of the scenes in the presentation, such as relative color intensity (red, green, blue), image contrast, sound level, brightness, and the like. An Internet web page database may contain English language words referenced to the web pages that contain them. A scoring database might contain competitive diving or gymnastics scores, in which each row represents a different competitor and each column is a score for that competitor's performance as given by a different judge.
To determine which data should be returned to a user, it is often desirable to combine the columnar attribute data or scores into a single, combined score. This occurs, for example, where an overall score for a diving or gymnastics competitor must be computed, or where several multimedia scene attributes must be assessed and a single scene retrieved that most closely matches all the query scene attributes. An Internet search engine might rank web pages according to the number of search terms they contain and present them for retrieval. Thus, such score or attribute combining is a typical process in many types of query processing.
In the case of database management systems, users submit queries that cause the system to examine the database tuples and attempt to retrieve one or more tuples that satisfy a query. Many such queries deal with Boolean values, because most tuples either exactly match a query or do not. That is, many queries return a value of "true" or "false" for each tuple, such as a query of the form given by Query 1:
Query 1

SELECT employee WHERE NAME = "John Smith",
which returns "true" for a tuple only if the employee name is John Smith. The database management system can search the name column of the data tables until it finds a row whose value is "John Smith", which yields a response of "true" to the query. Multimedia databases are becoming more common, and often are subjected to more "fuzzy" searches in which the answer to an attribute query is neither "true" nor "false", but somewhere in between. Thus, in a multimedia search, it often is desirable to search over attributes having a continuum of values. For example, in searching a movie database for an ocean storm scene, a user will not likely want a Boolean search that specifies only whether a scene is blue or not, because such a search cannot exclude blue landscape-sky scenes. Rather, a user likely will want to specify a "score" or rating of the blueness of a scene.
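The contrast between a Boolean match and a graded match can be sketched in a few lines of Python. This is only an illustration: the record layout, the `blueness` field, and the score values are assumptions made for the example and do not come from any particular database system.

```python
# Hypothetical rows; field names and values are illustrative assumptions.
employees = [{"name": "John Smith"}, {"name": "Jane Doe"}]
scenes = [
    {"id": "s1", "blueness": 0.85},
    {"id": "s2", "blueness": 0.30},
    {"id": "s3", "blueness": 0.60},
]

def matches_name(row, name):
    """Boolean query in the style of Query 1: each tuple is 'true' or 'false'."""
    return row["name"] == name

# Boolean search: every row either matches exactly or does not.
boolean_hits = [matches_name(row, "John Smith") for row in employees]

# Graded search: rows are ordered by a score on a continuum of values.
ranked_ids = [s["id"] for s in sorted(scenes, key=lambda s: s["blueness"], reverse=True)]

print(boolean_hits)
print(ranked_ids)
```

In the graded case no row is rejected outright; the score only determines the order in which rows are offered for retrieval.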
In a multimedia search, a user also is more likely to want to search over multiple properties or attributes. For example, searching for an ocean storm scene might lead a user to be interested in relatively "blue" scenes that have a relatively low level of brightness and contrast, and perhaps a high volume (sound) level. In such a circumstance, there is likely to be a score giving the blueness of each scene, a different score giving the brightness, and a different score giving the loudness of sound. These scores must be combined into a single score that determines which scenes, if any, will be returned to the user in satisfaction of the query.
As another example, in the case of competitive diving or gymnastics, scores from several judges are typically combined so that the highest score and lowest score are eliminated and the remaining scores are averaged to produce a single score for the competitor. A database management system for implementing such combinational rules would need to automatically retrieve such scores and perform the necessary elimination and averaging calculations.
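The elimination-and-averaging rule described above can be sketched as follows. The function name and the sample scores are hypothetical, and the sketch assumes that exactly one highest and one lowest score are discarded, which is one common reading of the rule.

```python
def combined_judge_score(scores):
    """Drop one highest and one lowest score, then average the remainder."""
    if len(scores) < 3:
        raise ValueError("need at least three scores to drop the highest and lowest")
    trimmed = sorted(scores)[1:-1]  # remove the single lowest and single highest
    return sum(trimmed) / len(trimmed)

# Illustrative scores from five judges for one competitor.
judge_scores = [8.5, 9.0, 7.5, 9.5, 8.0]
print(combined_judge_score(judge_scores))
```

For the sample row, the scores 7.5 and 9.5 are eliminated and the remaining three scores are averaged.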
For all of the above cases, one way to determine a single, combined score over several attributes (or judges' scores) is to use fuzzy logic. Systems employing fuzzy logic generally perform well if relatively simple combining schemes are used. For example, a fuzzy logic selection system often employs a scoring process that assigns a single score to each row of a data table by choosing the minimum column value, thereby providing a relative attribute score (or competitor's score) over the queried properties (or competition judges). Alternatively, a fuzzy logic system might choose the maximum column score over queried properties.
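A minimal sketch of such minimum and maximum scoring over the queried columns is given below. The table contents, the attribute names, and the helper `score_rows` are assumptions introduced for illustration.

```python
# Hypothetical scene table; each attribute holds a score between 0 and 1.
rows = {
    "scene_a": {"blueness": 0.9, "darkness": 0.4, "loudness": 0.7},
    "scene_b": {"blueness": 0.6, "darkness": 0.5, "loudness": 0.8},
}

def score_rows(rows, attributes, combine=min):
    """Assign each row one score by applying `combine` to the queried columns."""
    return {
        name: combine(values[attr] for attr in attributes)
        for name, values in rows.items()
    }

queried = ["blueness", "darkness", "loudness"]
min_scores = score_rows(rows, queried, combine=min)  # minimum column value per row
max_scores = score_rows(rows, queried, combine=max)  # maximum column value per row

print(min_scores)
print(max_scores)
```

Under the minimum rule a row is only as good as its weakest queried attribute, while under the maximum rule a single strong attribute suffices.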
It would be advantageous if a query or other data combinational rule could be of a more complicated nature. For example, a user might want to differentially value the column attributes and compute a combined score. In the case of the multimedia database example, a user might want to give greater weight to the color hue score of a scene as compared to the sound level score of a scene. An Internet search engine user might want to assign greater significance to selected search words or phrases than to others. In addition, the number of attributes or search terms over which a database is searched may change as the user refines the search and gets closer to desired results. Many systems cannot accommodate such attribute weighting or can provide it only on a very limited basis, such as when the base combinational scoring rule is to take the average of the column scores.
For example, suppose that a combinational rule for computing a score or responding to a query is to combine the values of a data row (or combine the scores of a single competitor) by taking the average value of the scores. This can be represented by the following Combination 1 rule:
Combination 1 - Average Value

$$\text{combined score} = (x_1 + x_2) / 2,$$
which represents calculating the average of the scores $x_1$ and $x_2$, and returning the calculated value as the combined score. Thus, for any two scenes scored by color (red-green) and brightness, the average of the color score and the brightness score is determined for each scene (the raw scores are added and the sum is divided by two), and the scene with the higher average is returned in satisfaction of the query.
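The Combination 1 rule and the resulting choice between two scenes can be sketched as follows. The scene names and score values are hypothetical and serve only to make the arithmetic concrete.

```python
def average_score(x1, x2):
    """Combination 1: the unweighted average of two attribute scores."""
    return (x1 + x2) / 2

# Hypothetical (color score, brightness score) pairs for two scenes.
scene_scores = {
    "scene_a": (0.8, 0.4),
    "scene_b": (0.5, 0.9),
}

# Return the scene whose average of the two scores is higher.
best_scene = max(scene_scores, key=lambda name: average_score(*scene_scores[name]))
print(best_scene)
```

Here scene_a averages about 0.6 and scene_b about 0.7, so scene_b is the scene returned.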
Next, consider the case in which a multimedia user wants to differentially value the column attributes and compute a combined score. Assume that color is twice as important to the user as brightness. Using the same combinational rule to take the average of the attributes, computing the weighted score is relatively simple. For example, if $x_1$ represents the color score and $x_2$ represents the brightness score, then the returned combinational score should be given by Combination 2 as follows:
Combination 2 - Weighted Average

$$\text{combined score} = (\theta_1 x_1 + \theta_2 x_2),$$
where $\theta_1 = 0.67$ and $\theta_2 = 0.33$, and the combined score represents the weighted average of the two attribute scores. The base attribute values $x_1$ and $x_2$ of the data tuples are multiplied by the respective weights $\theta_1$ and $\theta_2$, and the products are summed. Thus, the new combined score given by Combination 2 is again an average, now weighted by the relative importance of each attribute. Suppose, however, that the combinational rule is to return the base attribute value with the minimum (or maximum) magnitude, a methodology that is common in fuzzy logic, and the user wants to differentially value one attribute as more important than the other. Conventionally, the combinational rule to use in determining a combined, weighted score from among the base data tuples is not known.
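The weighted average of Combination 2 can be sketched as follows. The function name is hypothetical, the weights are written as the exact fractions 2/3 and 1/3 rather than the rounded values 0.67 and 0.33, and the requirement that the weights sum to one is an assumption of this sketch.

```python
def weighted_average(scores, weights):
    """Combination 2: multiply each score by its weight and sum the products."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    return sum(w * x for w, x in zip(weights, scores))

# Color is twice as important as brightness.
weights = (2 / 3, 1 / 3)
color_score, brightness_score = 0.9, 0.3  # illustrative values
print(weighted_average((color_score, brightness_score), weights))

# With equal weights the rule reduces to the plain average of Combination 1.
print(weighted_average((0.8, 0.4), (0.5, 0.5)))
```

The first call gives approximately 0.7, closer to the color score than the unweighted average of 0.6 would be, reflecting the greater weight on color.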
For example, with two attributes r and s, suppose the base combinational rule or function is to retrieve the tuple with the attribute having the minimum score, as represented by Combination 3:
Combination 3 - Min

$$\text{combined score} = \min(r, s),$$
whose result is rather straightforward for raw base scores. If a user wants to weight one of the variables or attributes more heavily than the other, it is not clear how the new combined score would be computed. For example, a user might estimate that the r attribute is twice as important as the s attribute. It is not clear how to calculate the new minimum combination score given the new weighting. Thus, although certain combined scoring rules are known for some base functions, such as the averaging function described above for Combination 2, it is not conventionally possible to incorporate weighting into all database scoring functions so as to obtain the correct combined scoring function.
From the discussion above, it should be apparent that there is a need for a database processing method and system that permits weighting to be applied to combinational rules for evaluating a collection of data. The present invention fulfills this need.