1. Field of the Invention
This invention generally relates to forecasting; and, more specifically, the invention relates to methods and systems to identify which predictors are important for making particular forecasts. The preferred embodiment of the invention may be used in any area where the collaborative filter (also called nearest neighbor) data mining technology is applied. Such areas include, but are not limited to, e-commerce, banking, manufacturing, securities trading, website personalization, mass marketing, communications, and medical diagnosis.
2. Prior art
The collaborative filter, also called the nearest neighbor model, is a mathematical model that is used to predict the value of a target variable given a set of input variables. For example, consider a scenario where a researcher has collected data regarding a population""s age, height, weight, and gender. This information is contained in the following table:
In this hypothetical example, suppose that the task is to take age, height, and weight information from a new table of data and use that information to determine the gender of the individual. Thus, suppose we are given a table as follows:
The task is to apply the nearest neighbor algorithm to determine the gender of the subjects 6-10 based on the data in Tables 1 and 2.
In a simple implementation of the nearest neighbor algorithm, we compute the Euclidean distance between every subject in our test set (Table 2) and every subject in the training set (Table 1). The gender associated with the nearest match in Table 1 is assigned to the subjects in the test set. A variation on this algorithm is to use the average value of the K nearest neighbors. In this case, we take K=1. Thus:
The lowest value in each row indicates the nearest match between the subject in the test set and the subjects in the training set. Thus, Subject 6 is closest to Subject 4. The gender for Subject 6 is thus predicted to be male. Now, we can complete Table 2:
A problem that arises in using this popular algorithm is that it is not known which variables led to the predictions made. For example, for subject 9, all we know at this point is that the nearest neighbor was subject 5. We do not know whether it was the age, height or weight of the subject that led to the conclusion that the subject""s gender is female. At this point, the best we can say is that the ensemble of predictors (age, height and weight) led to this conclusion.
An object of this invention is to provide a novel method and system to identify which predictors are important for making a forecast.
Another object of the present invention is to determine which predictor, if any, are driving a particular predictor in a collaborative filter.
These and other objectives are attained with a method and system for identifying parameters that are important in predicting a target variable. The method comprises the steps of compiling training data, said training data identifying, for each of a first set of subjects, values for each of a first set of parameters; and compiling test data, said test data identifying, for each of a second set of subjects, values for each of a second set of parameters, said first and second sets of parameters having at least a plurality of common parameters.
The method comprises the further steps of using the data in the training data, and using a nearest neighbor procedure, to identify, for each of the second set of subjects, a value for a target parameter; and processing the training data and the test data, according to a predefined procedure, to determine the relative importance of at least selected ones of the first group of parameters in predicting the values for the target parameter.
Further benefits and advantages of the invention will become apparent from a consideration of the following detailed description, given with reference to the accompanying drawings, which specify and show preferred embodiments of the invention.