1. Field of the Invention
The present invention relates to a method and article of manufacture for filling in missing values in an array, and more particularly, to filling in missing attribute values in a data set array of cases and attribute positions.
2. Discussion of the Related Art
As data set applications continue to grow in variety and use, methods that improve their efficiency and accuracy become necessary. Common examples of data set applications presently in use include database applications such as consumer databases, banking transaction databases, medical patient databases, and Internet information retrieval databases.
Missing values are one problem that reduces the accuracy and efficiency of data set applications. It is known in the art to estimate missing values by statistical regression, then to repeat the regression with those estimates in place, and to re-estimate the missing values from the subsequent regression. This parametric approach handles continuous variables well, but it results in extrapolation problems for other types of missing variables and causes the estimates to diverge.
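For illustration only, the repeated-regression approach described above might be sketched as follows. The two-variable layout, the least-squares line fit, the mean-value starting guess, the fixed number of rounds, and the names fit_line and iterative_regression_fill are all assumptions made for this sketch and are not taken from any particular prior-art method.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def iterative_regression_fill(pairs, rounds=20):
    """Fill None entries in (x, y) pairs by repeated regression.

    Assumes each pair is missing at most one of its two values.
    """
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    missing_x = [i for i, x in enumerate(xs) if x is None]
    missing_y = [i for i, y in enumerate(ys) if y is None]

    # Starting guess (an assumption): the mean of the observed values.
    observed_x = [x for x in xs if x is not None]
    observed_y = [y for y in ys if y is not None]
    for i in missing_x:
        xs[i] = sum(observed_x) / len(observed_x)
    for i in missing_y:
        ys[i] = sum(observed_y) / len(observed_y)

    for _ in range(rounds):
        # Regress y on x, then re-estimate the missing y values.
        slope, intercept = fit_line(xs, ys)
        for i in missing_y:
            ys[i] = slope * xs[i] + intercept
        # Regress x on y, then re-estimate the missing x values.
        slope, intercept = fit_line(ys, xs)
        for i in missing_x:
            xs[i] = slope * ys[i] + intercept
    return list(zip(xs, ys))


# Hypothetical data: the fourth y value and the fifth x value are missing.
samples = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, None), (None, 11.0)]
estimated = iterative_regression_fill(samples)
```

The sketch fits a straight line, which is why it suits continuous variables; a categorical attribute has no natural place on such a line, and an estimate far from the observed range is an extrapolation of the fitted line.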
For these reasons, the accuracy of certain estimation applications is increased with the use of non-parametric approaches, such as the K-Nearest Neighbor (hereinafter "KNN") method, to fill in missing values. This approach makes few statistical assumptions, reduces the possibility of extrapolation problems, and allows the estimation of any kind of missing variable.
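By way of a minimal sketch, a KNN fill-in over an array of cases and attribute positions might look like the following. The distance measure (squared difference for numeric attributes, simple mismatch for other attributes, averaged over the positions both cases have), the use of a mean for numeric estimates and a majority vote otherwise, and the names attribute_gap and knn_fill are assumptions chosen for illustration; they are not asserted to be the method of the present invention.

```python
import math
from collections import Counter


def attribute_gap(a, b):
    """Per-attribute dissimilarity: squared difference for numbers, 0/1 otherwise."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return (a - b) ** 2
    return 0.0 if a == b else 1.0


def knn_fill(cases, k=2):
    """Return a copy of cases with each None replaced by a KNN estimate."""
    filled = [list(case) for case in cases]
    for i, case in enumerate(cases):
        for j, value in enumerate(case):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(cases):
                # A neighbor must be another case that has attribute j.
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(len(case))
                          if case[c] is not None and other[c] is not None]
                if not shared:
                    continue
                total = sum(attribute_gap(case[c], other[c]) for c in shared)
                candidates.append((math.sqrt(total / len(shared)), other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if not nearest:
                continue  # no usable neighbor; leave the value missing
            if all(isinstance(v, (int, float)) for v in nearest):
                filled[i][j] = sum(nearest) / len(nearest)  # numeric: mean
            else:
                filled[i][j] = Counter(nearest).most_common(1)[0][0]  # vote
    return filled


# Hypothetical data set: two numeric attributes and one categorical attribute.
cases = [
    [1.0, 1.0, "a"],
    [1.1, 0.9, "a"],
    [5.0, 5.0, "b"],
    [4.9, 5.2, "b"],
    [5.2, None, "b"],
    [0.9, 1.1, None],
]
completed = knn_fill(cases, k=2)
```

Because every estimate is taken from values actually observed in neighboring cases, a numeric estimate cannot fall outside the range of those neighbors, and a categorical attribute is filled in the same way as a numeric one.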