In any dataset (e.g., a data table or query result) of a database (or other data store, for example an XML file), a record (or recordset) may contain entries with missing values. For example, collected data may be missing values because a value is unknown at the time the data is collected. Missing data degrades the quality of the data for purposes such as analysis or research based on the data.
As an example, personal health record (PHR) databases play an important role in the promotion of medical and disaster research, as well as in providing analytic services for personal health care. For example, a PHR can provide personal health analysis according to historical data from the database. The historical data may support counselors and instructors in various types of health promotion facilities. Moreover, PHR analysis can be used to schedule, and issue reminders for, the health maintenance cycle associated with wellness tours. In addition, the data can be used to create a prediction model to recommend the best wellness program or a healthy daily menu. Generally, PHR data can be collected through three main approaches: daily health records from personal health meters, custom records from wellness centers, and statistical databases from universities and research centers. However, missing values occur in PHR databases because it may be difficult to collect complete data for all people.
In conventional missing value imputation methods, a selected record with a missing value is expressed as a linear combination of other similar records. In other words, these algorithms exploit local similarity structure in the dataset for missing value imputation. Typically, a subset of records that exhibits high correlation with the record containing the missing value is used to impute the missing value. Most such methods, many of which have been applied in microarray data analysis, also assume that the features of the records are independent of each other.
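As an illustration only, the conventional approach described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation of any particular published method: the function name `impute_by_similar_records`, the parameter `k`, the use of absolute Pearson correlation to select similar records, and the use of ordinary least squares to obtain the combination weights are all assumptions made for the example. Note that every feature is treated alike, with no grouping of features by type.

```python
import numpy as np

def impute_by_similar_records(data, target_row, k=2):
    """Fill missing values (NaN) in one record using a linear combination
    of the k complete records most correlated with it.

    Hypothetical helper for illustration; all features are treated alike.
    """
    data = np.asarray(data, dtype=float)
    target = data[target_row]
    missing = np.isnan(target)      # features to impute
    observed = ~missing             # features available for comparison

    # Candidate records: every other record that has no missing values.
    candidates = [i for i in range(len(data))
                  if i != target_row and not np.isnan(data[i]).any()]

    # Rank candidates by absolute Pearson correlation with the target,
    # computed on the observed features only (undefined values count as 0).
    corr = np.nan_to_num([
        abs(np.corrcoef(data[i, observed], target[observed])[0, 1])
        for i in candidates
    ])
    top = [candidates[j] for j in np.argsort(corr)[::-1][:k]]
    neighbours = data[top]          # shape: (k, number of features)

    # Least-squares weights w so that, on the observed features,
    # the weighted sum of the similar records approximates the target.
    w, *_ = np.linalg.lstsq(neighbours[:, observed].T,
                            target[observed], rcond=None)

    # Apply the same weights to the similar records' values
    # at the missing positions.
    filled = target.copy()
    filled[missing] = neighbours[:, missing].T @ w
    return filled

# Example data (made up for illustration): the last record has one missing value.
records = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [5.0, 1.0, 4.0, 2.0],
    [3.0, 6.0, 9.0, np.nan],
]
print(impute_by_similar_records(records, target_row=3, k=2))
```

In this made-up example the incomplete record is three times the first record on its observed features, so the imputed fourth value should come out close to 12.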
However, in some databases (or datasets), some data features may be linearly co-related, and the data can be categorized based on that relationship. For example, PHR data may be categorized into two groups by their features. One group may be measured data, such as height, weight, and blood test results; the other group may be generated and quantified from questionnaires, such as tiredness and appetite. Accordingly, the types of features may affect the linear combination differently. Conventional missing value imputation methods do not consider co-related data when imputing missing values in a database. Accordingly, there is a need for methods and systems that utilize co-related data when imputing missing values in a database.