The present embodiments relate to medical decision support systems. In particular, operation of a medical decision support system is provided even with missing data. One problem in designing decision support classification is missing data. Some machine learning techniques handle missing data better than others.
In the case of clinical decision support for physicians, the issue of handling missing data has not been significant. Most machine learning algorithms have been based on interpretations of images. For example, classification is used in mammography computer assisted diagnosis (CAD) products to help identify potential lesions or calcifications. Since the only source of features is the image and the image is present, there is no issue of missing data. However, as decision support systems are extended to include heterogeneous sources of patient data, missing data may become an issue.
Not every patient will have all potential sources of information recorded in their patient record, either because a particular test was not done or because the results were not recorded. In an example of a decision support system to assist a physician in diagnosing breast cancer, the patient record may contain information about the woman, such as age and family history of cancer. In addition, the woman may or may not have screening mammograms from the past. Finally, some women may have undergone genetic tests, such as identification of the BRCA gene, to determine a propensity for breast cancer, but others may not have had the tests. Any one individual woman's patient record may have only a subset of the information. Classifiers may not operate properly with missing inputs.
The missing values may be replaced with a substitute. In one approach, a global value replaces the missing data. The global value is a mean, median, or mode computed from a training set of data. Such a simple approach, however, ignores everything else known about the particular patient and could lead to incorrect conclusions.
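The global-value substitution described above can be illustrated with a minimal Python sketch. The function names `global_value` and `fill_missing`, the use of `None` to mark a missing entry, and the sample ages are assumptions made for illustration only; they are not taken from any particular decision support product.

```python
import statistics


def global_value(training_values, strategy="mean"):
    """Compute one substitute value for a feature from training data.

    Entries equal to None are treated as missing and are ignored.
    """
    observed = [v for v in training_values if v is not None]
    if strategy == "mean":
        return statistics.mean(observed)
    if strategy == "median":
        return statistics.median(observed)
    if strategy == "mode":
        return statistics.mode(observed)
    raise ValueError(f"unknown strategy: {strategy}")


def fill_missing(record, substitutes):
    """Return a copy of a patient record with None replaced per feature."""
    return {
        name: substitutes[name] if value is None else value
        for name, value in record.items()
    }


# Hypothetical training ages; one patient has no recorded age.
training_ages = [40, 50, 60, None, 70]
substitutes = {"age": global_value(training_ages, "mean")}
patient = fill_missing({"age": None, "family_history": 1}, substitutes)
```

Every patient with a missing age receives the same substitute, regardless of family history or any other recorded feature, which is why this approach may mislead a classifier.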
In another approach, the most probable value replaces the missing data. The value may be estimated using inference, such as a Bayesian network, or from a distribution. Consider a set of features:
$$x = \{x_1, x_2, \ldots, x_N\} \quad (1)$$
where $x_n$ is a feature, $n = 1, \ldots, N$, and $N$ is the total number of possible features. Assuming only the first $m$ features, $m < N$, are known for a particular patient, one could estimate the joint probability of the missing features given the known features as:
$$P(x_{m+1}, \ldots, x_N \mid x_1, \ldots, x_m) \quad (2)$$
There are several approaches to solving for this joint probability from a set of training data. However, as the number of features grows large, solving this problem becomes increasingly difficult. If a learning-based approach is used, then the amount of data needed to learn this conditional joint probability becomes extremely large. In the case of the breast cancer example, the number of pieces of information (i.e., features), such as the history information, physical examination, current and prior mammogram features, genomics data, and other information, can grow extremely large, such as being on the order of hundreds or even thousands of pieces of information. Such a joint probability may be difficult to construct.
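As a rough illustration of why the conditional joint probability becomes hard to learn, the following Python sketch estimates the most probable values of the missing features by counting training records that agree with the known features. This counting estimator is only one simple possibility, chosen here for brevity; it is not a Bayesian network. The function names, the tuple-based record layout, and the toy binary data are all assumptions made for illustration.

```python
from collections import Counter


def most_probable_completion(training_rows, known):
    """Estimate the most probable values of the missing features.

    training_rows: list of equal-length tuples of discrete feature values.
    known: dict mapping feature index -> observed value for one patient.
    Returns (completion, support), where completion maps each missing
    feature index to its most frequent value among matching training rows
    and support is the number of matching rows. Returns (None, 0) when no
    training row agrees with the known features.
    """
    matches = [
        row for row in training_rows
        if all(row[i] == v for i, v in known.items())
    ]
    if not matches:
        return None, 0
    missing = [i for i in range(len(training_rows[0])) if i not in known]
    counts = Counter(tuple(row[i] for i in missing) for row in matches)
    best, _ = counts.most_common(1)[0]
    return dict(zip(missing, best)), len(matches)


def joint_table_size(n_features, n_values=2):
    """Number of cells in a full joint table over discrete features."""
    return n_values ** n_features


# Toy training set with three binary features (hypothetical data).
rows = [(1, 0, 1), (1, 0, 1), (1, 0, 0), (0, 1, 1)]
completion, support = most_probable_completion(rows, {0: 1, 1: 0})
```

With only three binary features the full joint table has 8 cells. With hundreds of features the number of possible combinations far exceeds any realistic training set, so most combinations of known features have no matching records at all and the estimate cannot be formed.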