The present invention provides a medical imaging system that can process medical imaging data to present a disease prognosis based on limited clinical data.
Modern medical diagnostic tools such as magnetic resonance imaging (MRI) and positron emission tomography (PET) provide clinicians with a wealth of data that promises to greatly advance our ability to measure and predict disease progression. For example, there is strong evidence that Alzheimer's disease manifests in such brain imaging data years before the onset of clinical or cognitive symptoms.
The amount of data output by diagnostic imaging tools such as MRI exceeds, as a practical matter, the analysis abilities of individual diagnosticians, and accordingly specialized machine learning systems have been used with imaging systems to process this information. As is understood in the art, such machine learning systems provide circuitry that can “learn” to analyze data by a training process that uses a set of known examples that form a training set. For example, a training set for the detection of Alzheimer's disease may provide imaging information from a set of subjects divided into two subgroups. The first subgroup includes those subjects who are cognitively healthy, whereas the second subgroup includes subjects who have been diagnosed with Alzheimer's disease. First, the training set is presented to the machine learning system, which learns to make the correct prognosis of Alzheimer's disease or no Alzheimer's disease from the imaging data of the training set. Next, imaging data of a new patient, whose prognosis is unknown, is applied to the trained machine learning system to obtain a prognosis for that new patient.
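The two-step process described above (training on labeled examples, then applying the trained system to a new patient) can be illustrated with a minimal sketch. The nearest-centroid classifier, the function names `train` and `predict`, the two-element feature vectors, and all numeric values are hypothetical choices made only for illustration; they are not the machine learning system of the invention and do not represent real imaging data.

```python
from statistics import fmean


def train(samples, labels):
    """Training step: learn one mean feature vector (centroid) per label."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        # zip(*rows) iterates over feature columns; average each column.
        centroids[label] = [fmean(column) for column in zip(*rows)]
    return centroids


def predict(centroids, sample):
    """Prediction step: return the label whose centroid is closest."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], sample))


# Made-up training set: two imaging-derived measurements per subject.
training_samples = [[0.9, 1.1], [1.0, 0.8], [0.2, 0.3], [0.1, 0.4]]
training_labels = ["healthy", "healthy", "alzheimers", "alzheimers"]

model = train(training_samples, training_labels)

# A new patient whose status is unknown is applied to the trained model.
new_patient = [0.15, 0.35]
result = predict(model, new_patient)
print(result)
```

In practice each subject would contribute far more than two measurements, which is the source of the difficulty discussed next.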
A serious obstacle to the training of machine learning systems using clinical data is the mismatch between the number of dimensions of the clinical data (the total number of measurements obtained from all types of data collecting instruments, for example MRI, PET, etc.) and the number of samples (for example, patients) in the clinical trial. Often, the number of patients in a clinical trial (and hence the number of samples in a potential training set) is relatively limited (for example, fewer than 1000) while the dimensions of data obtained from the imaging equipment can number in the many millions. In these cases, where the dimensionality of the data far exceeds the number of samples, the machine learning system will almost surely not be able to learn the underlying concept (the distinction between disease and no disease) because there are too few samples to learn from.
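The effect of having far more dimensions than samples can be demonstrated with a small simulation. The sketch below is an illustrative assumption, not part of the described system: it trains a simple perceptron on 20 synthetic "patients" with 500 random measurements each and randomly assigned labels, so there is no underlying concept to learn. The sizes, the random seed, and the helper names (`train_perceptron`, `accuracy`, `noise_data`) are hypothetical.

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_perceptron(samples, labels, max_epochs=1000):
    """Classic perceptron: update weights on every misclassified sample."""
    w = [0.0] * len(samples[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            if y * dot(w, x) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # every training sample is classified correctly
            break
    return w


def accuracy(w, samples, labels):
    correct = sum(1 for x, y in zip(samples, labels) if y * dot(w, x) > 0)
    return correct / len(samples)


rng = random.Random(0)
n_train, n_test, n_features = 20, 200, 500  # dimensions far exceed samples


def noise_data(n):
    """Pure noise: features and labels are statistically unrelated."""
    xs = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n)]
    ys = [rng.choice([-1, 1]) for _ in range(n)]
    return xs, ys


train_x, train_y = noise_data(n_train)
test_x, test_y = noise_data(n_test)

w = train_perceptron(train_x, train_y)
train_acc = accuracy(w, train_x, train_y)
test_acc = accuracy(w, test_x, test_y)
print(f"training accuracy: {train_acc:.2f}, held-out accuracy: {test_acc:.2f}")
```

Because 20 points in 500 dimensions can essentially always be separated by a hyperplane, the perceptron is expected to fit the training set perfectly even though the labels are random, while accuracy on the held-out samples should remain near chance. A perfect fit to the training set therefore says little about whether the system has learned the distinction between disease and no disease.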
This problem, commonly referred to as overfitting, is difficult to avoid given the cost and difficulty of performing clinical studies with larger numbers of patients and the increasingly powerful medical imaging devices that provide ever more dimensions of measurement.