Optimal extraction of data contained within an electromagnetic signal requires the ability to identify important components of the signal in spite of noise and of limitations in the signal source and in the instrumentation used to detect the signal. A key area in which optimized extraction and reconstruction of data is sought is image analysis, where noise and other factors can hinder efficient extraction of data from the image and thus impair the effectiveness of the imaging method for its intended use. Areas in which image analysis can be problematic include astronomical observation and planetary exploration, where sources can be faint and atmospheric interference introduces noise and distortion; military and security surveillance, where low light and rapid movement of targets result in low contrast and blur; and medical imaging, which often suffers from low contrast, blur and distortion due to source and instrument limitations. Adding to the difficulty of image analysis is the large volume of data contained within a digitized image, since the value of any given data point often cannot be established until the entire image is processed.
Development of methods for automated analysis of digital images has received considerable attention over the past few decades, with one of the key areas of interest being the medical field. Applications include analysis of pathology images generated using visual, ultrasound, x-ray, positron emission, magnetic resonance and other imaging methods. As with human-interpreted medical images, an automated image analyzer must be capable of recognizing and classifying blurred features within the images, which often requires discrimination of faint boundaries between areas differing by only a few gray levels or shades of color.
In recent years, machine-learning approaches for image analysis have been widely explored for recognizing patterns which, in turn, allow extraction of significant features within an image from a background of irrelevant detail. Learning machines comprise algorithms that may be trained to generalize using data with known outcomes. Trained learning machine algorithms may then be applied to predict the outcome in cases of unknown outcome. Machine-learning approaches, which include neural networks, hidden Markov models, belief networks and support vector machines, are ideally suited for domains characterized by the existence of large amounts of data, noisy patterns and the absence of general theories. Particular focus among such approaches has been on the application of artificial neural networks to biomedical image analysis, with results reported in the use of neural networks for analyzing visual images of cytology specimens and mammograms for the diagnosis of breast cancer, classification of retinal images of diabetics, karyotyping (visual analysis of chromosome images) for identifying genetic abnormalities, and tumor detection in ultrasound images, among others.
The majority of learning machines that have been applied to image analysis are neural networks trained using back-propagation, a gradient-based method in which errors in the classification of training data are propagated backwards through the network to adjust the weights and biases of the network elements until the mean squared error is minimized. A significant drawback of back-propagation neural networks is that the empirical risk function may have many local minima, which can easily prevent the optimal solution from being found. Standard optimization procedures employed by back-propagation neural networks may converge to a minimum, but the neural network method cannot guarantee that even a local minimum is attained, much less the desired global minimum. The quality of the solution obtained from a neural network depends on many factors. In particular, the skill of the practitioner implementing the neural network determines the ultimate benefit, but even factors as seemingly benign as the random selection of initial weights can lead to poor results. Furthermore, the convergence of the gradient-based method used in neural network learning is inherently slow. A further drawback is that the sigmoid activation function has a scaling factor, which affects the quality of approximation. Possibly the largest limiting factor of neural networks as related to knowledge discovery is the “curse of dimensionality”: the disproportionate growth in required computational time and power for each additional feature or dimension in the training data.
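The dependence of gradient-based training on its starting point can be illustrated with a deliberately simplified, one-parameter stand-in for a network's error surface. The sketch below assumes a hypothetical non-convex loss `w**4 - 2*w**2 + 0.3*w` (one shallow local minimum and one deeper global minimum) and an arbitrary learning rate; it is not a neural network, only plain gradient descent, which is the optimization step back-propagation relies on.

```python
def loss(w: float) -> float:
    """Hypothetical non-convex 'empirical risk' with a local and a global minimum."""
    return w**4 - 2 * w**2 + 0.3 * w


def grad(w: float) -> float:
    """Analytic derivative of loss(w)."""
    return 4 * w**3 - 4 * w + 0.3


def gradient_descent(w0: float, lr: float = 0.05, steps: int = 500) -> float:
    """Plain gradient descent: repeatedly step against the gradient from w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w


if __name__ == "__main__":
    # Two different "initial weights" settle in two different minima.
    for w0 in (2.0, -2.0):
        w = gradient_descent(w0)
        print(f"start {w0:+.1f} -> w = {w:+.4f}, loss = {loss(w):+.4f}")
```

Starting from `+2.0`, the iteration settles in the shallow minimum on the positive side; starting from `-2.0`, it reaches the deeper minimum. Nothing in the update rule signals that the first answer is suboptimal, which is the practical meaning of being trapped in a local minimum.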
The shortcomings of neural networks can be overcome by using another type of learning machine: the support vector machine. In general terms, a support vector machine maps input vectors into a high-dimensional feature space through a non-linear mapping function chosen a priori. In this high-dimensional feature space, an optimal separating hyperplane is constructed. The optimal hyperplane is then used to perform operations such as class separation, regression fitting, or density estimation.
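Why a non-linear mapping helps can be seen on the classic XOR arrangement, which no straight line in the input plane can separate. The minimal sketch below assumes a hypothetical hand-picked map `phi(x1, x2) = (x1, x2, x1*x2)`; a real support vector machine would instead fix a kernel in advance and solve for the hyperplane, so this only shows the geometric idea of linear separation after mapping.

```python
from itertools import product

# Hypothetical toy data: the four XOR corners, labelled by the sign of x1 * x2.
POINTS = [(x1, x2) for x1, x2 in product((-1.0, 1.0), repeat=2)]
LABELS = [1 if x1 * x2 > 0 else -1 for x1, x2 in POINTS]


def phi(x):
    """Illustrative non-linear map from 2-D input space to 3-D feature space."""
    x1, x2 = x
    return (x1, x2, x1 * x2)


def classify(z, w, b):
    """Sign of the linear decision function w . z + b in feature space."""
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1 if score > 0 else -1


# In feature space the plane z3 = 0 (w = (0, 0, 1), b = 0) separates the classes.
W, B = (0.0, 0.0, 1.0), 0.0

if __name__ == "__main__":
    for x, y in zip(POINTS, LABELS):
        print(x, "->", phi(x), "label", y, "predicted", classify(phi(x), W, B))
```

For these four points the plane `z3 = 0` is also the maximum-margin choice: each mapped point lies at distance 1 from it, and the closest points of the two classes, `(0, 0, 1)` and `(0, 0, -1)` on the segments joining each class, are 2 apart.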
Within a support vector machine, the dimensionality of the feature space may be very high. For example, a fourth-degree polynomial mapping function causes a 200-dimensional input space to be mapped into a 1.6-billion-dimensional feature space. The kernel trick and the Vapnik-Chervonenkis (“VC”) dimension allow the support vector machine to avoid the “curse of dimensionality” that typically limits other methods and to derive generalizable answers effectively from this very high-dimensional feature space.
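The arithmetic behind the 1.6 billion figure, and the reason the kernel trick avoids it, can be checked directly. This sketch assumes the figure counts ordered fourth-degree products of the inputs (200^4 = 1,600,000,000); counting only distinct monomials gives C(203, 4) = 68,685,050, which is still far too many to build explicitly. The function names are illustrative and the homogeneous polynomial kernel `(x . y)**d` is used as the example kernel.

```python
from itertools import product
from math import comb, prod


def explicit_features(x, degree):
    """Map x to every ordered product of `degree` coordinates (len(x)**degree features)."""
    return [prod(combo) for combo in product(x, repeat=degree)]


def feature_dot(x, y, degree):
    """Inner product computed explicitly in the expanded feature space."""
    fx, fy = explicit_features(x, degree), explicit_features(y, degree)
    return sum(a * b for a, b in zip(fx, fy))


def poly_kernel(x, y, degree):
    """Homogeneous polynomial kernel (x . y)**degree, computed in input space only."""
    return sum(a * b for a, b in zip(x, y)) ** degree


if __name__ == "__main__":
    x, y = [1.0, 2.0, -1.0], [0.5, -1.0, 3.0]
    # Both routes give the same number; the kernel never builds the 3**4 = 81 features.
    print(poly_kernel(x, y, 4), feature_dot(x, y, 4))
    # Feature-space sizes for a 200-dimensional input and degree 4.
    print(f"ordered products:   {200 ** 4:,}")
    print(f"distinct monomials: {comb(203, 4):,}")
```

The kernel costs one dot product in input space regardless of degree, whereas the explicit route grows as `n**d`; that gap is what lets the support vector machine work in the 1.6-billion-dimensional space without ever materializing it.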
If the training vectors are separated by the optimal hyperplane (or generalized optimal hyperplane), the expected value of the probability of committing an error on a test example is bounded by the ratio of the expected number of support vectors to the number of examples in the training set. This bound depends neither on the dimensionality of the feature space, nor on the norm of the vector of coefficients, nor on the bound of the norm of the input vectors. Therefore, if the optimal hyperplane can be constructed from a small number of support vectors relative to the training set size, the generalization ability will be high, even in an infinite-dimensional space.
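Written out, the bound takes the form usually attributed to Vapnik, where \(\ell\) is the number of training vectors and the expectations are taken over training sets of that size (strictly, in the leave-one-out form, the left-hand side refers to a hyperplane built from \(\ell - 1\) examples). The numbers in the second line are hypothetical and serve only to show the arithmetic.

```latex
\mathbb{E}\left[P(\text{error})\right] \;\le\; \frac{\mathbb{E}\left[\text{number of support vectors}\right]}{\ell}

% Hypothetical illustration: 20 support vectors on average, \ell = 1000 training vectors
\mathbb{E}\left[P(\text{error})\right] \;\le\; \frac{20}{1000} = 0.02
```

Neither the feature-space dimension nor the size of the weight vector appears on the right-hand side, which is why a sparse solution can generalize well even when the feature space is enormous.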
As such, support vector machines provide a desirable solution to the problem of analyzing a digital image from vast amounts of input data. However, the ability of a support vector machine to analyze a digitized image is limited by the information contained within the training data set. Accordingly, there exists a need for a system and method for pre-processing data so as to augment the training data and thereby maximize the effectiveness of computer analysis of an image by the support vector machine.