1. Field of the Invention
The present invention relates to the field of machine classification, and, more particularly, to incorporating spatial knowledge for classification.
2. Description of the Related Art
A classifier is used to classify (i.e., separates) objects into two or more classes. An example of a classifier is as follows. Assume we have a set, A, of objects comprising two groups (i.e., classes) of the objects that we will call A+ and A−. As used herein, the term “object” refers to one or more elements in a population. The classifier, A, is a function, F, that takes every element in A and returns a label “+” or “−”, depending on what group the element is. That is, the classifier may be a FUNCTION F(A)→{−1,1}, where −1 is a numerical value representing A− and +1 is a numerical value representing A+. The classifiers A+ and A− may represent two separate populations. For example, A+ may represent structures in the lung (e.g., vessels, bronchi) and A− may represent nodules. Once the function, F, is trained from training data (i.e., data with known classifications), classifications of new and unseen data can be predicted using the function, F. For example, a classifier can be trained in 10,000 known objects for which we have readings from doctors. This is commonly referred to as a “ground truth.”Based on the training from the ground truth, the classifier can be used to automatically diagnose new and unseen cases.
A conventional classifier classifies the objects into classes based on an assumption that objects of the same class have comparable feature values, that is, belong to the same distribution in the feature space. In many applications, however, objects that belong to the same class have different feature values due to, for example, their spatial location. As used herein, the term “feature” refers to one or more attributes that describe an object belonging to a particular class. For example, a nodule can be described by a vector containing a number of attributes, such as size, diameter, sphericity, etc. The vector may contain attribute values, which are termed herein as “feature values.”
An existing solution to the above problem is to use a more complex classifier having a higher number of degrees of freedom. One way to create the more complex classifier is by mapping the data into a higher dimensional feature space using kernel mappings. That is, a function K (i.e., the kernel) takes the original data and maps it to a higher dimensional feature space (i.e., a feature space with more features) where the task of finding a classification function is easier to achieve.
For example, suppose the original data is a single case that analyzes a certain number of features. Thus, if the original data has 10 features, a vector for the original data is a 10 dimensional feature space. Suppose also that we have similar data for 1,000 other cases. We can create a similarity function that generates a value indicating how similar the original data is to each of the 1,000 other cases. Thus, a 1,000 dimensional feature space is created, mapped from the 10 dimensional feature space.
As used herein, the term “degrees of freedom” refers to the number of values in the final calculation of a statistic that are free to vary. Another way to produce the more complex classifier is to combine a number of simple classifiers, each trained using different sets of features, in series or in parallel. The results are merged to form an ensemble of classifiers.
These and other comparable approaches have the disadvantage of requiring additional training examples to compensate for the large number of degrees of freedom needed to obtain the same generalization performance. As used herein, the term “generalization performance” refers to the performance of a classifier on new and unseen data. A reduction of generalization performance may occur for many reasons. For example, in the learning (i.e., training) process for classification, similar to the regression case, there is a potential risk of overfitting the training data, resulting in poor predictive performance on new and unseen cases.