1. Technical Field
The present disclosure relates to image processing, and more particularly, to image processing using Random Forest Classifiers.
2. Discussion of Related Art
Recent machine learning research advances in the area of supervised learning have popularized ensemble methods for classification and regression. An ensemble classifier may be used as a tool in medicine to diagnose disease by classifying objects within images of the body. For example, an ensemble classifier may be used to determine whether an abnormal mass is malignant or benign. Factors such as size, number, shape, and texture pattern may have an impact on whether the mass is considered malignant or benign. Computed tomography (CT) scans may be acquired after a contrast agent is administered to the patient to generate images of the mass. While two lesions may look similar in CT images, they may have originated from different pathologies that pose different risks for the patient.
CT and other radiology images also provide opportunities for content based image retrieval (CBIR). CBIR is also known as query by image content (QBIC) and content-based visual information retrieval (CBVIR). CBIR is the application of computer vision techniques to search for digital images in large databases. While rich metadata about image semantics may be provided by radiologists, current CBIR systems do not fully exploit them.
Examples of ensemble classifiers include Boosting and Random Forests. An ensemble classifier may consist of a set of base classifiers (“experts”) that vote to predict unseen data. These methods share the property that the final classification or labeling of the unseen data is computed via a summation over the “experts”, which justifies the name ensemble classifiers.
In supervised learning, a training set is used to learn a model M that generalizes well on unseen data when predicting a label y out of available labels L using F-dimensional features. For example, in ensemble methods, the model M is given by equation 1 as follows:
$$M(x) = f\left(\sum_{i=1}^{T} g_i(x)\right) \qquad \text{[Equation 1]}$$
with T base classifiers (“experts”) $g_i$ and a function f that casts the result of the summation into the final classifier output.
In Boosting, a number T of weak learners selected during training corresponds to the available experts. A weak learner is a classifier that is only slightly correlated with the true classification (e.g., it labels examples better than random guessing). One example of Boosting is AdaBoost. In AdaBoost, the classifier model obtained after the learning process is given by equation 2 as follows:
$$M(x) = \operatorname{sign}\left(\sum_{i=1}^{T} \alpha_i h_i(x)\right) \qquad \text{[Equation 2]}$$
where both the weighting factor $\alpha_i$ and the “weak” learner $h_i(x)$, selected from a possibly infinite-dimensional set H of “weak” learners, are determined during training. Comparing the general model for ensemble methods given in equation 1 with the AdaBoost classifier model in equation 2 shows that the cast function is $f = \operatorname{sign}$ and the base classifier (“expert”) is $g_i(x) = \alpha_i h_i(x)$, which is the product of the weighting factor and the “weak” classifier.
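The correspondence between the general ensemble model and the AdaBoost model can be illustrated with a minimal Python sketch. The decision stumps, thresholds, and weights below are hypothetical values chosen only for illustration, and the training step that would select them is omitted. Mapping a sum of exactly zero to +1 is likewise an assumption.

```python
from typing import Callable, Sequence


def ensemble_predict(x: float,
                     experts: Sequence[Callable[[float], float]],
                     cast: Callable[[float], int]) -> int:
    """General ensemble model: M(x) = f(sum_i g_i(x))."""
    return cast(sum(g(x) for g in experts))


def sign(value: float) -> int:
    # Assumption: a sum of exactly zero is mapped to +1.
    return 1 if value >= 0 else -1


def make_stump(threshold: float) -> Callable[[float], int]:
    # Hypothetical weak learner h_i: a decision stump on one scalar feature.
    return lambda x: 1 if x > threshold else -1


# Hypothetical weights alpha_i and weak learners h_i (not learned here).
alphas = [0.9, 0.5, 0.3]
stumps = [make_stump(t) for t in (1.0, 2.0, 3.0)]

# Each expert is g_i(x) = alpha_i * h_i(x); default args bind a and h per expert.
experts = [lambda x, a=a, h=h: a * h(x) for a, h in zip(alphas, stumps)]


def adaboost_predict(x: float) -> int:
    """AdaBoost model: the general model with f = sign."""
    return ensemble_predict(x, experts, sign)
```

With these illustrative values, an input of 1.5 is labeled +1 because the first stump's weight (0.9) outweighs the combined weight of the two stumps that vote -1 (0.8).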
In Random Forest (RF), the available experts are T trees composing the forest. For example, T can be several hundred or even several thousand depending on what is being classified. In RF, each node of a tree i provides a probability $p_i(y \mid x)$ for $y \in L$, which is obtained during training of the forest. To obtain the final classification rule, a voting of all trees i is performed and the label resulting in the maximum probability is assigned according to equation 3 as follows:
$$M(x) = \underset{y \in L}{\arg\max}\; \frac{1}{T} \sum_{i=1}^{T} p_i(y \mid x) \qquad \text{[Equation 3]}$$
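The voting rule of equation 3 can be sketched in a few lines of Python. The sketch assumes that each tree has already been evaluated for an input x and has returned a probability for every label. The label names and probability values are hypothetical.

```python
from typing import Dict, List


def forest_predict(tree_posteriors: List[Dict[str, float]]) -> str:
    """Return the label y that maximizes (1/T) * sum_i p_i(y|x)."""
    num_trees = len(tree_posteriors)
    labels = tree_posteriors[0].keys()
    # Average the per-tree probabilities for each label.
    mean = {y: sum(p[y] for p in tree_posteriors) / num_trees for y in labels}
    # Pick the label with the highest averaged probability.
    return max(mean, key=mean.get)


# Hypothetical outputs p_i(y|x) of T = 3 trees for one input x.
posteriors = [
    {"benign": 0.7, "malignant": 0.3},
    {"benign": 0.4, "malignant": 0.6},
    {"benign": 0.2, "malignant": 0.8},
]

label = forest_predict(posteriors)
```

Here the averaged probabilities are roughly 0.43 for "benign" and 0.57 for "malignant", so the forest assigns "malignant" even though the first tree alone would have chosen "benign".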
However, when all the trees are used in this voting, an excessive amount of processing time may be expended. Using fewer trees can reduce the processing time, but may also result in an erroneous classification.
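This trade-off can be made concrete with a small Python sketch in which voting over a subset of the trees yields a different label than voting over the whole forest. The per-tree probabilities are hypothetical, and taking the first two trees as the subset is an arbitrary choice made only for illustration.

```python
from typing import Dict, List


def vote(tree_posteriors: List[Dict[str, float]]) -> str:
    """Average the per-tree probabilities and return the most probable label."""
    num_trees = len(tree_posteriors)
    labels = tree_posteriors[0].keys()
    return max(labels,
               key=lambda y: sum(p[y] for p in tree_posteriors) / num_trees)


# Hypothetical outputs of a forest with T = 5 trees for one input.
posteriors = [
    {"benign": 0.8, "malignant": 0.2},
    {"benign": 0.6, "malignant": 0.4},
    {"benign": 0.3, "malignant": 0.7},
    {"benign": 0.2, "malignant": 0.8},
    {"benign": 0.1, "malignant": 0.9},
]

# Voting with all five trees requires five tree evaluations.
full_label = vote(posteriors)

# Voting with only the first two trees (an arbitrary subset) requires two.
partial_label = vote(posteriors[:2])
```

With these values the full forest assigns "malignant" while the two-tree subset assigns "benign", so the reduction from five tree evaluations to two changes the classification.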