1. Field of the Invention
The present invention relates to methods and systems for 3D object detection. More particularly, the present invention relates to methods and systems for 3D object detection using learning.
2. Description of the Related Art
Detection of objects is important in medical and non-medical applications. For example, medical image analysis and diagnosis depend on the ability to detect anatomical structures. FIG. 1 shows examples of lung tumors in a 3D CT lung image.
Object detection in 3D and 4D (4D=3D+Time) data is difficult because of the large amount of data associated with each image, the increased noise level, and the computational complexity. In object detection, the classes to be discriminated are not defined by the variations of the different objects themselves, but rather by distinguishing between “images containing the object” and “images not containing the object.” Unless the domain of images over which the system must discriminate is restricted, training an object detection system is time-consuming and difficult.
Machine learning, an area of artificial intelligence concerned with the development of techniques that allow computers to “learn” through the analysis of data sets, investigates the mechanisms by which knowledge is acquired through experience. The field of machine learning is concerned with both the analysis of data and the algorithmic complexity of computational implementations. Machine learning has a wide spectrum of applications, including stock market analysis, classification of DNA sequences, search engines, speech and handwriting recognition, medical diagnosis, and game playing. Common types of learning algorithms include supervised learning (for example, classification), unsupervised learning, and reinforcement learning.
In machine learning theory, combining multiple classifiers is an effective technique for improving prediction accuracy. There are numerous general combining algorithms such as bagging, boosting, and error-correcting output codes. Boosting, which has its roots in PAC (probably approximately correct) learning, is a machine learning algorithm for performing supervised learning. Freund, Y. and Schapire, R. E. (1996), “Experiments with a new boosting algorithm,” In Machine Learning: Proceedings of the Thirteenth International Conference, Bari, Italy, pp. 148-156.
The idea of boosting is to design a series of training sets and use a combination of classifiers trained on these sets. The training sets are chosen sequentially, with the weight of each training example being modified based on the success of the classifier trained on the previous set. That is, greater weight is assigned to those training examples that were difficult to classify, i.e., those for which the misclassification rate was high, and lower weight to those that were easy to classify. Note that boosting can also be applied to learning methods that do not explicitly support weights. In that case, random sub-sampling can be applied to the learning data in the successive steps of the iterative boosting procedure. Freund and Schapire's AdaBoost algorithm is generally considered a first step towards more practical boosting algorithms. Freund, Y. and Schapire, R. E. (1997), “A decision-theoretic generalization of on-line learning and an application to boosting,” In Journal of Computer and System Sciences, 55(1): 119-139.
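By way of illustration only, the reweighting scheme described above may be sketched as follows. The sketch is in the style of AdaBoost with one-dimensional threshold classifiers ("stumps") as the weak learners; the function names, the choice of stumps, and the toy data in the usage note are assumptions made for this example and are not taken from the cited references.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak classifier: returns polarity if x >= threshold, else -polarity."""
    return polarity if x >= threshold else -polarity

def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Train an ensemble of (alpha, threshold, polarity) weak classifiers."""
    n = len(xs)
    weights = [1.0 / n] * n  # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified examples, lower the others.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all weak classifiers."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1
```

For example, with the hypothetical inputs xs = [1, 2, 3, 4, 5, 6] and labels ys = [1, 1, -1, -1, 1, 1], no single stump separates the two classes, whereas the weighted vote of three stumps returned by adaboost(xs, ys, 3) classifies all six training examples correctly.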
As discussed in U.S. Pat. No. 6,546,379, boosting refers to a family of general methods that seek to improve the performance obtained from any given underlying method of building predictive models by applying the underlying methods more than once and then combining the resulting “weak” models into a single overall model that, although more complex than any of the “weak” models obtained from the underlying method, may make more accurate predictions. The term “weak”, as used in connection with boosting, is a technical term used in the art; a “weak” model has imperfect performance that one hopes to improve by somehow combining the weak model with other weak models built by the same underlying method, but from different training examples of the available training sets. U.S. Pat. No. 6,546,379, entitled “Cascade boosting of predictive models,” issued on Apr. 8, 2003 to Hong et al.
The task of classification is a key component in the fields of computer vision and machine learning. When an object is presented to a system for classification, the system selects specific features from the object and these features are passed to the classifier. Because the size of the search space grows exponentially with the number of features, it is impractical to search the hypothesis space exhaustively to find the optimal classifier.
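The exponential growth noted above can be made concrete with a short sketch. It assumes, purely for illustration, that each candidate in the search space corresponds to one subset of the available features, so that n features yield 2^n candidates; the function names are hypothetical.

```python
from itertools import combinations

def count_feature_subsets(num_features):
    """Number of distinct feature subsets, including the empty subset."""
    return 2 ** num_features

def enumerate_feature_subsets(features):
    """Yield every subset of the given features, smallest subsets first."""
    for size in range(len(features) + 1):
        yield from combinations(features, size)
```

Under this assumption, 20 features already give more than one million candidate subsets, and each additional feature doubles the count, which is why exhaustive enumeration is feasible only for very small feature sets.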
Feature selection is an important part of many machine learning problems. Feature selection is used to improve the efficiency of learning algorithms by finding an optimal subset of features. FIG. 2 illustrates sample features for object detection with different computational costs. There has been a great deal of work on developing feature selection methods. Liu and Motoda provide an overview of the methods developed since the 1970s. Liu, H., and Motoda, H. (1998), Feature Selection for Knowledge Discovery and Data Mining, ISBN 0-7923-8198-X, Kluwer Academic Publishers.
The principal aim of designing a classifier is to accurately classify input. Classification accuracy depends on diverse factors, including sample size and the quality of training data. Duda, R., Hart, P. and Stork, D. (2001), Pattern Classification, 2nd ed., ISBN: 0471-05669-3, John Wiley & Sons. Research has shown that two approaches, namely, the support vector machine approach and boosting, are effective for classification. Meir and Ratsch provide an overview of applications of boosting algorithms; the main ideas are illustrated on the problem of binary classification. Meir, R. and Ratsch, G. (2003), “An introduction to boosting and leveraging,” In Advanced Lectures on Machine Learning, pp. 119-184, Springer.
Classification accuracy is a critical design consideration, as discussed above. Many real-time computer vision tasks also require a fast processing response, and this requires fast classification processes. For example, methods to detect faces in videos require scanning through a large number of possible candidate regions. Another more challenging example is the task of tumor detection in 3D CT lung images, which requires large 3D sub-volume datasets for scanning. Classification accuracy and fast classification processes are critical in developing 3D object detection methods and systems for medical and non-medical applications.
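The scale of the scanning task can be estimated with a simple count of candidate sub-volumes. The following sketch assumes an exhaustive sliding-window scan with a fixed window size and a uniform stride; the function name and the example dimensions in the usage note are hypothetical and are not taken from any particular imaging protocol.

```python
def count_windows(volume_shape, window_shape, stride):
    """Count the sliding-window positions that fit inside a volume.

    volume_shape and window_shape are sequences of equal length giving the
    size along each axis; stride is the step, in voxels, along every axis.
    """
    if len(volume_shape) != len(window_shape):
        raise ValueError("volume and window must have the same number of axes")
    total = 1
    for size, window in zip(volume_shape, window_shape):
        if window > size:
            return 0  # the window does not fit along this axis
        total *= (size - window) // stride + 1
    return total
```

For instance, a hypothetical 512 x 512 x 300 volume scanned with a 32 x 32 x 32 window at a stride of 4 voxels yields roughly one million candidate sub-volumes, each of which must be passed to the classifier, compared with about fifteen thousand candidate regions for a single 512 x 512 slice scanned with a 32 x 32 window at the same stride.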