The invention relates to method and system for image classification.
Image classification, including object recognition and scene classification, remains to be a major challenge to the computer vision community. Perhaps one of the most significant developments in the last decade is the application of local features to image classification, including the introduction of “bag-of-visual-words” representation.
One conventional approach applies probabilistic generative models with the objective towards understanding the semantic content of images. Typically those models extend topic models on bag-of-word representation by further considering the spatial information of visual words.
Certain existing approaches apply vector quantization (VQ) coding on local image descriptors, for example SIFT features or SURF features, and then average pooling to obtain the so-called “bag-of-visual-words” representation, which is fed into a nonlinear classifier based on SVMs using Chi-square or intersection kernel.
A further extension is to incorporate the spatial information of local descriptors in an image, by partition images into regions in different locations and scales and compute region-based histograms, instead of computing the global histogram for the entire image. These region-based histograms are concatenated to form the feature vector for the image. Then nonlinear SVM is applied for classification. This approach is called “spatial pyramid matching kernel” (SPMK) method. SPMK is regarded the state-of-the-art method for image classification.
It is known that SVMs use pyramid matching kernels, biologically-inspired models, and KNN methods. Over the past years, the nonlinear SVM method using spatial pyramid matching (SPM) kernels seems to be dominant among the top performers in various image classification benchmarks, including Caltech-101, PASCAL, and TRECVID. The recent improvements were often achieved by combining different types of local descriptors, without any fundamental change of the underlying classification method. In addition to the demand for more accurate classifiers, one has to develop more practical methods. Nonlinear SVMs scale at least quadratically to the size of training data, which makes it nontrivial to handle large-scale training data. It is thus necessary to design algorithms that are computationally more efficient.