Fine-grained visual categorization (FGVC) is a recently proposed problem in computer vision research. Compared with traditional image classification, FGVC assumes that classes are very similar to each other, so that even human observers cannot always classify them correctly at first glance. This problem arises in many real-world classification applications, especially the recognition of subordinate categories such as breeds of cats and dogs, species of birds and flowers, and models of aircraft.
Most existing approaches for FGVC rely on domain knowledge to extract features specific to the given fine-grained classes, rather than traditional generic image features. These methods achieve good performance in their respective domains using one-vs-all linear support vector machines (SVMs) as the final classifier. In fact, feature learning has attracted so much attention that few works have attempted to develop FGVC classification methods beyond directly applying linear SVMs.
Although linear SVM is efficient for high-dimensional classification tasks, it is generally less effective than kernel SVM and cannot guarantee that it fully exploits the information in well-extracted features. Furthermore, neither linear SVM nor kernel SVM takes advantage of the property that most fine-grained datasets may be of low rank, meaning that low-dimensional representations are sufficient for these datasets. This property is important for economical storage and for image retrieval, where the lower the dimension, the cheaper the nearest-neighbor search.
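To make the low-rank argument concrete, the following is a minimal sketch, assuming synthetic feature vectors of exact rank r (a stand-in for real fine-grained features, which are at best approximately low rank). It uses a plain truncated SVD, not the method of this paper, to keep the fewest directions that retain a chosen fraction of the singular-value energy. It then checks that pairwise Euclidean distances, and hence nearest neighbors, are preserved after the projection. The function names `low_rank_projection` and `pairwise_dists`, the sizes n, d, r, and the 0.999 energy threshold are all hypothetical choices for illustration.

```python
import numpy as np


def low_rank_projection(X, energy=0.999):
    """Project the rows of X onto the top-k right singular vectors, choosing the
    smallest k whose squared singular values keep `energy` of the total."""
    # Thin SVD: X = U @ diag(s) @ Vt, with s sorted in descending order
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    # First index where cumulative energy reaches the threshold, turned into a count
    k = min(int(np.searchsorted(cumulative, energy)) + 1, len(s))
    # k-dimensional coordinates of each row in the retained subspace
    return X @ Vt[:k].T, k


def pairwise_dists(Z):
    """Euclidean distance between every pair of rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical synthetic "features": n images, d dimensions, exact rank r
rng = np.random.default_rng(0)
n, d, r = 100, 256, 10
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, d))

Z, k = low_rank_projection(X)
print(f"kept {k} of {d} dimensions")
# Distances (hence nearest neighbors) survive the projection up to rounding error
print(np.allclose(pairwise_dists(X), pairwise_dists(Z)))
```

Because every row of a rank-r matrix lies in the span of its top r right singular vectors, the projection is distance-preserving here. Each distance computation in a neighbor search then costs O(k) instead of O(d). For real features that are only approximately low rank, the energy threshold trades retrieval accuracy against dimension.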