Image recognition or classification is the task of assigning a predefined category label to an input image, which is a fundamental building block for intelligent image content analysis. For instance, an image of a bird may be labeled and assigned to one or more of the following categories: ornithology, birds, and blue heron. Even though it has been studied for many years, image classification remains to be a major challenge. Perhaps one of the most significant developments in the last decade in image recognition is the application of local image features, including the introduction of the Bag-of-Visual-Words (BOV) model and its extensions. In general, the BOV model treats an image as a collection of unordered local descriptors extracted from small patches of a given image. These local descriptors, sometimes referred to as local image descriptors or local visual descriptors, are vectors that mathematically represent one or more features depicted in the corresponding image patch (e.g., beak or head plumage of a bird). In any case, the BOV model quantizes the extracted local descriptors into discrete “visual words” and then computes a compact histogram. The histogram is a vector containing the (weighted) count of each visual word in the given image, which can be used as a feature vector in the image classification task. However, the BOV model discards the spatial order of local descriptors, which limits its descriptive power. To overcome this problem, one particularly popular extension of the BOV model uses spatial pyramid matching for recognizing natural scene categories and to take into account the global image structure. Other vector representations of local image descriptors, such as aggregation of local image descriptors, super-vector coding of local image descriptors, and Fisher-vector coding of local image descriptors, extend the BOV model to provide richer and more discriminative image representations for image classification and retrieval tasks. Even though such variants and extensions of the BOV methodology perform well on general object categorization tasks, they tend to be suboptimal in distinguishing finer details.