The following description of related art is intended to provide background information pertaining to the field of the invention. This section may include certain aspects of the art that may be related to various aspects of the present disclosure. However, it should be appreciated that this section is to be used only to enhance the understanding of the reader with respect to the present disclosure, and not as an admission of prior art.
Image processing has been gaining popularity with the increasing use of digital images for various purposes. In particular, one of the most important areas of image processing is scene classification, which deals with the problem of understanding the context of what is captured in an image. Understanding a holistic view of an image is a relatively difficult task due to the lack of text labels representing the content present in the image. Existing systems and methods for scene recognition have a number of drawbacks and limitations. Existing solutions treat indoor and outdoor scene recognition as two different problems due to the significant variation in appearance characteristics between indoor and outdoor images. It has largely been perceived that different kinds of features are required to discriminate between indoor scenes and outdoor scenes. This is highly inefficient, since separate systems and methods must be deployed for recognition of indoor and outdoor scenes. Further, current indoor scene recognition systems and methods use part-based models that look for characteristic objects in an image to determine the scene, which results in the inaccurate assumption that similar-looking objects, distributed spatially in a similar manner, constitute the same scene.
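The limitation of part-based models described above can be illustrated with a minimal sketch. The scene names, object labels, and the `classify_by_parts` function below are hypothetical and are not taken from any particular prior art system; the sketch merely assumes a classifier that predicts a scene purely from which characteristic objects are detected:

```python
# Hypothetical part-based scene classifier: each scene class is assumed to be
# characterized only by a set of objects, ignoring the broader context.
SCENE_PARTS = {
    "bedroom": {"bed", "lamp", "wardrobe"},
    "office": {"desk", "chair", "monitor"},
}

def classify_by_parts(detected_objects):
    # Predict the scene whose characteristic parts overlap most with the
    # objects detected in the image.
    return max(SCENE_PARTS, key=lambda s: len(SCENE_PARTS[s] & detected_objects))

# A furniture showroom may contain beds and lamps arranged much like a
# bedroom, so a purely part-based model mislabels it.
print(classify_by_parts({"bed", "lamp", "price_tag"}))  # -> "bedroom"
```

Because the model only counts matching objects, any image containing bedroom-like objects is labeled a bedroom, regardless of the actual scene.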
Further, the current solutions are unable to effectively address the problem of overfitting caused by the use of real world image datasets (as input to these systems) that capture a lot of intra-class variation, i.e. the significant variation in appearance characteristics of images within each class. Furthermore, existing solutions use hand crafted features to discriminate between images/scenes. However, features that are good for discriminating between some classes may not be good for other classes. Existing approaches to image/scene recognition are incapable of continuously learning from or adapting to the increasing number of images uploaded to the Internet every day.
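A brief sketch can show why a hand-crafted feature that works for some classes may fail for others. The global intensity histogram used below is one common hand-crafted descriptor chosen for illustration (not a descriptor named in this disclosure); because it discards spatial layout, two clearly different images can map to the identical feature vector:

```python
from collections import Counter

# Two tiny grayscale "images" with different spatial layouts:
# a bright region on the left versus on the right.
img_a = [[255, 255, 0, 0],
         [255, 255, 0, 0]]
img_b = [[0, 0, 255, 255],
         [0, 0, 255, 255]]

def intensity_histogram(img, bins=4):
    # Hand-crafted global feature: count pixels in equal-width intensity
    # bins [0, 64), [64, 128), [128, 192), [192, 256).
    counts = Counter(min(px * bins // 256, bins - 1)
                     for row in img for px in row)
    return [counts.get(b, 0) for b in range(bins)]

# Identical feature vectors for visibly different images: the descriptor
# cannot discriminate between these two scenes.
print(intensity_histogram(img_a) == intensity_histogram(img_b))  # -> True
print(img_a == img_b)                                            # -> False
```

A descriptor like this may separate, say, dark indoor scenes from bright outdoor ones, yet be useless for classes that differ only in spatial arrangement.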
Another important aspect of image processing relates to similarity matching of images. With the growing demand for image recognition technologies, the need for scalable recognition techniques that can handle a large number of classes and continuously learn from Internet-scale image collections has become evident. Unlike searching for textual data, searching for images similar to a particular image is a challenging task. The number of images uploaded to the World Wide Web increases every day, and it has become extremely difficult to incorporate these newly added images into the search databases of existing similarity matching techniques. As discussed above, existing image recognition solutions use hand-crafted features to discriminate between images. A major disadvantage of such systems is that they result in a large reconstruction error, i.e., reconstruction of an image from these hand-crafted features is likely to produce an image very different from the original.
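The notion of reconstruction error above can be made concrete with a minimal sketch. The block-mean descriptor below is an assumed, illustrative hand-crafted feature (not one specified in this disclosure): it summarizes a row of pixels by per-block averages, and the best reconstruction available from the descriptor alone loses all fine detail:

```python
def block_mean_descriptor(pixels, block=4):
    # Hypothetical hand-crafted descriptor: mean intensity per block of pixels.
    return [sum(pixels[i:i + block]) / block
            for i in range(0, len(pixels), block)]

def reconstruct(descriptor, block=4):
    # Best reconstruction from the descriptor alone: repeat each block mean.
    return [m for m in descriptor for _ in range(block)]

def mse(a, b):
    # Mean squared error between original and reconstructed pixels.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# High-contrast row: alternating dark and bright pixels.
row = [0, 255, 0, 255, 0, 255, 0, 255]
desc = block_mean_descriptor(row)   # [127.5, 127.5]
restored = reconstruct(desc)        # flat gray; all contrast is lost
print(mse(row, restored))           # large reconstruction error
```

Here every reconstructed pixel is off by 127.5 intensity levels, so the restored image bears little resemblance to the original, which is the reconstruction-error problem noted above.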
Thus, there is a need for improved and scalable image processing systems for scene classification and similarity matching that are capable of handling a large number of images/classes.