The exemplary embodiment relates to the retrieval arts. It finds particular application in connection with image retrieval in which an image serves as the query and in which the original representations of both the query image and the target images are embedded in a subspace that is particularly suited to retrieving images in the same category as the query image.
Retrieval systems enable selective retrieval of images from a database (for example, a dedicated database, the Internet, or some other collection of documents). One use of such systems is in query-by-example instance-level image retrieval: given a query image depicting an object/scene/landmark/document, the aim is to retrieve other images of the same object/scene/landmark/document from within a potentially large database.
Typically, the query images and database images are described with fixed-length vectors which aggregate local image statistics (original image representations or “signatures”). As examples, the bag-of-visual-words representation or the Fisher vector may be used to generate a multi-dimensional vector. See, for example, G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray, “Visual categorization with bags of keypoints,” ECCV SLCV, 2004; J. Sivic and A. Zisserman, “Video Google: A text retrieval approach to object matching in videos,” ICCV, 2003; and F. Perronnin, J. Sánchez, and T. Mensink, “Improving the Fisher kernel for large-scale image classification,” ECCV, 2010. For ease of computation, some form of dimensionality compression is generally performed on these vectors. The compression step typically involves an unsupervised dimensionality reduction step, such as Principal Component Analysis (PCA). See, for example, Y. Weiss, A. Torralba, and R. Fergus, “Spectral hashing,” NIPS, 2008; H. Jégou, M. Douze, C. Schmid, and P. Pérez, “Aggregating local descriptors into a compact image representation,” CVPR, 2010; and A. Gordo and F. Perronnin, “Asymmetric distances for binary embeddings,” CVPR, 2011.
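By way of illustration only, the following minimal sketch shows one way in which a fixed-length signature of the bag-of-visual-words type may be computed and used to rank database images against a query image. It is not taken from the cited references. The function names `bov_signature` and `rank_database`, the two-dimensional toy descriptors, and the three-word codebook are all hypothetical, and the dimensionality reduction step (e.g., PCA) is omitted for brevity.

```python
import math


def bov_signature(descriptors, codebook):
    """Aggregate local descriptors into an L2-normalized visual-word histogram."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        # Assign the descriptor to its nearest visual word (squared Euclidean distance).
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])),
        )
        hist[nearest] += 1.0
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm > 0 else hist


def rank_database(query_sig, database_sigs):
    """Return database indices sorted by decreasing dot-product similarity."""
    scores = [sum(q * x for q, x in zip(query_sig, sig)) for sig in database_sigs]
    return sorted(range(len(database_sigs)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy data: 2-D local descriptors and a 3-word codebook.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
database = [
    [(0.1, 0.0), (0.0, 0.1), (0.9, 0.1)],  # database image 0
    [(0.0, 0.9), (0.1, 1.0), (0.0, 1.1)],  # database image 1
]
query = [(0.0, 0.95), (0.05, 1.0), (0.1, 0.0)]

database_sigs = [bov_signature(img, codebook) for img in database]
ranking = rank_database(bov_signature(query, codebook), database_sigs)
```

In this toy example, most of the query descriptors fall on the same visual word as the descriptors of database image 1, so that image is ranked first. In practice, the signatures would have many more dimensions, which is what motivates the compression step described above.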
Such query-by-example systems may be used, for example, for duplicate removal, copy detection, or image annotation. For such applications, it is desirable to have good retrieval performance, both in terms of precision and recall. It is often the case that the database images are not labeled, or are not labeled with accurate or useful labels. Thus, in many instances, it may not be possible to improve precision and recall by creating a hybrid query which relies on keyword searching as well as an image signature. There remains a need for a system and method which provide improvements in query-by-example retrieval.