The present invention, in some embodiments thereof, relates to dynamic maps for exploring and browsing images and, more particularly, but not exclusively, to a way of carrying out local exploration and browsing of a large image data set.
In recent years there has been a dramatic increase in the availability of images, videos, and other kinds of high-dimensional data. This trend raises the need for tools to explore such vast datasets in a fast and intuitive way. Recently, image search has received much attention in the scientific community and the high-tech industry. A strong focus has been put on developing relevance feedback techniques, which refine search results using a selection of preferred images. At each relevance feedback step, the user is presented with a new set of images based upon past selections. However, the navigation experience with this approach is not continuous, and it requires the user to go over a large collection of images and select the relevant or irrelevant ones at each step. A more intuitive approach is to lay out the images on a manifold and allow users to navigate over it in a continuous manner. However, since the true dimensionality of the image space is high, creating a cohesive manifold that preserves the relations among all images is challenging, if it is possible at all.
A self-organizing map (SOM) is known from T. Kohonen, The self-organizing map, Proceedings of the IEEE, 78(9):1464-1480, 1990, the contents of which are hereby incorporated by reference. The SOM is a popular dimensionality reduction method that produces a dense and intuitive grid-like structure. However, an SOM entails a computationally intensive training process, which is applied globally as a pre-process, making it difficult to use on a very large and dynamic dataset.
Image Browsing.
As large image collections become more and more widespread, it is increasingly important to allow users to easily search and browse these collections. Unlike text documents, the content of an image can be grasped at a glance, and a large number of images can be presented to a user at once. In image search, the user often does not have an exact target in mind (similar to the notion of informational types of tasks in Broder's taxonomy [3]). For example, if the user is looking for a “handshake” image to add to a presentation, the user does not necessarily know which image he or she is looking for. Thus, images presented on the first page of a text-based search result are not necessarily better than those presented on the following pages. Consequently, users have to scan these results sequentially, spending considerable effort to find relevant images. Still, most current systems focus on providing text-based image querying rather than navigational support, even though studies have shown that image browsing can better serve a user's search needs [11].
The most common way to present a set of images is in a two-dimensional grid. In [13] it is shown that automatically arranging a set of thumbnail images in a grid according to their similarity is useful for users in an image browsing task. Similarly, in [11] the idea is to fit a collection of images on a grid view, based on similarity, using an MDS-based algorithm. In PhotoMesa [2], images are laid out in a large 2D grid. Users can use a zoomable user interface to browse through a large collection of images, panning to browse horizontally or vertically through the images and zooming out to see them semantically grouped into categories. However, the images in PhotoMesa are pre-categorized into directories and ordered according to meta-data (such as file name and date), regardless of visual similarity.
Relevance Feedback.
Many recent search and retrieval systems, including image retrieval systems, utilize relevance feedback [15], a method to refine search results using a selection of preferred elements. Reference [18] presented an image retrieval system that features iterative relevance feedback. At each step, the user is presented with a small set of images, and selects the single image that is the closest match to the desired query. Then a new set of images is displayed and the process is repeated. After a small number of iterations, most of the displayed images match the given query. Works such as [1], [4], and [10] employed similar techniques for retrieval of 3D objects.
While this process may be effective at filtering relevant images out of a massive collection, the use of relevance feedback in commercial search interfaces is still relatively rare [16]. One possible explanation is that it requires users to make relevance judgements on each item, which is an effortful user task [16, 6]. Relevance feedback tends to work best when the user selects multiple objects as relevant as well as some objects as irrelevant. However, selecting multiple objects is cumbersome for most users. This is amplified in image search where extractable low-level features (e.g., color, texture, shape) may not necessarily match high-level perception-based human interpretation [21].
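By way of illustration only, the iterative selection loop described above may be sketched as a toy simulation. The sketch is not the method of [18] or of any other cited work. It assumes that each image is represented by a feature vector, that the user always selects the displayed image nearest to a hidden target, and that the next display consists of the nearest neighbors of the selection. The function name `relevance_feedback_search` and its parameters are hypothetical.

```python
import numpy as np

def relevance_feedback_search(features, target, shown=8, steps=10, seed=0):
    """Toy relevance feedback loop with a simulated user (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    n = len(features)
    # The first display is a random sample of the collection.
    candidates = rng.choice(n, size=shown, replace=False)
    picks = []
    for _ in range(steps):
        # Simulated user: select the displayed image closest to the hidden target.
        d_target = np.linalg.norm(features[candidates] - features[target], axis=1)
        pick = int(candidates[np.argmin(d_target)])
        picks.append(pick)
        # Next display: the images nearest to the selection (the selection included).
        d_pick = np.linalg.norm(features - features[pick], axis=1)
        candidates = np.argsort(d_pick)[:shown]
    return picks

# Hypothetical data: 300 images with 16-dimensional feature vectors.
features = np.random.default_rng(2).random((300, 16))
picks = relevance_feedback_search(features, target=5)
```

Because the selected image remains in the next display, the distance from the selection to the target never increases from one step to the next. The loop may nevertheless stall at an image that is not the target, and every step replaces the displayed set, which reflects the discontinuous navigation experience noted above.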
Dimensionality Reduction.
Dimensionality reduction is a wide area with applications such as clustering, segmentation, visualization, machine learning and more, and it has been extensively researched over the years. Common dimensionality reduction techniques such as multidimensional scaling (MDS) or locally linear embedding (LLE) [14] create a global manifold that aims to preserve the distances among the high dimensional data points, to the extent possible. Such global solutions are beneficial for applications such as clustering and classification, which rely on the underlying geometry or spread of data. A number of papers regarding mapping of images onto a plane such as [5, 20] follow that trend and focus on global shape, which easily shows relations among different types of images. Often, however, embedding high-dimensional data in a two-dimensional manifold is overly constrained and the embedded data does not reflect the original high-dimensional relations among the data points very well.
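As a minimal sketch of a global, distance-preserving embedding, classical (metric) MDS may be written with NumPy as follows. This is one standard variant of MDS, chosen here for brevity, and is not necessarily the variant used in the cited works. The helper names `pairwise_distances` and `classical_mds` are hypothetical.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(points, dim=2):
    """Classical MDS: embed points so that pairwise distances are kept as far as possible."""
    n = len(points)
    d2 = pairwise_distances(points) ** 2
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering      # Gram matrix of the centered points
    vals, vecs = np.linalg.eigh(gram)             # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dim]            # indices of the largest eigenvalues
    scale = np.sqrt(np.clip(vals[idx], 0.0, None))
    return vecs[:, idx] * scale

# Hypothetical data: 30 samples in a 10-dimensional feature space.
high = np.random.default_rng(1).random((30, 10))
embedded = classical_mds(high)
# Mean loss of pairwise distance caused by squeezing the data into two dimensions.
distortion = np.mean(pairwise_distances(high) - pairwise_distances(embedded))
```

For data that is already two-dimensional, the embedding reproduces all pairwise distances. For higher-dimensional data, the embedded distances can only shrink relative to the original ones, which is one way to see why a two-dimensional embedding is overly constrained.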
When browsing images, there is no need for an accurate representation of the original distances between images. In fact, an even spread of images over the map area can be more beneficial than an accurate representation of the original geometry, especially in cases where the original data includes very distinctive clusters which may appear too far apart for easy navigation. The above-mentioned self-organizing map [7] produces a grid which preserves similarity between elements without preserving the distance. Works such as [17] and [9] utilize an SOM to visualize a given small set of elements (up to a few hundred samples) in a global cohesive map. Such methods work very well for small sets; however, they are too computationally intensive to be effective for massive datasets. In [8], an SOM was used to organize millions of documents. Due to the large volume of the dataset, special tools and methodologies had to be developed in order to allow processing of the entire dataset, and several weeks of computation time were required.
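The training process of a self-organizing map may be sketched as follows. This is a simplified online variant with a Gaussian neighborhood and linearly decaying learning rate and radius. The schedule, the parameter values, and the function name `train_som` are illustrative assumptions and are not taken from [7].

```python
import numpy as np

def train_som(data, grid=(8, 8), iters=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small SOM; returns an array of shape (rows, cols, features)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.random((rows, cols, data.shape[1]))
    ys, xs = np.mgrid[0:rows, 0:cols]             # grid coordinates of every cell
    for t in range(iters):
        frac = t / iters
        lr = lr0 * (1.0 - frac)                   # learning rate decays to zero
        sigma = sigma0 * (1.0 - frac) + 0.5       # neighborhood radius shrinks
        sample = data[rng.integers(len(data))]
        # Best matching unit: the cell whose weight vector is closest to the sample.
        dist = ((weights - sample) ** 2).sum(axis=2)
        by, bx = np.unravel_index(np.argmin(dist), dist.shape)
        # Pull the best matching unit and its grid neighbors toward the sample.
        g = np.exp(-((ys - by) ** 2 + (xs - bx) ** 2) / (2.0 * sigma ** 2))
        weights += lr * g[:, :, None] * (sample - weights)
    return weights

# Hypothetical data: 500 random RGB colors arranged on an 8-by-8 map.
colors = np.random.default_rng(0).random((500, 3))
som = train_som(colors)
```

Every iteration compares the sample against every cell of the map, and many passes over the data are needed before the grid settles. The cost therefore grows with both the map size and the dataset size, which is consistent with the difficulty of applying the method to very large collections.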
Spectral clustering and spectral embedding methods present a different approach, by constructing a nearest-neighbors graph and ignoring long distances. The neighbors graph is then embedded using the eigenvectors of the graph's Laplacian, providing a global solution. Using only short distances provides a solution that preserves local distances, but is less constrained globally. The embedding created by spectral clustering for small datasets usually provides a locally continuous solution, in which a pair of data points are near in the embedding only if they are near in the original high-dimensional space, although it is noted that the opposite is not always true. For large datasets, however, the low-dimensional space cannot represent the complexity of relations between all samples. As a result, some elements are embedded near each other even though they are not related in the underlying graph. This is illustrated in FIG. 3, which shows typical spectral embeddings of different numbers of colors. The image on the left, (a), shows a typical spectral embedding of 80 colors, randomly sampled from three-dimensional RGB space. For this relatively low number of samples, the solution is locally continuous. In the right image, (b), 800 random colors were embedded using spectral embedding. The solution is no longer continuous, as some colors, such as blue and orange, are far apart in color space yet embedded next to each other.
In summary, existing methods for image searching do not allow intuitive, fluent browsing of the results. Results are ordered arbitrarily or by keyword relevance, with no regard to visual or contextual relations between nearby images. Relevance feedback methods let the user select relevant images in each step; however, the browsing experience is not continuous, and new images appear in each iteration.
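The spectral embedding steps described above (nearest-neighbors graph, graph Laplacian, eigenvectors) may be sketched as follows for random RGB colors. The sketch assumes an unweighted, symmetrized k-nearest-neighbors graph, the unnormalized Laplacian, and a connected graph. The value k=8 and the function name `spectral_embedding` are illustrative assumptions, and the sketch does not purport to reproduce FIG. 3.

```python
import numpy as np

def spectral_embedding(points, k=8, dim=2):
    """Embed points using eigenvectors of the k-nearest-neighbors graph Laplacian."""
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                  # a point is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]          # k nearest neighbors of each point
    adjacency = np.zeros((n, n))
    adjacency[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    adjacency = np.maximum(adjacency, adjacency.T)  # make the graph undirected
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    vals, vecs = np.linalg.eigh(laplacian)        # eigenvalues in ascending order
    # Skip the first eigenvector, which is constant when the graph is connected.
    return vecs[:, 1:dim + 1]

# 80 colors randomly sampled from RGB space, as in the smaller example above.
colors = np.random.default_rng(0).random((80, 3))
embedding = spectral_embedding(colors)
```

Only the k shortest distances per point enter the graph, so long distances place no constraint on the layout. With many more samples, the two retained eigenvectors have to accommodate a far more complex graph, and unrelated points may land next to each other.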
The following documents are believed to be representative of the art in the field and the contents thereof are hereby incorporated herein by reference:
[1] Ceyhun Burak Akgül, Bülent Sankur, Yücel Yemez, and Francis Schmitt. Similarity learning for 3d object retrieval using relevance feedback and risk minimization. Int. J. Comput. Vision, 89:392-407, September 2010.
[2] B. B. Bederson. Photomesa: a zoomable image browser using quantum treemaps and bubblemaps. In Proceedings of the 14th annual ACM symposium on User interface software and technology, pages 71-80. ACM, 2001.
[3] A. Broder. A taxonomy of web search. In ACM Sigir forum, volume 36, pages 3-10. ACM, 2002.
[4] Liangliang Cao, Jianzhuang Liu, and Xiaoou Tang. 3d object retrieval using 2d line drawing and graph based relevance feedback. In Proceedings of the 14th annual ACM international conference on Multimedia, MULTIMEDIA '06, pages 105-108, New York, N.Y., USA, 2006. ACM.
[5] Chaomei Chen, George Gagaudakis, and Paul Rosin. Similarity-based image browsing, 2000.
[6] W. B. Croft, S. Cronen-Townsend, and V. Lavrenko. Relevance feedback and personalization: A language modeling perspective. In DELOS Workshop: Personalisation and Recommender Systems in Digital Libraries, 2001.
[7] T. Kohonen. The self-organizing map. Proceedings of the IEEE, 78(9):1464-1480, 1990.
[8] T. Kohonen, S. Kaski, K. Lagus, J. Salojarvi, J. Honkela, V. Paatero, and A. Saarela. Self organization of a massive document collection. Neural Networks, IEEE Transactions on, 11(3):574-585, 2000.
[9] A. Lasram, S. Lefebvre, and C. Damez. Procedural texture preview. In Computer Graphics Forum, volume 31, pages 413-420. Wiley Online Library, 2012.
[10] George Leifman, Ron Meir, and Ayellet Tal. Semantic-oriented 3d shape retrieval using relevance feedback. The Visual Computer, 21(8-10):865-875, 2005.
[11] H. Liu, X. Xie, X. Tang, Z. W. Li, and W. Y. Ma. Effective browsing of web image search results. In Proceedings of the 6th ACM SIGMM international workshop on Multimedia information retrieval, pages 84-90. ACM, 2004.
[12] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3):145-175, 2001.
[13] K. Rodden, W. Basalaj, D. Sinclair, and K. Wood. Does organisation by similarity assist image browsing? In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 190-197. ACM, 2001.
[14] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323-2326, 2000.
[15] Y. Rui, T. S. Huang, M. Ortega, and S. Mehrotra. Relevance feedback: A power tool for interactive content-based image retrieval. Circuits and Systems for Video Technology, IEEE Transactions on, 8(5):644-655, 1998.
[16] I. Ruthven and M. Lalmas. A survey on the use of relevance feedback for information access systems. The Knowledge Engineering Review, 18(02):95-145, 2003.
[17] Yasuhiko Sakamoto, Shigeru Kuriyama, and Toyohisa Kaneko. Motion map: image-based retrieval and segmentation of motion data. In Proceedings of the 2004 ACM SIGGRAPH/Eurographics symposium on Computer animation, SCA '04, pages 259-266, Aire-la-Ville, Switzerland, 2004. Eurographics Association.
[18] Nicolae Suditu and Francois Fleuret. Heat: Iterative relevance feedback with one million images. In International Conference on Computer Vision, October 2011.
[19] J. Surowiecki. The wisdom of crowds: Why the many are smarter than the few and how collective wisdom shapes business. Economies, Societies and Nations, 2004.
[20] Kilian Q. Weinberger and Lawrence K. Saul. Unsupervised learning of image manifolds by semidefinite programming. Int. J. Comput. Vision, 70:77-90, October 2006.
[21] X. S. Zhou and T. S. Huang. Relevance feedback in image retrieval: A comprehensive review. Multimedia systems, 8(6):536-544, 2003.