1. Technical Field
The invention relates to searching, browsing, and retrieving images or image sequences from a video clip or other image sequence, and in particular, to a system and method for using a computationally efficient scene generative model in an automatic, fully adaptive, content-based analysis of the image sequence.
2. Related Art
Conventional schemes for searching through image sequences include content-based search engines that use various types of aggregate statistics over visual features, such as color or texture elements of frames in the image sequence. However, these schemes tend to be sensitive to the quality of the data. While professionally captured or rendered image sequences tend to be of high quality, a home video or the like is often of relatively poor quality and is therefore unsuited for use with such conventional schemes. For example, in a typical home video or image sequence, bad or degraded color characteristics, or blurry or out-of-focus portions of scenes, make it difficult to recognize textures within that image sequence. As a result, these conventional statistics-based search engines perform poorly in such an environment.
However, a more serious limitation of existing schemes is that the spatial configuration of any particular scene is typically not encoded in the scene description, thereby making analysis of the image sequence more difficult. In order to address this concern, one conventional scheme attempts to preserve some of the spatial information using multiresolution color histograms. Other approaches attempt to circumvent the lack of global spatial information in representations based on local features by working with a large number of features and automatically selecting the most discriminative ones.
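The loss of spatial configuration noted above can be illustrated with a small sketch. It is not the implementation of any scheme referenced here. It assumes grayscale frames stored as nested lists, uses simple 2x2 block averaging as a stand-in for whatever smoothing a real multiresolution scheme would apply, and all function names and parameters (`histogram`, `downsample`, `multiresolution_histogram`, `bins`, `levels`) are hypothetical. Two frames with identical pixel counts but different layouts yield the same global histogram, while histograms taken at a second, coarser resolution tell them apart.

```python
# Illustrative sketch only: every name and parameter here is an assumption
# made for this example, not part of any scheme described in the text.

def histogram(image, bins=4, max_value=256):
    """Count pixel intensities into equal-width bins, ignoring pixel position."""
    counts = [0] * bins
    for row in image:
        for value in row:
            index = min(int(value * bins / max_value), bins - 1)
            counts[index] += 1
    return counts


def downsample(image):
    """Halve the resolution by averaging each 2x2 block (assumes even dimensions)."""
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4
            for c in range(0, len(image[0]), 2)
        ]
        for r in range(0, len(image), 2)
    ]


def multiresolution_histogram(image, levels=2, bins=4):
    """Return one histogram per resolution level, finest level first."""
    histograms = []
    current = image
    for level in range(levels):
        histograms.append(histogram(current, bins))
        if level < levels - 1:
            current = downsample(current)
    return histograms


# Two 4x4 frames with the same pixel values arranged differently.
checkerboard = [
    [0, 255, 0, 255],
    [255, 0, 255, 0],
    [0, 255, 0, 255],
    [255, 0, 255, 0],
]
split = [[0, 0, 255, 255] for _ in range(4)]

# The global histogram cannot distinguish the two layouts.
same_global = histogram(checkerboard) == histogram(split)

# Adding a coarser level recovers some of the spatial difference.
same_multires = multiresolution_histogram(checkerboard) == multiresolution_histogram(split)

print(same_global, same_multires)
```

In this toy case the checkerboard averages to a uniform mid-gray at the coarser level, whereas the split frame keeps its dark and bright halves, so only the multiresolution description separates the two frames.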
In either case, the conventional approaches that attempt to model the spatial layout of particular regions within an image sequence are subject to several limitations. In particular, the limitations of conventional spatial-layout based schemes include the amount of user interaction required for specifying positive and negative examples, the small size of foreground objects that can be modeled, thereby limiting the application domain, and the necessity of handcrafting cost functions that need to be manually weighted.
Another conventional scheme has attempted to jointly model motion and appearance by using derivatives in the space-time volume for searching through image sequences. However, this scheme is both complicated and computationally inefficient.
Yet another conventional scheme provides a comprehensive search engine that allows for a motion-based search based on a query consisting of region appearances and sketched motion patterns. This search engine is typically used by professional users searching for particular actions or activities in professional sporting events such as soccer matches. However, this scheme requires a significant amount of user input in order to identify scenes or image sequences of interest, and is therefore not well suited for home use.
Therefore, what is needed is a computationally efficient system and method for automatically searching or browsing through videos or other image sequences to identify scenes or image sequences of interest. Further, such a system and method should be adapted to work well with either high quality image data, such as a typical television-type broadcast, or with relatively poor quality image data, such as a typical home video or image sequence. Finally, such a system and method should require minimal user input to rapidly and automatically identify image scenes or sequences of interest to the user.