1. Field of Art
The present invention generally relates to the field of digital video, and more specifically, to methods of identifying real-world objects present within a video.
2. Background of the Invention
Currently, automated recognition, within a digital video, of images of real-world objects of interest to a user, such as people, animals, automobiles, consumer products, buildings, and the like, is a difficult problem. Conventional systems, to the extent that they allow for such recognition at all, typically use supervised learning, which requires training sets of images that have been manually labeled as representing particular objects. Thus, such conventional systems rely on direct human input to provide object exemplars explicitly labeled as representing the object, such as a set of images known, based on prior human examination, to include dogs. However, such human input is expensive and time-consuming, and it cannot scale up to handle very large data sets comprising hundreds of thousands of objects and millions of images. This is particularly a problem in the context of video hosting systems, such as Google Video or YouTube, in which users submit millions of videos, each containing numerous distinct visual objects over the length of the video. Unsupervised learning, in which objects are learned without the explicit input of human operators, has not yet been achieved for large-scale image recognition systems.