1. Field of the Disclosure
The present disclosure generally relates to a video search and retrieval system, and, more particularly, to an apparatus and method that uses faces as a primary index or cueing mechanism into video data, where faces are located, extracted, and matched automatically.
2. Brief Description of Related Art
Vast amounts of video data exist, including video produced for commercial consumption, video produced for personal and home use, and video recorded for the purposes of security and monitoring.
The need to monitor live video and to search recorded video is pressing. Most home video and security video is labeled with a minimal amount of descriptive information, often only a title and date. The lack of descriptive information makes searching a video archive for a particular event or person a burdensome undertaking. For example, if a theft occurs during the night in a large office complex, finding suspects by reviewing the overnight video recorded by security cameras will be very time-consuming due to the number of cameras that may have recorded the suspect, the long time period (e.g., 8-10 hours of the night) during which the theft may have occurred, and the essentially sequential review of the contents of individual video tapes or discs. A similar need to search video arises in many other circumstances including, for example, live monitoring of security cameras, monitoring employee behavior in cases of suspected employee theft, reviewing actions of people in a secure facility such as a military base, monitoring a company headquarters or school, reviewing behavior of patients and staff in a medical care facility, searching for family and friends in a home video, searching video archives on the Internet, and searching consumer or sports video archives such as those of broadcast or cable video.
There are few tools that can help users automatically identify events of interest in live and recorded video. Existing methods for searching, navigating, and retrieving video have focused on broadcast video produced for mass consumption. See, for example, the discussion in “Intelligent Access to Digital Video: The Informedia Project” by Wactlar, H., Stevens, S., Smith, M., Kanade, T., IEEE Computer, 29(5), Digital Library Initiative Special Issue, May 1996; and in “Interactive Content-based Retrieval of Video,” Smith, J. R., Basu, S., Lin, C.-Y., Naphade, M., Tseng, B., IEEE International Conference on Image Processing (ICIP-2002), September 2002. The methods disclosed in these publications are designed for high-quality broadcast video, where the video content consists of heterogeneous video segments from many sources spliced together in small topical segments and video indexing relies on transcripts obtained from closed captioning and/or speech recognition. Furthermore, some existing methods of analyzing the visual component of video are limited to detecting video shot boundaries, with face detection carried out for identifying keyframes, where a keyframe is a single frame representative of a shot. Such methods are discussed, for example, in U.S. Pat. No. 6,711,587 to Dufaux, F., titled “Keyframe Selection to Represent a Video,” and U.S. Patent Application Publication No. US2006/0110128 to Dunton et al., titled “Image-key index for Video Program Stored in Personal Video Recorder.”
Hence, it is desirable to devise a video search, retrieval and cueing methodology that uses face or object detection techniques to automatically create an index of human faces or objects-of-interest that appear in the video. It is also desirable for the video search methodology to allow a user to selectively view video segments associated with a specific human face or object without performing a time-consuming search of the entire video for those video segments.