Video monitoring devices receive streams of video data, or “feeds,” from video cameras to monitor premises for purposes such as security monitoring, infant or elderly monitoring, and videoconferencing. A video feed may be constantly or periodically monitored by security personnel, and/or recorded for later review. However, much of the content of a video feed may be redundant, especially in security monitoring or video surveillance, where the background remains unchanged. In addition, motion or an activity of interest may occur infrequently, or at random and unpredictable times, throughout the video. These characteristics can make monitoring and surveillance undesirably costly and difficult.
Automatic identification of frames of interest in a video feed, such as those depicting motion or changes in the video background, may decrease monitoring costs and improve the efficiency and sensitivity of monitoring by alerting or otherwise providing notice of significant events. For example, a motion event in the field of view of a video camera, such as a person entering a secure area, can trigger the monitoring system to begin transmitting a video feed from the camera, to begin recording the video feed, and/or to otherwise alert security personnel to the potential intrusion. In another application, movement of a baby being monitored, or entry of a person into the baby's room, can trigger the monitoring system to begin transmitting a video feed, to begin recording the video feed, and/or to alert the parents. The identification of motion or a change in the field of view of the video camera can also trigger an otherwise idle monitoring system to transition to an activated status so as to provide an alert or to begin recording.
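By way of illustration only, the idea of flagging frames that depict motion can be sketched with simple frame differencing, in which each frame is compared against its predecessor and flagged when the average pixel change exceeds a threshold. This is a generic textbook technique used here as a stand-in, not a method described in this document. The names `mean_abs_diff` and `frames_of_interest`, the grayscale list-of-rows frame representation, and the threshold value of 10.0 are all assumptions made for the example.

```python
from typing import List, Sequence

# Assumed representation: a grayscale frame is a sequence of rows,
# each row a sequence of pixel intensities in the range 0-255.
Frame = Sequence[Sequence[int]]


def mean_abs_diff(prev: Frame, curr: Frame) -> float:
    """Average absolute per-pixel difference between two same-sized frames."""
    total = 0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += abs(p - c)
            count += 1
    return total / count if count else 0.0


def frames_of_interest(frames: Sequence[Frame], threshold: float = 10.0) -> List[int]:
    """Return indices of frames that differ from the preceding frame
    by more than the (hypothetical) threshold."""
    flagged = []
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # A static 4x4 background, then a bright object appears and later leaves.
    background = [[20] * 4 for _ in range(4)]
    intruder = [row[:] for row in background]
    intruder[1][1] = intruder[1][2] = intruder[2][1] = intruder[2][2] = 220
    feed = [background, background, intruder, intruder, background]
    print(frames_of_interest(feed))  # prints [2, 4]
```

In this sketch, the flagged indices mark the frames at which the scene changes (the object entering and leaving); a monitoring system could use such indices to start recording or to raise an alert. A fixed global threshold is the simplest possible choice and would likely need tuning for lighting changes and sensor noise.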
In another application, automatic identification of frames of interest in recorded video data, together with generation of a summary representation, may allow for improved analysis or editing of the recording. Rather than requiring a user to manually view the entire recording from start to finish, a summary representation of the recording may permit automatic identification of frames of interest and allow the user to extract those frames directly. Furthermore, analysis of the summary representation may permit a user to quickly jump to key segments of the recording. Automatic identification of frames of interest in video data may also be applied to virtual reality productions to create better videos, or to permit interactive video editing.
Recently, small, lightweight, portable smart video recorders have been developed that incorporate video summarization and key frame extraction methods for various types of video analysis. These methods, however, require high computational capacity and complex algorithms. There remains a need for methods and apparatus for summarizing video data and identifying key frames in the data that can be readily implemented in portable video equipment to address the challenges discussed above.