The invention generally relates to improvements in methods of automatic video editing, and more specifically to methods used in automatically creating summaries based on webcam video content, as determined by image analysis.
Devices such as video cameras and microphones are often used for monitoring an area or a room. Existing video editing and monitoring systems typically record events when motion is detected, and provide alerts to a user over the Internet. The user can then view just the stored portions of the monitored area when motion was detected. A summary can, for example, provide a series of still images from each video, to give the user a sense of whether the motion is worth viewing. For example, the user can see if a person is in the scene, or if the motion appears to have been a drape moving, a bird, etc.
Magisto Pub. No. 20150015735 describes capturing images, as opposed to editing, based on various factors, and detecting important objects and deciding whether to take a video or snapshot based on importance (e.g., whether someone is smiling). BriefCam has patents that describe detecting an amount of activity, or objects, moving in an image, and overlaying different object movements on the same image, as a mosaic. See, e.g., Pub. 2009-0219300 (refers to different sampling rates on the image acquisition side) and Pub. 2010-0092037 (refers to “adaptive fast-forward”). Pub. No. 20150189402 describes creating a video summary of just detected important events in a video, such as shots in a soccer match. See also Pub. No. 20050160457, which describes detecting baseball hits visually and from excited announcer sound.
Pub. No. 20100315497 is an example of systems capturing the images based on face recognition, with a target face profile. ObjectVideo Pub. No. 20070002141 describes a video-based human verification system that processes video to verify a human presence, a non-human presence, and/or motion. See also Wells Fargo Alarm Services U.S. Pat. No. 6,069,655. Pub. No. 2004-0027242 also describes detecting humans, and other objects. “Examples include vehicles, animals, plant growth (e.g., a system that detects when it is time to trim hedges), falling objects (e.g., a system that detects when a recyclable can is dropped into a garbage chute), and microscopic entities (e.g., a system that detects when a microbe has permeated a cell wall).”
Pub. No. 20120308077 describes determining a location of an image by comparing it to images from tagged locations on a social networking site. Pub. No. 20110285842 describes determining a location for a vehicle navigation system by using landmark recognition, such as a sign, or a bridge, tunnel, tower, pole, building, or other structure.
Sony Pub. No. 2008-0018737 describes filtering images based on appearance/disappearance of an object, an object passing a boundary line, a number of objects exceeding a capacity, an object loitering longer than a predetermined time, etc.
ObjectVideo Pub. No. 2008-0100704 describes object recognition for a variety of purposes. It describes detecting certain types of movement (climbing a fence, moving in the wrong direction), monitoring assets (e.g., for removal from a museum, or detecting if a single person takes a suspiciously large number of a given item in a retail store), detecting if a person slips and falls, detecting if a vehicle parks in a no parking area, etc.
Pub. No. 2005-0168574 describes “passback” [e.g., entering through airport exit] detection. The system automatically learns a normal direction of motion in the video monitored area; this direction may be learned as a function of time, and may be different for different time periods. “The analysis system 3 may then automatically change the passback direction based on the time of day, the day of the week, and/or relative time (e.g., beginning of a sporting event, and ending of sporting event). The learned passback directions and times may be displayed for the user, who may verify and/or modify them.”
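The time-dependent learning of a normal direction described above can be illustrated with a minimal sketch. It is not taken from the cited publication: the class name `DirectionLearner`, the period labels, and the majority-vote rule are all assumptions chosen for illustration. The sketch counts observed motion directions per time period and treats motion against the most common direction for that period as a possible passback.

```python
from collections import Counter, defaultdict


class DirectionLearner:
    """Hypothetical helper: learns the most common motion direction per time period."""

    def __init__(self):
        # Maps a period label (e.g., "morning") to counts of observed directions.
        self._counts = defaultdict(Counter)

    def observe(self, period, direction):
        """Record one observed motion direction during the given period."""
        self._counts[period][direction] += 1

    def normal_direction(self, period):
        """Return the most frequently observed direction, or None if nothing was seen."""
        counts = self._counts.get(period)
        if not counts:
            return None
        return counts.most_common(1)[0][0]

    def is_passback(self, period, direction):
        """Flag motion that differs from the learned normal direction for the period."""
        normal = self.normal_direction(period)
        return normal is not None and direction != normal


learner = DirectionLearner()
for direction in ["in", "in", "in", "out"]:
    learner.observe("morning", direction)
for direction in ["out", "out", "out"]:
    learner.observe("evening", direction)
```

In this sketch the same direction ("out") is flagged in the morning but not in the evening, which mirrors the idea of a passback direction that changes with the time of day. A period with no observations yields no flag, since no normal direction has been learned for it.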
Logitech U.S. Pat. No. 6,995,794 describes image processing split between a camera and a host (color processing and scaling moved to the host). Intel U.S. Pat. No. 6,803,945 describes motion detection processing in a webcam to upload only “interesting” pictures, in particular pictures showing a threshold amount of motion (a threshold number of pixels changing).
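By way of illustration only, a pixel-count motion threshold of the general kind described above might be sketched as follows. The function names, the grayscale frame representation (lists of rows of integer intensities), and the values of `pixel_delta` and `min_changed` are assumptions for this example and are not drawn from the cited patent.

```python
def count_changed_pixels(prev_frame, curr_frame, pixel_delta=25):
    """Count pixels whose grayscale value differs by more than pixel_delta."""
    changed = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for prev_px, curr_px in zip(prev_row, curr_row):
            if abs(prev_px - curr_px) > pixel_delta:
                changed += 1
    return changed


def is_interesting(prev_frame, curr_frame, pixel_delta=25, min_changed=3):
    """Treat a frame as interesting when at least min_changed pixels changed."""
    return count_changed_pixels(prev_frame, curr_frame, pixel_delta) >= min_changed


# A 4x4 uniform background, and a copy in which a 2x2 block has brightened.
background = [[10] * 4 for _ in range(4)]
moved = [row[:] for row in background]
for r in range(2):
    for c in range(2):
        moved[r][c] = 200
```

Here the 2x2 block changes four pixels, which meets the assumed threshold of three, so the frame would be uploaded; an unchanged frame would not. A per-pixel difference threshold is used so that small intensity fluctuations such as sensor noise are not counted as motion.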
Yahoo! Pub. No. 20140355907 is an example of examining image and video content to identify features to tag for subsequent searching. Examples of the recognition performed include facial recognition, recognition of facial features (smile, frown, etc.), object recognition (e.g., cars, bicycles, group of individuals), and scene recognition (beach, mountain). See paragraphs 0067-0076. See also Disney Enterprises Pub. No. 20100082585, paragraph 0034.