Field of the Invention
Embodiments of the invention provide techniques for computationally analyzing a sequence of video frames. More specifically, embodiments of the invention relate to techniques for learning behaviors represented in a scene depicted in the sequence of video frames.
Description of the Related Art
Some currently available video surveillance systems provide simple object recognition capabilities. For example, a video surveillance system may be configured to classify a group of pixels (referred to as a “blob”) in a given frame as being a particular object (e.g., a person or vehicle). Once identified, the blob may be tracked from frame to frame in order to follow it as it moves through the scene over time, e.g., a person walking across the field of vision of a video surveillance camera.
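By way of illustration only, the grouping of pixels into a blob and the following of that blob across frames may be sketched as below. The sketch assumes that a boolean mask already marks which pixels belong to a moving object, that pixels are grouped by 4-connectivity, and that a blob is followed by comparing centroids. The names `find_blobs` and `centroid` and the toy masks are hypothetical and are not drawn from any particular surveillance system.

```python
from collections import deque
from typing import List, Set, Tuple

Coord = Tuple[int, int]  # (row, column)


def find_blobs(mask: List[List[bool]]) -> List[Set[Coord]]:
    """Group 4-connected True pixels of `mask` into blobs (assumed grouping rule)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen: Set[Coord] = set()
    blobs: List[Set[Coord]] = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            blob: Set[Coord] = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                blob.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def centroid(blob: Set[Coord]) -> Tuple[float, float]:
    """Mean (row, column) position of the pixels in a blob."""
    n = len(blob)
    return (sum(y for y, _ in blob) / n, sum(x for _, x in blob) / n)


# Two consecutive toy frames: a 2x2 blob shifts one column to the right.
mask_t0 = [
    [False, True, True, False, False],
    [False, True, True, False, False],
    [False, False, False, False, False],
]
mask_t1 = [
    [False, False, True, True, False],
    [False, False, True, True, False],
    [False, False, False, False, False],
]
blobs_t0 = find_blobs(mask_t0)
blobs_t1 = find_blobs(mask_t1)
```

In this toy case each frame yields a single blob, and the change in centroid between the two frames gives the apparent motion of the object. A practical tracker would also have to associate multiple blobs across frames, which this sketch does not attempt.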
Prior to analyzing scene foreground, a background model (or image) of the scene may need to be identified. The background model generally represents the static elements of a scene captured by a video camera. For example, consider a video camera trained on a stretch of highway. In such a case, the background would include the roadway surface, the medians, any guard rails or other safety devices, and any traffic control devices visible to the camera. The background model may include an expected pixel color value for each pixel of the scene when the background is visible to the camera. Thus, the background model provides an image of the scene in which no activity is occurring (e.g., an empty roadway). Conversely, vehicles traveling on the roadway (and any other person or thing engaging in some activity) occlude the background when visible to the camera and represent scene foreground objects.
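As a minimal sketch of how such a background model might be used, the following compares each pixel of a frame with the expected background color for that pixel and marks the pixel as foreground when the difference is large. The distance measure (sum of absolute channel differences), the threshold value of 40, and the name `foreground_mask` are assumptions chosen for illustration only.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each in 0..255
Image = List[List[Pixel]]      # rows of pixels


def foreground_mask(frame: Image, background: Image, threshold: int = 40) -> List[List[bool]]:
    """Return True for each pixel whose color differs from the expected
    background color by more than `threshold` (assumed distance and threshold)."""
    mask = []
    for frame_row, bg_row in zip(frame, background):
        mask_row = []
        for pixel, expected in zip(frame_row, bg_row):
            distance = sum(abs(c - e) for c, e in zip(pixel, expected))
            mask_row.append(distance > threshold)
        mask.append(mask_row)
    return mask


# Toy 2x3 scene: a uniform gray "empty roadway" background, and a frame in
# which a red "vehicle" occludes two pixels while one pixel carries slight noise.
ROAD = (90, 90, 90)
CAR = (200, 30, 30)
background = [[ROAD, ROAD, ROAD], [ROAD, ROAD, ROAD]]
frame = [[ROAD, CAR, CAR], [(92, 88, 91), ROAD, ROAD]]
mask = foreground_mask(frame, background)
```

Here the two pixels occluded by the vehicle are marked as foreground, while the slightly noisy roadway pixel stays within the threshold and remains background.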
However, some scenes present dynamic or otherwise complex backgrounds, making it difficult to distinguish between scene background and foreground. Examples of complex backgrounds include ones where the video is noisy, the video contains compression artifacts, or the video is captured during periods of low or high illumination. In such cases, it becomes difficult to classify any given pixel from frame to frame as depicting background or foreground (e.g., because of pixel color fluctuations caused by camera noise). A scene background is dynamic when certain elements of the background are not stationary or have multiple, visually distinguishable states. Consider a scene with a camera trained on a bank of elevators. In such a case, the pixels depicting a closed elevator door would represent one background state, while a back wall of an elevator carriage visible when the elevator doors are open would be another state. Another example is a traffic light changing from green to yellow to red. The changes in state can result in portions of the traffic light being incorrectly classified as depicting a foreground object. Other examples of a dynamic background include periodic motion, such as a camera trained on a waterfall or ocean waves. While these changes in the scene are visually apparent as changes in pixel color from frame to frame, they should not result in elements of the background, such as pixels depicting an elevator carriage or pixels depicting light bulbs within a traffic light, being classified as foreground.
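The elevator example can be illustrated with a short sketch in which a pixel is associated with several expected background colors rather than one. This is an illustrative assumption about how multiple background states could be represented, not a description of any particular system. The color values, the threshold of 40, and the names `color_distance` and `is_background` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def color_distance(a: Pixel, b: Pixel) -> int:
    """Sum of absolute per-channel differences (assumed distance measure)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def is_background(pixel: Pixel, states: List[Pixel], threshold: int = 40) -> bool:
    """True when the observed pixel is close to any known background state."""
    return any(color_distance(pixel, state) <= threshold for state in states)


# Hypothetical colors for one pixel location in the elevator scene.
DOOR_CLOSED = (120, 120, 130)   # closed elevator door
DOOR_OPEN = (60, 50, 40)        # back wall of the carriage, seen with doors open
elevator_states = [DOOR_CLOSED, DOOR_OPEN]

closed_obs = (118, 123, 128)    # noisy view of the closed door
open_obs = (63, 48, 44)         # noisy view of the back wall
person_obs = (200, 160, 140)    # a person standing in front of the door

single_state_result = is_background(open_obs, [DOOR_CLOSED])
multi_state_result = is_background(open_obs, elevator_states)
```

With only the closed-door state, the open-door observation is rejected as background and would be misclassified as foreground. With both states it is accepted, while the person still matches neither state.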