The invention relates to identifying events in surveillance video, including deposit and removal of objects.
Surveillance video is used in many settings, most notably for security. A typical surveillance video is directed at a location in order to protect the objects in the location from being stolen, to guard against intruders, etc. For example, surveillance videos are found in warehouses to protect a business against a theft of its property, in the parking lots of shopping centers to protect against car theft and robberies, etc.
In some settings, a live person monitors the video camera (or video cameras for different locations) to provide real-time theft or other crime prevention. This type of scenario is typically found in office buildings, parking garages, banks, etc. Such real-time monitoring is often supplemented by recording the images captured by the video camera(s) on video tape or other media. This allows an event to be reviewed if the human monitor fails to see a crime in real time, or if evidence is needed to investigate or prosecute the crime.
In some settings, real-time monitoring is unnecessary or uneconomical. For example, when guarding against theft of inventory in a warehouse, the cost of real-time monitoring may exceed the average cost of theft. In such cases, the video recording of the scene alone may be used. This provides deterrence against thefts and other crimes (since a would-be perpetrator can typically see the cameras or there are signs warning of them), and also provides a recording of a crime that can be used in an investigation and/or prosecution.
Particular methods and systems have been developed that attempt to automatically identify certain events as they occur in a video image. This may allow the pertinent segments of a video tape recording of the location to be electronically flagged or “indexed” with the corresponding event. For example, events that are typically desirable to be identified and indexed include entrance of a person or object into the scene being surveilled, exit of an object or person from the location, and deposit and removal of an object from the location.
Such indexing allows faster review of certain events that occurred over a number of hours of video tape, for example, after a theft or other crime has occurred. If, for example, a theft of a computer has occurred, the index may be used to quickly review all “removal” events identified in the video tape. This may help speed an investigation, for example.
It may be noted that such indexing is helpful even if the video tape for a location has numerous indices of events such as deposits, removals, entrances and exits. For example, in an active warehouse, there may be hundreds of the above events over the course of a number of hours of video surveillance and taping. However, when a crime occurs, it is nonetheless helpful to focus on the particular class of events (such as “removal” of items), rather than attempt to review the entire tape (or other recording media) for the number of hours. Identification of such events may also supplement real-time monitoring of the scene. Particular events (such as “entrance” of an object or person to the location) may initiate an audio alert to the person monitoring the location.
U.S. Pat. No. 5,969,755 to Courtney, the contents of which are hereby incorporated by reference, describes a particular motion-based technique for automatically identifying particular “content-based” events in video received from a surveillance camera and indexing the video with such events. A video image is divided into segments and video-objects (“V-objects”) are identified in the segments by comparing a reference image with the current image, performing morphological operations and identifying change regions that make up the V-objects. V-objects are tracked between received video frames, thus providing updated position and an estimation of velocity. Using position and estimated velocity, V-objects may also be tracked from one segment of the image to another. Courtney applies certain rules to video segments to identify events.
For example, according to the rules of Courtney, where a tracking sequence between frames of a V-object indicates that it begins (has a track “head”) at a particular frame and remains stationary in subsequent frames, and a track of a moving V-object crosses the track of the stationary object in the frame prior to the head of the stationary track, then the moving V-object is identified as a “depositor” and the head of the stationary track is identified and indexed as a “deposit” of an object. Similarly, if a tracking sequence between frames of a V-object indicates that it is stationary and ends (has a track “tail”) at a particular frame, and a track of a moving V-object crosses the track of the stationary object in the frame after the tail of the stationary track, then the moving V-object is identified as a “remover” and the tail of the stationary track is identified and indexed as a “removal” of an object.
One problem with the Courtney technique is misidentification of removal of an object that has been in the scene to begin with. The Courtney technique may identify such a removal as a deposit. In effect, the system may detect a stationary change region or stationary “hole” in the image at the point where the object is removed. The principal cause is that the object is not seen (or recognized) by the system prior to the removal, and the system thus processes it as part of the reference frame. This stationary change region that arises in the image may thus be classified as a deposit, even though an object has been removed. Courtney itself recognizes this disadvantage at col. 6, lines 47-51.
In addition, the technique of Courtney relies on identifying, estimating and tracking motion and velocity of multiple objects, and its rules apply to the interaction of the tracking of one object in relation to another. Such estimating and tracking of multiple objects with respect to each other in an image is relatively complex and may give rise to a relatively high rate of incorrect detection and/or identification of events.
Another technique applied to determine a change in a scene is described in U.S. Pat. No. 6,049,363 to Courtney et al. (“Courtney II”), the contents of which are hereby incorporated by reference. Courtney II focuses on determining the presence of an object in one image and the absence of the object in another image. For images comprised of pixels in the case of “TV data”, corresponding pixels for two separate images are subtracted to identify a “change region” corresponding to an object in one of the two images. Pixels identified as edges in the two images are then each compared with corresponding border pixels in the “change region” in the image. Where the compared pixels between an image and the change region have a high incidence of correspondence, the object is identified as being present in that image.
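The pixel-subtraction step described above may be illustrated roughly as follows. This is a minimal sketch only, not the Courtney II implementation: the function names `change_region` and `bounding_box`, the grayscale list-of-lists image format, and the `thresh` parameter are assumptions introduced for illustration.

```python
def change_region(img_a, img_b, thresh=30):
    """Return the set of (row, col) pixels whose grayscale values differ
    by more than `thresh` between two equally sized images.

    `thresh` is a hypothetical tuning parameter, not a value from the source.
    """
    return {
        (r, c)
        for r, row in enumerate(img_a)
        for c, value in enumerate(row)
        if abs(value - img_b[r][c]) > thresh
    }


def bounding_box(region):
    """Return (top, left, bottom, right) of a pixel set, or None if empty."""
    if not region:
        return None
    rows = [r for r, _ in region]
    cols = [c for _, c in region]
    return (min(rows), min(cols), max(rows), max(cols))
```

Note that the sketch shows why lighting differences are troublesome for this kind of approach: any global brightness shift larger than `thresh` would mark every pixel as changed.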
Courtney II concedes that image edges are not easily detectable in infrared images. Thus, for infrared images, Courtney II identifies a change region using the two separate images and then determines the variance of pixel intensities within the change region of the two separate images. Based on a “contrasting halo” found in the images of objects in IR cameras, the object is determined to be in the image having the greatest intensity variance.
Many difficulties arise from the Courtney II technique of comparing two images directly to generate such a change region, and then further comparing the two images with the change region itself. For example, slight movement of the video camera, or diffuse edges in the images, can result in a change region having a border that does not correspond to the edges in either of the two images. In addition, lighting differences between the two separate images may give rise to a myriad of false “change regions” and then an equally false “match” between the border of such a false change region and one of the images. The infrared camera technique relies on an aberration in the optics of an IR camera that may not actually arise in better quality IR cameras.
Among other things, it is an object of the invention to provide a highly accurate system and method for detecting events in surveillance video and images. It is an additional objective that the system and technique provide for reliable detection of important events, such as removal and deposit, without a high incidence of incorrect identifications of such events. It is also an objective that the system and method are implemented in a relatively simple and straightforward manner.
The system and method may themselves provide detection of events or changes to the location, as well as accurate identification of whether an object has been deposited or removed in a changed location. Alternatively, the system and method may be used to provide supplemental analysis to existing techniques and methods that detect events and attempt to identify the type of event. When used as a supplemental analysis, an existing technique (such as in the Courtney patent) may be used to detect an event in the image and even to identify the type of event, such as removal or deposit. The technique of the invention may then be used to analyze the event further, to provide a more reliable indication of event detection and/or identification of the type of event.
In accordance with these objectives, the invention provides a method and system of reliably identifying events in surveillance video and images by creating an outline of objects in the location from at least one received image. The edges detected in a subsequent image are compared with the outline. When an object is deposited in the location, there will be more edges in the subsequent image than in the outline. When an object is removed from the location, there will be fewer edges in the subsequent image than in the outline. Thus, deposits and removals of objects from the location may be accurately identified and indexed. In addition, the outline may be updated after each detected deposit and removal in order to accurately reflect the new background scene. Additionally, the outline and the subsequent images may be divided into segments and separately compared.
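By way of illustration only, the comparison of detected edges against the outline may be sketched as follows. This is a minimal sketch under stated assumptions rather than the disclosed implementation: the names `edge_map` and `classify_event`, the simple forward-difference edge detector, the grayscale list-of-lists image format, and the parameters `thresh` and `min_change` are all hypothetical.

```python
def edge_map(img, thresh=50):
    """Return the set of (row, col) pixels treated as edges.

    Assumption: a pixel is an edge when its forward-difference gradient
    (to the right or downward) exceeds `thresh`. Any edge detector could
    be substituted here.
    """
    height, width = len(img), len(img[0])
    edges = set()
    for r in range(height - 1):
        for c in range(width - 1):
            gx = abs(img[r][c + 1] - img[r][c])
            gy = abs(img[r + 1][c] - img[r][c])
            if max(gx, gy) > thresh:
                edges.add((r, c))
    return edges


def classify_event(outline, image, thresh=50, min_change=4):
    """Compare the edges of `image` with a stored `outline` (a set of edge pixels).

    More edges than the outline suggests a deposit; fewer suggests a removal.
    `min_change` is a hypothetical noise margin in edge pixels.
    """
    current = edge_map(image, thresh)
    diff = len(current) - len(outline)
    if diff >= min_change:
        return "deposit"
    if diff <= -min_change:
        return "removal"
    return "none"
```

In such a sketch, the outline would be built once with `edge_map` from a background image and replaced with the edge map of the current image after each detected deposit or removal, mirroring the outline update described above.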
The edges detected in a subsequent image may be compared with the outline by determining whether edge segments are present or absent in the corresponding portion of the outline. Alternatively, it may be determined whether edge segments corresponding to an object (comprising a closed loop of edges) are present or absent in the corresponding portion of the outline. If there are more edge segments in the subsequent image than in the outline, then there is a deposit. If there are fewer, then there is a removal. Templates of objects as determined by the edge segments may also be compared with the outline. Where a template does not match the outline, there is a deposit. Where there is no template to match an object in the outline, a removal has occurred. The templates may alternatively be determined from the outline and compared with the edge segments.
In addition, the edges detected in a subsequent image may be compared with the outline by determining the length of the edges (through integration or other image processing) in the subsequent image and comparing it with the length of the curves in the outline. If the length of the edges in the subsequent image is greater than in the outline, there is a deposit event. If less, then there is a removal event.
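The edge-length comparison may be sketched, for illustration only, as follows. The sketch assumes that edges have already been extracted as polylines (lists of `(x, y)` points, with a closed loop repeating its first point at the end); that representation, the function names, and the relative tolerance `tol` are assumptions and are not part of the original description.

```python
import math


def curve_length(points):
    """Length of a polyline given as a list of (x, y) points."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


def total_length(curves):
    """Sum of the lengths of all curves in a scene."""
    return sum(curve_length(curve) for curve in curves)


def classify_by_length(outline_curves, image_curves, tol=0.05):
    """Greater total edge length than the outline indicates a deposit,
    smaller indicates a removal. `tol` is a hypothetical relative margin
    that absorbs small measurement differences."""
    base = total_length(outline_curves)
    current = total_length(image_curves)
    margin = tol * max(base, current)
    if current - base > margin:
        return "deposit"
    if base - current > margin:
        return "removal"
    return "none"
```

For example, an outline holding a single 10-unit shelf edge, compared with an image that shows the same edge plus a closed 3 by 3 square, would yield a total length of 22 against 10 and so a deposit.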
The above described technique may be used to detect a location changing event itself, as well as identify the event as a removal or deposit. It may alternatively be used as a supplemental analysis that provides verification of the detection of an event and identification of the type of event for an existing technique that detects a location changing event.
Another embodiment of the invention provides a technique for supplemental analysis that provides identification of the type of event after a deposit or removal event has been detected. Existing techniques (such as described in the Courtney patent) or techniques of the present invention that detect the event will also provide the boundary location of the deposited or removed object in the image after the event has occurred. To identify a deposit or removal event, the image gradient of the post-event image is integrated along the boundary location of the object. If the image gradient integral is below a threshold, indicating continuity along the boundary of the location of the object in the post-event image, then a removal is indicated. If the image gradient integral is above the threshold, indicating an edge along the boundary of the location of the object in the post-event image, a deposit is indicated.
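The thresholded boundary integral may be sketched, for illustration only, as follows. The sketch makes several assumptions not found in the original text: a grayscale list-of-lists image, a rectangular boundary produced by the hypothetical helper `rect_boundary`, a central-difference gradient, and a threshold expressed per boundary pixel (`thresh_per_pixel`) so that it does not depend on the size of the object.

```python
import math


def gradient_magnitude(img, r, c):
    """Central-difference gradient magnitude at (r, c), clamped at the borders."""
    height, width = len(img), len(img[0])
    gx = (img[r][min(c + 1, width - 1)] - img[r][max(c - 1, 0)]) / 2.0
    gy = (img[min(r + 1, height - 1)][c] - img[max(r - 1, 0)][c]) / 2.0
    return math.hypot(gx, gy)


def rect_boundary(top, left, bottom, right):
    """Perimeter pixels of an axis-aligned rectangle (inclusive corners)."""
    points = []
    for c in range(left, right + 1):
        points.append((top, c))
        points.append((bottom, c))
    for r in range(top + 1, bottom):
        points.append((r, left))
        points.append((r, right))
    return points


def boundary_gradient_integral(img, boundary):
    """Discrete integral: sum of gradient magnitudes over the boundary pixels."""
    return sum(gradient_magnitude(img, r, c) for r, c in boundary)


def classify_by_threshold(post_img, boundary, thresh_per_pixel=20.0):
    """A high integral means an edge lies along the boundary (deposit);
    a low integral means the image is continuous there (removal)."""
    mean = boundary_gradient_integral(post_img, boundary) / len(boundary)
    return "deposit" if mean > thresh_per_pixel else "removal"
```

The boundary need not be rectangular; any list of `(row, col)` pixels tracing the reported object location could be passed to `classify_by_threshold`.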
Alternatively, instead of using a threshold, the image gradient of a pre-event image may be integrated along the boundary of the location of the object and the pre-event image gradient integral compared with the post-event image gradient integral. If the pre-event image gradient integral is larger, indicating an edge along the boundary of the location of the object in the pre-event image, a removal is indicated. If the post-event image gradient integral is larger, indicating an edge along the boundary of the location of the object in the post-event image, a deposit is indicated.
If the pre- and post-event image gradient integrals are substantially equal, then the result may be inconclusive. For example, equal image gradient integrals could arise if there is no deposit or removal of an object. Equal image gradient integrals could alternatively arise if a new object having the same dimensions is deposited between the camera and the object in the pre-event image. Equal image gradient integrals could also arise if an object is removed but its removal uncovers (to the camera) a previously obscured object that has the same dimensions. Thus, if the pre- and post-event image gradient integrals are substantially equal, the event may be indexed generically as an “object-moved” event.
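The three-way decision between removal, deposit and the generic object-moved index may be sketched as follows. This is an illustrative sketch only: the function name `classify_by_comparison` and the relative tolerance `rel_tol`, which stands in for "substantially equal", are assumptions, and the two integrals are taken as already computed along the same boundary.

```python
def classify_by_comparison(pre_integral, post_integral, rel_tol=0.1):
    """Compare pre- and post-event boundary gradient integrals.

    Returns "removal" when the pre-event integral is clearly larger,
    "deposit" when the post-event integral is clearly larger, and
    "object-moved" when the two are substantially equal.
    `rel_tol` is a hypothetical tolerance, not a value from the source.
    """
    scale = max(pre_integral, post_integral)
    if scale == 0 or abs(pre_integral - post_integral) <= rel_tol * scale:
        return "object-moved"
    return "removal" if pre_integral > post_integral else "deposit"
```

A relative tolerance is used here so that the same setting behaves comparably for small and large objects; an absolute margin would be an equally plausible choice.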
Still another embodiment of the invention provides a technique for supplemental analysis that provides identification of the type of event after a deposit or removal event and its boundary location in the image are detected. For either or both of the pre-event and post-event images, the color and texture inside the boundary location in the image are compared with the color and texture outside the boundary location. If the color and texture inside and outside the boundary are substantially similar in the post-event image (indicating a continuous image), for example, then the event is a removal event. If the color and texture inside and outside the boundary location are substantially different in the post-event image, then a deposit event is recorded. Analogous comparisons and identification of the event may be made in the pre-event image. Both pre- and post-event images may be used together in the determination. Color histograms of the event location and the surroundings may be used to determine the degree of similarity and dissimilarity between the regions for the pre- and post-event images.
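The color histogram comparison for the post-event image may be sketched, for illustration only, as follows. The sketch covers color only (texture is not modeled) and rests on assumptions not found in the original text: an image stored as rows of `(r, g, b)` tuples, a rectangular event location, histogram intersection as the similarity measure, and the hypothetical parameters `bins`, `margin` and `similar`.

```python
from collections import Counter


def color_histogram(pixels, bins=4):
    """Normalized histogram of quantized (r, g, b) colors.

    `bins` should divide 256 evenly so each channel maps to exactly `bins` levels.
    """
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values()) or 1  # avoid division by zero for empty regions
    return {color: n / total for color, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical color distributions."""
    return sum(min(v, h2.get(color, 0.0)) for color, v in h1.items())


def split_regions(img, top, left, bottom, right, margin=2):
    """Pixels inside the event rectangle, and pixels in a surrounding band."""
    height, width = len(img), len(img[0])
    inside, outside = [], []
    for r in range(max(0, top - margin), min(height, bottom + margin + 1)):
        for c in range(max(0, left - margin), min(width, right + margin + 1)):
            if top <= r <= bottom and left <= c <= right:
                inside.append(img[r][c])
            else:
                outside.append(img[r][c])
    return inside, outside


def classify_by_color(post_img, box, similar=0.5):
    """Similar colors inside and outside the boundary suggest a removal
    (continuous background); dissimilar colors suggest a deposit."""
    inside, outside = split_regions(post_img, *box)
    score = histogram_intersection(color_histogram(inside), color_histogram(outside))
    return "removal" if score >= similar else "deposit"
```

The analogous test on a pre-event image would simply invert the labels, since an object that contrasts with its surroundings before the event indicates a removal.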