Many retail stores use pan, tilt, and zoom (PTZ) video cameras for surveillance of customers to prevent shoplifting. If the camera is pointed and zoomed appropriately, a person can be observed in detail and innocent actions distinguished from theft. It is common practice for security personnel to monitor customers with a PTZ camera as they move around a store, but manual tracking is limited by the operator's attention span and by the number of targets that can be tracked at once. Manual observation is similarly unsuitable for other security tasks such as smoke detection, left package identification, and face recognition. There is therefore a need, addressed by this disclosure, for reliable automatic processing of images captured by a video camera.
Automatic motion detection and object tracking have been developed to facilitate security monitoring by closed circuit television. In video motion detection, a live video image is compared with a stored representation of one or more earlier video images, and relative motion is detected on the basis of the difference between the live and stored images. Most commercial surveillance cameras incorporate motion detection that may be used to trigger an alarm. U.S. Pat. No. 5,396,284 discloses a multiple-camera system that incorporates motion detection performed by a computer connected to the cameras.
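The comparison between live and stored images described above can be illustrated with a minimal sketch of frame differencing. This is not the method of any cited patent. The function names, the greyscale list-of-rows frame format, and both threshold values are assumptions chosen only for illustration.

```python
from typing import List

# Assumed frame format: a greyscale image as rows of 0-255 pixel values.
Frame = List[List[int]]


def motion_mask(live: Frame, stored: Frame, pixel_threshold: int = 25) -> List[List[bool]]:
    """Mark each pixel whose intensity differs from the stored image by more than the threshold."""
    return [
        [abs(a - b) > pixel_threshold for a, b in zip(live_row, stored_row)]
        for live_row, stored_row in zip(live, stored)
    ]


def motion_detected(
    live: Frame,
    stored: Frame,
    pixel_threshold: int = 25,   # hypothetical per-pixel sensitivity
    min_changed_pixels: int = 3  # hypothetical alarm threshold
) -> bool:
    """Report motion when enough pixels have changed between the two images."""
    mask = motion_mask(live, stored, pixel_threshold)
    changed = sum(cell for row in mask for cell in row)
    return changed >= min_changed_pixels


if __name__ == "__main__":
    stored = [[0] * 4 for _ in range(4)]
    live = [row[:] for row in stored]
    # Simulate a bright 2x2 object entering the scene.
    for y in (1, 2):
        for x in (1, 2):
            live[y][x] = 200
    print(motion_detected(live, stored))
```

A practical detector would also have to cope with sensor noise and lighting changes, for example by updating the stored image over time; the sketch omits this.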
Object tracking is a more sophisticated technique than motion detection, as it identifies an object and follows its movement. U.S. Pat. No. 5,434,617 discloses means for automatically controlling camera movement to track and display the location of a moving object. A fixed spotting camera is used to capture a field of view, and a moving PTZ camera is directed to the target's location. Information for driving the PTZ camera is obtained from the difference between a current image and a previous image within the spotting camera's field of view. This system requires that a fixed camera always be used in conjunction with the PTZ camera. The reference discusses a specific approach to object tracking but also indicates that vision-based surveillance systems collect far too much video to analyze off-line at some later time; the solution proposed in the reference still relies on central processing and does not propose a distributed solution in which processing is carried out at the source of the video data. In failing to recognize this, the prior art misses an important benefit of in-camera processing, namely that the total system processing power rises in proportion to the amount of video data captured: every new camera added brings its own processing capacity with it. This inherent scaling reduces the cost of deployment because computing power does not need to be over-specified at first installation in anticipation of future expansion.
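As a rough illustration of how a difference image from a spotting camera might yield steering information for a PTZ camera, the following sketch locates the centroid of the changed pixels and converts it to pan and tilt offsets. It is not taken from U.S. Pat. No. 5,434,617. The function names, the threshold, the field-of-view angles, and the linear pixel-to-angle mapping are all assumptions; a real system would need camera calibration.

```python
from typing import List, Optional, Tuple

# Assumed frame format: a greyscale image as rows of 0-255 pixel values.
Frame = List[List[int]]


def changed_centroid(
    current: Frame, previous: Frame, threshold: int = 25
) -> Optional[Tuple[float, float]]:
    """Return the (x, y) centroid of pixels that changed, or None if nothing moved."""
    xs: List[int] = []
    ys: List[int] = []
    for y, (cur_row, prev_row) in enumerate(zip(current, previous)):
        for x, (c, p) in enumerate(zip(cur_row, prev_row)):
            if abs(c - p) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return sum(xs) / len(xs), sum(ys) / len(ys)


def pan_tilt_for(
    centroid: Tuple[float, float],
    width: int,
    height: int,
    hfov_deg: float = 60.0,  # hypothetical horizontal field of view
    vfov_deg: float = 40.0,  # hypothetical vertical field of view
) -> Tuple[float, float]:
    """Map a pixel position to pan/tilt angles relative to the image centre.

    Assumes a simple linear mapping from pixel offset to angle, which is only
    an approximation of a real lens.
    """
    x, y = centroid
    pan = ((x + 0.5) / width - 0.5) * hfov_deg
    tilt = (0.5 - (y + 0.5) / height) * vfov_deg  # image rows grow downwards
    return pan, tilt


if __name__ == "__main__":
    previous = [[0] * 4 for _ in range(4)]
    current = [row[:] for row in previous]
    current[0][3] = 200  # a change in the top-right corner
    target = changed_centroid(current, previous)
    if target is not None:
        print(pan_tilt_for(target, width=4, height=4))
```

Taking a single centroid of all changed pixels follows only one moving region; tracking several objects with one camera would require grouping the changed pixels first.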
Behaviour analysis software can be used in conjunction with object tracking and motion detection to distinguish abnormal from normal behaviour. U.S. Pat. No. 5,666,157 describes a device in which video signals of sampled movements of an individual are compared with movements indicative of criminal intent. The level of criminal intent of the individual or individuals is determined and an appropriate alarm signalled. Further analysis software may be used to identify objects left unattended, or other hazards such as smoke or fire. For example, automatic detection of smoke through video cameras is discussed in Aube, 12th International Conference on Automatic Fire Detection, 2001, which identifies three distinct processing techniques; in this reference all processing is performed at the base station, but the algorithms discussed are applicable to in-camera processing. Many different algorithms may be usefully executed on captured video data, and the appropriate algorithms will vary from one environment to another. For example, the algorithms deployed to monitor the entrance to a bank might be different to those used to monitor hazardous material.
All the algorithms discussed produce more accurate results if they are applied to high-resolution image data. If the necessary image processing is applied away from the camera, the quantity of data that must be transmitted is very large. For this reason the prior art is unable to cope with the heavy demands of a large security installation such as might be found in an airport. If hundreds of video cameras are used to monitor passenger behaviour, then the data flow through the system might be in the region of 1 terabit (one million million bits) per second. Such a system would be prohibitively expensive to deploy. Further, if each camera is used to track a single object, either the number of cameras must be very large or many candidates for tracking will be ignored. The ability to track more than one object with a single camera greatly improves the quality of surveillance provided by the system.
The present invention addresses such a need.
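The data-flow estimate of roughly 1 terabit per second given above can be checked with simple arithmetic. The sketch below is illustrative only. The camera count, resolution, colour depth, and frame rate are assumptions and do not come from the source, which says only "hundreds of video cameras". It also assumes uncompressed video sent to a central processor.

```python
def aggregate_bits_per_second(
    cameras: int,
    width: int,
    height: int,
    bits_per_pixel: int,
    frames_per_second: int,
) -> int:
    """Total uncompressed data rate when every camera streams to a central processor."""
    per_camera = width * height * bits_per_pixel * frames_per_second
    return cameras * per_camera


if __name__ == "__main__":
    # All figures below are assumed for illustration.
    total = aggregate_bits_per_second(
        cameras=700,
        width=1920,
        height=1080,
        bits_per_pixel=24,
        frames_per_second=30,
    )
    print(f"{total / 1e12:.2f} terabits per second")
```

Under these assumed figures each camera produces about 1.5 gigabits per second, so several hundred cameras together approach one terabit per second, which is consistent with the order of magnitude stated above.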