1. Field of Invention
The present invention relates to an identification apparatus in a video surveillance system for identifying properties of an object detected in a video sequence captured by a video surveillance camera and to a method for identifying properties of an object detected by a video surveillance camera. The invention also relates to a video surveillance system for tracking an object in a video surveillance system and to a method for tracking an object in a video surveillance system.
2. Description of the Related Art
Closed Circuit TeleVision (CCTV) systems are used mainly for surveillance purposes. In recent years, the use of surveillance systems has increased at airports, public areas, schools, highways and many other places. The CCTV market consists of two segments: analogue systems and digital network video systems. Network video systems have several advantages over analogue systems. The most important reasons why the network video market share is growing are: Remote accessibility; Easy, future-proof integration; Scalability and flexibility.
One characteristic that differentiates digital network video systems from analogue systems is the former systems' suitability for image processing in real time. This becomes possible when a digital signal processor is integrated with the camera and image processing algorithms are implemented on it.
Real-time surveillance is today very labour intensive, which leads to high costs. A human operator's level of attention also degrades rapidly over time. It would therefore be desirable to use intelligent video functions for processing images as an assisting tool in these types of jobs. That would both reduce labour costs and improve performance. Useful intelligent video functions that would facilitate surveillance in real time are: Motion detection, e.g. detecting a trespassing person in an empty facility; Detection of specific events, e.g. detecting a car crash; Recognition, e.g. following the path of a suspicious person in a large, ambiguous camera system.
If network video cameras could perform these functions satisfactorily, they would have a unique advantage over their analogue counterparts.
To be able to detect and track non-rigid bodies, such as humans, in a video surveillance system comprising a plurality of cameras, i.e. between different video sequences or scenes captured by different cameras, the following factors have to be taken into consideration: Humans are not rigid and therefore their shapes may change; Different viewpoints in the sequences; Different illumination levels between scenes and within a scene; Different illumination colors between scenes and within a scene; Different distances to the camera in the sequences.
Due to these circumstances, several methods for detecting and tracking non-rigid objects are not applicable. Low resolution and the distance between the camera and the object make all methods that rely on details useless. The texture of people's clothing tends to be very fine, and therefore texture-based methods also fail.
It is an object of the invention to be able to recognize non-rigid objects, such as humans. The method can therefore not be sensitive to changes in the shape of the object.
Since customers of surveillance cameras do not want to calibrate their cameras, the method cannot depend on the position of the camera being known. Because of that, the angle from which the object is viewed cannot be taken into account. Methods based on the relative sizes of different parts of the object, e.g. the relative lengths of arms and legs, are therefore useless.
The tracking problem has been widely explored, but known methods for tracking people are not applicable here due to some important differences between tracking a person within a scene and recognizing a person in different scenes, where different scenes may originate from different cameras. When tracking a person within a scene, the problem is to find the person in every new frame. The information from the previous frame is then very useful. The illumination, angle of view and position of the person are all likely to be the same or to change only a little between frames. When the scene changes, all this information may be lost. Therefore, methods for recognizing humans between different scenes have to use features other than those used by regular tracking methods.
A method for identifying and tracking objects between different scenes should be used as an assisting tool in surveillance systems with a large number of cameras. Therefore, it is acceptable that the method makes some wrong matches rather than missing some right matches. In the former case, the surveillance personnel watching the monitors can easily recognize the right person manually. In the latter case, the method would be of no help to the personnel.
Consequently, there is a need for a method and a system for identifying properties of an object, which properties can be used for tracking objects between different video sequences, and which method is reliable in the sense that it does not miss any object occurring in two different video sequences.