Video cameras, such as Pan-Tilt-Zoom (PTZ) cameras, are omnipresent nowadays and commonly used for surveillance purposes. Such cameras capture more data (video content) than human viewers can process. Hence, a need exists for automatic analysis of video content, and the field of video analytics addresses this need. Video analytics is typically implemented in hardware or software. The functional component may be located on the camera itself, on a computer, or on a video recording unit connected to the camera. When multiple cameras are used to monitor a large site, a desirable technique in video analytics is to estimate the three-dimensional (3D) trajectories of moving objects in the scene from the video captured by the video cameras and to model the activities of the moving objects in the scene.
The term ‘3D trajectory reconstruction’ refers to the process of reconstructing the 3D trajectory of an object from a video that comprises two-dimensional (2D) images. Hereinafter, the terms ‘frame’ and ‘image’ are used interchangeably to describe a single image taken at a specific time step in an image sequence. An image is made up of visual elements, for example pixels, or 8×8 DCT (Discrete Cosine Transform) blocks as used in JPEG images. 3D trajectory reconstruction is an important step in a multi-camera object tracking system, enabling high-level interpretation of the object behaviours and events in the scene.
One approach to 3D trajectory reconstruction requires overlapping views across cameras. That is, the cameras in the system must have fields of view that overlap. The 3D positions of the object at each time step in the overlapping coverage zone are estimated by triangulation. The term ‘triangulation’ refers to the process of determining a point in 3D space given the point's projections onto two or more images. When the object is outside the overlapping coverage zone but remains within one of the fields of view, the object tracking system continues to track the object based on the last known position and velocity in the overlapping coverage zone. Disadvantageously, this triangulation technique depends on epipolar constraints for overlapping fields of view and hence cannot be applied to large scale surveillance systems, where cameras are usually installed in a sparse network with non-overlapping fields of view.
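As an illustration of triangulation from two overlapping views, the following is a minimal sketch of the midpoint method, which returns the 3D point closest to two viewing rays. It is not taken from any particular system described above. The function name `triangulate_midpoint` and its parameters are hypothetical, and the sketch assumes that the camera centres and the ray directions through the observed image points are already known in a common world frame.

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Return the 3D point closest to two viewing rays (midpoint method).

    c1, c2: camera centres; d1, d2: directions of the rays through the
    observed image points. All are 3-element sequences in one world frame.
    These inputs are assumptions of this sketch, not part of the source.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w = [a - b for a, b in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        # Parallel rays never converge, so no depth can be recovered.
        raise ValueError("rays are parallel; depth cannot be recovered")

    # Ray parameters of the closest point on each ray.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]

    # With noisy observations the rays do not intersect exactly,
    # so the midpoint of the shortest connecting segment is returned.
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]


# Two cameras 4 units apart observing the same point.
point = triangulate_midpoint((0, 0, 0), (1, 2, 5), (4, 0, 0), (-3, 2, 5))
```

The sketch needs two rays that are not parallel, which is why the technique breaks down once the object is seen by only one camera.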
Another approach to reconstructing the 3D trajectory of a moving object is to place constraints on the shape of the trajectory of the moving object. In one example, the object is assumed to move along a line or a conic section. A monocular camera moves to capture the moving object, and the motion of the camera is generally known. However, in practical applications the majority of moving objects, such as walking persons, frequently violate the assumption of a known trajectory shape.
In another approach for constructing a trajectory from overlapping images, the 3D trajectory of a moving object can be represented as a compact linear combination of trajectory bases. That is, each trajectory in a 3D Euclidean space can be mapped to a point in a trajectory space spanned by the trajectory bases. The stability of the reconstruction depends on the motion of the camera. A good reconstruction is achieved when the camera motion is fast and random and the fields of view overlap. A poor reconstruction is obtained when the camera moves slowly and smoothly. Disadvantageously, this method is difficult to apply in real-world surveillance systems, because cameras are usually mounted on a wall or on a pole, without any motion.
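To make the idea of a trajectory space concrete, the following sketch maps one coordinate of a trajectory to a small set of coefficients and back. The choice of DCT basis vectors is an assumption made here for illustration, since the text above does not name a specific basis. The function names `dct_basis`, `project` and `reconstruct` are hypothetical.

```python
import math


def dct_basis(num_frames, num_bases):
    """Return the first num_bases orthonormal DCT-II basis vectors.

    The DCT is an assumed choice of trajectory basis for this sketch.
    """
    basis = []
    for k in range(num_bases):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / num_frames)
        basis.append([
            scale * math.cos(math.pi * (2 * t + 1) * k / (2 * num_frames))
            for t in range(num_frames)
        ])
    return basis


def project(signal, basis):
    """Map one coordinate of a trajectory to its basis coefficients."""
    return [sum(x * b for x, b in zip(signal, vec)) for vec in basis]


def reconstruct(coeffs, basis):
    """Rebuild the coordinate as a linear combination of basis vectors."""
    n = len(basis[0])
    return [sum(c * vec[t] for c, vec in zip(coeffs, basis)) for t in range(n)]


# X coordinate of a smooth 8-frame trajectory, compressed to 3 coefficients.
xs = [0.0, 0.9, 1.7, 2.4, 3.0, 3.5, 3.9, 4.2]
basis = dct_basis(len(xs), 3)
coeffs = project(xs, basis)          # a point in the trajectory space
approx = reconstruct(coeffs, basis)  # smooth approximation of xs
```

The same projection would be applied separately to the Y and Z coordinates. Using fewer basis vectors than frames gives the compact representation, at the cost of an approximation error for trajectories that are not smooth.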
In yet another approach, a smoothness constraint can be imposed, requiring that the error between two successive velocities be close to zero. This method can recover the camera centres and the 3D trajectories of the objects. However, the reconstruction error of the points is orders of magnitude larger than the camera localization error, which indicates that the assumption of motion smoothness is too weak for accurate trajectory reconstruction.
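One way such a smoothness constraint might be expressed is as a cost that sums the squared differences between successive velocities, so that a trajectory with constant velocity has zero cost. The sketch below is an assumed formulation for illustration only. The function name `smoothness_cost` is hypothetical, and a unit time step between frames is assumed.

```python
def smoothness_cost(trajectory):
    """Sum of squared differences between successive velocities.

    trajectory: list of 3D points, one per frame. A unit time step is
    assumed, so velocity is the difference of consecutive points.
    """
    velocities = [
        [b - a for a, b in zip(p, q)]
        for p, q in zip(trajectory, trajectory[1:])
    ]
    return sum(
        (v2 - v1) ** 2
        for u, v in zip(velocities, velocities[1:])
        for v1, v2 in zip(u, v)
    )


straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
turning = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
straight_cost = smoothness_cost(straight)  # constant velocity, cost is 0
turning_cost = smoothness_cost(turning)    # two sharp turns, cost is 4
```

A cost of this kind penalises only changes in velocity, so many different trajectories satisfy it equally well. This is consistent with the observation above that smoothness alone is a weak assumption.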
Thus, a need exists for an improved method for 3D trajectory reconstruction in video surveillance systems.