There are known surveillance camera systems for monitoring a predetermined area with an imaging device, such as a camera. For example, PTL 1 discloses a technique for capturing an image of an object using multiple cameras with overlapping fields of view to track the object across multiple camera views.
Specifically, the object is detected and tracked in each of the video images captured by the cameras, thereby determining a movement path of two-dimensional coordinates of the object across image frames of the video images captured by the cameras. Three-dimensional (3D) coordinates of the object are estimated based on the movement path, thereby tracking the object across the multiple camera views.
PTL 2 discloses a technique for automatically detecting an abnormal event in a video image captured by a camera in a surveillance camera system. Specifically, whether a person in a specific place is a suspicious person is determined based on the staying time of the person in the place. If the person is a suspicious person, an alarm is generated.
For image processing associated with multiple cameras, NPL 1 describes a technique for camera calibration by allowing a camera to capture an image of a predetermined planar pattern.
NPL 2 discloses a method for obtaining a 3D structure adaptive to motion using a predetermined model estimation technique, or algorithms for estimating a 3D position with multiple cameras.
NPL 3 discloses an algorithm for multi-person tracking-by-detection. In particular, the technique (algorithm) described in this literature uses the continuous confidence of pedestrian detectors to detect and track a plurality of pedestrians.
It is assumed that a video image captured by a camera includes multiple objects and that the objects are tracked individually. To determine whether objects appearing in image frames of the video image that correspond to different times are the same object, each of the objects is assigned a unique code that identifies the object.
This code is called a “tracking label”.
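The label assignment described above can be sketched in code. The following is a minimal, hypothetical per-camera tracker (not taken from the cited literature): the class name `SimpleLabelTracker`, the nearest-neighbor matching rule, and the distance threshold `max_dist` are all assumptions made for illustration. A detection that cannot be matched to an existing track is issued a fresh tracking label of the form "camera-sequence".

```python
# Hypothetical sketch of per-camera tracking-label assignment.
# The matching rule (nearest neighbor within max_dist) is assumed
# for illustration and is not the method of the cited literature.

class SimpleLabelTracker:
    def __init__(self, camera_id, max_dist=50.0):
        self.camera_id = camera_id
        self.max_dist = max_dist   # assumed matching threshold (pixels)
        self.next_seq = 1          # sequence number for new labels
        self.tracks = {}           # label -> last known (x, y) position

    def update(self, detections):
        """Match each detection (x, y) to an existing track; an unmatched
        detection receives a new label such as '1-1', '1-2', ..."""
        labels = []
        unmatched = dict(self.tracks)
        for (x, y) in detections:
            best = None
            for label, (px, py) in unmatched.items():
                d = ((x - px) ** 2 + (y - py) ** 2) ** 0.5
                if d <= self.max_dist and (best is None or d < best[1]):
                    best = (label, d)
            if best is not None:
                label = best[0]
                del unmatched[label]          # each track matched once
            else:
                label = f"{self.camera_id}-{self.next_seq}"
                self.next_seq += 1
            self.tracks[label] = (x, y)       # update last known position
            labels.append(label)
        return labels
```

For example, `SimpleLabelTracker(1).update([(10, 10)])` returns `['1-1']`; a later detection near that position keeps the label, while a detection far from any known track is assigned `'1-2'`.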
For example, according to PTL 1, individual objects included in image frames of a video image captured by each of the cameras are tracked, thus obtaining a tracking result. Tracking results for the cameras are obtained and then combined. In the related art, however, the same object may be assigned different tracking labels at different times.
FIG. 17A illustrates the image capture ranges of three cameras and the motion of a person moving through those ranges. In FIG. 17A, the positions of the person at different times are indicated by T1 to T9. FIG. 17B is a timing diagram for times T1 to T9 in FIG. 17A. It illustrates, for each of a first camera, a second camera, and a third camera, whether the person is located in the field of view of the camera, is image-captured by the camera, and is assigned a tracking label, or whether the person is located out of the field of view of the camera. For example, it is assumed that a target object (the person in FIG. 17A), serving as a moving object, takes a route through the fields of view of the first to third cameras as illustrated in FIG. 17A. For the first camera, as illustrated in the timing diagram of FIG. 17B, the target object first enters the field of view of the first camera at time T1 and is assigned a tracking label 1-1. The target object then leaves the field of view of the first camera at time T4, and the tracking process is temporarily interrupted. After that, at time T6, the target object again enters the field of view of the first camera, and the tracking process is started again. Disadvantageously, a new tracking label 1-2 is assigned to the same target object.
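The first camera's timeline in FIG. 17B can be simulated with a short sketch. The visibility sequence below is an assumption chosen to match the described events (in view from T1, out of view at T4, in view again at T6); the function `simulate_first_camera` is hypothetical, and it models a tracker that forgets a track as soon as the object leaves the field of view.

```python
# Illustrative simulation (assumed, not from the cited literature) of the
# first camera in FIG. 17B: the tracker drops its track once the object
# leaves the field of view, so re-entry is given a brand-new label.

def simulate_first_camera(visibility):
    """visibility: one boolean per time step (True = in field of view).
    Returns the label assigned at each step, or None while out of view."""
    next_seq = 1
    current = None                # label of the currently tracked object
    labels = []
    for visible in visibility:
        if visible:
            if current is None:   # (re-)entry: a new label is issued
                current = f"1-{next_seq}"
                next_seq += 1
            labels.append(current)
        else:
            current = None        # track interrupted; the label is lost
            labels.append(None)
    return labels

# Assumed timeline for T1..T6: in view T1-T3, out of view T4-T5, in view T6
print(simulate_first_camera([True, True, True, False, False, True]))
# -> ['1-1', '1-1', '1-1', None, None, '1-2']
```

The single object thus carries label 1-1 before the interruption and label 1-2 after it, which is exactly the problem described above.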
When a target object, serving as a moving object, is tracked, particularly over a wide surveillance area, the accuracy of the tracking process is lowered because the target object frequently enters and leaves the fields of view of the respective cameras.
For example, if the tracking results in the above-described related art are used for automatic detection of an abnormal event as described in PTL 2, the same person may be recognized as another person. Specifically, it is assumed that whether the person is a suspicious person is determined based on the staying time as described in PTL 2. As illustrated in FIG. 17A, the person cannot be tracked at time T4 because the person temporarily leaves the field of view of the first camera (for example, because the field of view is occluded). When the person again enters the field of view of the first camera at time T6, the person is recognized as another person because another tracking label is assigned to the same person in the video image captured by the same camera, as described above. It is therefore difficult to correctly measure the staying time of the person.
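The staying-time problem can be made concrete with a hypothetical sketch. The function `staying_time_per_label` below and its observation format are assumptions for illustration: when staying time is accumulated per tracking label, a label change splits one continuous stay into two short fragments, each of which may fall below a suspicion threshold that the true total would exceed.

```python
# Hypothetical illustration: staying time measured per tracking label is
# fragmented when the same person is assigned a new label after re-entry.

from collections import defaultdict

def staying_time_per_label(observations):
    """observations: list of (time, label) pairs from one camera.
    Returns the apparent staying time computed separately per label."""
    seen = defaultdict(list)
    for t, label in observations:
        seen[label].append(t)
    return {label: max(ts) - min(ts) for label, ts in seen.items()}

obs = [(1, "1-1"), (2, "1-1"), (3, "1-1"),   # before leaving the view
       (6, "1-2"), (7, "1-2"), (8, "1-2")]   # after re-entry, new label
print(staying_time_per_label(obs))
# -> {'1-1': 2, '1-2': 2}  (the true stay of the one person is 7)
```

Each label accounts for only 2 time units, although the same person was actually present for 7, so a threshold-based suspicious-person determination is easily defeated.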
Since, in the related art, tracking labels are assigned in the video images captured by the respective cameras independently of one another, a single object is assigned different tracking labels in different video images. Disadvantageously, the same object is recognized as different objects. For example, in FIG. 17B, the single person is assigned the label 1-1 (at time T1) by the tracking process associated with the video image captured by the first camera. The person is assigned a label 2-1 (at time T2) by the tracking process associated with the video image captured by the second camera, and is further assigned a label 3-1 (at time T3) by the tracking process associated with the video image captured by the third camera. Consequently, the same object is recognized as different objects, leading to lower accuracy of the tracking process using the multiple cameras in combination.
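The independence of the per-camera labelers can be sketched as follows; the factory function `make_labeler` is hypothetical and merely mimics each camera issuing labels from its own private counter, with no shared state between cameras.

```python
# Assumed illustration: three per-camera label generators run
# independently, so the same person receives three unrelated labels
# and the combined multi-camera result sees three "different" objects.

def make_labeler(camera_id):
    counter = {"n": 0}          # private counter; nothing shared between cameras
    def new_label():
        counter["n"] += 1
        return f"{camera_id}-{counter['n']}"
    return new_label

labelers = {cid: make_labeler(cid) for cid in (1, 2, 3)}
# The same person enters each camera's view for the first time:
labels = [labelers[cid]() for cid in (1, 2, 3)]
print(labels)
# -> ['1-1', '2-1', '3-1']  (one person, three distinct labels)
```

Because no information links the counters, nothing in the labels themselves indicates that 1-1, 2-1, and 3-1 denote the same person.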