The present invention relates to an object tracking system using an imaging unit, and in particular to an object tracking method and an object tracking apparatus for automatically detecting an object within a monitor area by processing an image signal and tracking the object by controlling the universal head of the camera carrying the imaging unit.
An object tracking system such as a remote monitor system using an imaging unit (hereinafter referred to as “the camera”) such as a TV camera has been widely used. Many of them are what is called a manned monitor system which is operated by a monitor person watching the image displayed on the monitor. In the manned monitor system, however, a monitor person is required to keep watching the image displayed on the monitor and identify in real time an object such as a man or an automotive vehicle which may intrude into the monitor area. This is a great burden on the monitor person.
Human powers of concentration are limited, however. In the manned monitor system, therefore, an intruding object may be unavoidably overlooked, thereby posing the problem of low reliability. Also, in the case where an intruding object is found in the image (camera image) picked up by the camera, the camera universal head (electrically operated swivel base) carrying the camera is required to be operated in such a manner as to catch the intruding object within the camera view field (i.e. the imaging view field). Partly due to the recent explosive spread of monitor cameras, however, one monitor person often watches a multiplicity of camera images on a plurality of monitors. In the case where a plurality of cameras catch an intruding object at the same point in time, the universal heads of the plurality of the cameras cannot be easily operated at the same time. In such a case, the intruding object is liable to be overlooked.
Demand is high, therefore, for what is called a monitor system of automatic detection and tracking type in which an intruding object is detected not by a monitor person but automatically by processing the image picked up by the camera (camera image), and controlling the universal head of the camera to catch the intruding object in the camera view field as required thereby to make a predetermined announcement or alarm.
As a function to realize the monitor system described above, the camera universal head is required to be controlled in such a manner that an object to be monitored and considered an intruding object is detected from an image signal by a predetermined monitor method and caught within the camera view field.
In the conventionally implemented monitor system of this type, an intruding object in the view field is detected by the difference method. In the difference method, an image (input image) picked up by the imaging unit such as a camera is compared with an image containing no image of the object to be detected (i.e. a reference background image prepared in advance), the difference of the brightness value is determined for each pixel or each pixel block including a plurality of pixels, and an area having a large difference (image signal change area) is detected as an intruding object.
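The per-pixel comparison of the difference method described above can be sketched as follows. This is a minimal illustration only, assuming 8-bit grayscale images; the function name detect_change and the use of numpy are not part of the original disclosure.

```python
import numpy as np

def detect_change(input_image, background, threshold=20):
    """Difference method sketch: compare the input image with a
    reference background image containing no object, and flag every
    pixel whose brightness differs by at least `threshold`."""
    # Work in a signed type so the subtraction cannot wrap around.
    diff = np.abs(input_image.astype(np.int16) - background.astype(np.int16))
    # Binarize: changed pixels -> 255, unchanged pixels -> 0.
    return np.where(diff >= threshold, 255, 0).astype(np.uint8)
```

An area of "255" pixels produced in this way corresponds to the image signal change area detected as an intruding object.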
The image of the intruding object detected in this way is registered as a template, and the motion of the intruding object within the camera view field is detected thereby to control the camera universal head in accordance with the motion of the object.
The process for this conventional method of detecting an intruding object is explained with reference to the flowchart of FIG. 5. In FIG. 5, the process starts with the initialization step 101, in which external devices, variables and an image memory for executing the intruding object tracking method are initialized. Next, in step 102, defined by a dotted line, an intruding object is detected by the difference method. Step 102 includes a first image input step 102a for acquiring an input image having, for example, 320 row pixels and 240 column pixels from the camera. Next, in the difference processing step 102b, the brightness value difference for each pixel between the input image acquired in the first image input step 102a and the reference background image prepared in advance is calculated as the brightness value of a difference image. Then, in the binarization step 102c, a binary image is acquired from the pixel values of the difference image (difference values) obtained in the difference processing step 102b in such a manner that each pixel value less than a predetermined threshold value Th (say, 20) is set to “0” and each pixel value not less than the threshold value Th is set to “255” (with each pixel value calculated as 8 bits). Next, in the labeling step 102d, clusters of pixels having the value “255” are detected in the binary image obtained in the binarization step 102c, and each cluster is distinguished by a number attached thereto. In the intruding object presence determining step 102e, it is determined that an intruding object exists in the monitor area in the case where a cluster of the pixels having the value “255” numbered in the labeling step 102d meets predetermined conditions (for example, a size of 20 row pixels by 50 column pixels).
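The labeling step 102d can be sketched as a connected-component search over the binary image. The following is an illustrative sketch only, assuming 4-connectivity; the patent does not specify the labeling algorithm, and the function name label_clusters is hypothetical.

```python
import numpy as np
from collections import deque

def label_clusters(binary):
    """Labeling sketch: assign a distinct number to each 4-connected
    cluster of pixels having the value 255 in the binary image."""
    labels = np.zeros(binary.shape, dtype=int)
    current = 0
    for sy, sx in zip(*np.nonzero(binary == 255)):
        if labels[sy, sx]:
            continue  # pixel already belongs to a numbered cluster
        current += 1
        labels[sy, sx] = current
        queue = deque([(sy, sx)])
        while queue:  # flood fill the cluster
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < binary.shape[0] and 0 <= nx < binary.shape[1]
                        and binary[ny, nx] == 255 and not labels[ny, nx]):
                    labels[ny, nx] = current
                    queue.append((ny, nx))
    return labels, current  # label image and number of clusters
```

Each numbered cluster can then be tested against the predetermined size conditions of step 102e.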
In the case where it is determined that an intruding object exists in the intruding object presence determining step 102e, the process branches to the first alarm/alarm display step 103. In the case where it is determined that there is no intruding object, on the other hand, the process branches to the first image input step 102a again thereby to execute the difference method.
Next, in the first alarm/alarm display step 103, the monitor person is informed, for example, by sounding an alarm indicating that an intruding object has been found or displaying an alarm indicating that an intruding object has been found on the monitor. Then, in the template registration step 104, an image of the intruding object is cut out from the input image based on the cluster of the pixels having the pixel value “255” numbered in the labeling step 102d, and registered as a template.
Next, the position where the degree of coincidence (likelihood) between a sequentially input image and the template becomes maximum is detected, thereby detecting the position of the intruding object. This method is widely known as template matching, and is described in detail, for example, in “Digital Picture Processing”, ACADEMIC PRESS, pp. 296-303, 1976 and U.S. Pat. No. 5,554,983, the disclosures of which are hereby incorporated by reference herein.
The template matching is used because the execution of the difference method requires the reference background image 602, and in the case where the camera universal head is controlled in such a manner as to catch an intruding object within the camera view field, the optical axis of the camera deviates, thereby making it impossible to use the reference background image 602 prepared in advance.
Normally, in the case where the position of an intruding object is detected using the template matching, the position change of the intruding object is followed in such a manner that the image at the position of the intruding object detected by template matching is sequentially updated as a new template. This process is executed in and subsequent to the second image input step 105, and explained below.
In the second image input step 105, like in the first image input step 102a, an input image having 320 row pixels and 240 column pixels, for example, is acquired from the camera. Next, in the template matching step 106, an image having the highest degree of coincidence with the template is detected from the input image acquired in the second image input step 105. Normally, the job of comparing the whole input image with the template requires a long calculation time. Therefore, a predetermined range with respect to the template (for example, a range extended 20 column pixels and 50 row pixels with respect to the template) is set as a search area, and an image highest in the degree of coincidence with the template is detected within this search area.
The degree of coincidence can be calculated using the normalized cross-correlation value r(Δx, Δy) described in U.S. patent Ser. No. 09,592,996.
The normalized cross-correlation value r(Δx, Δy) lies in the range −1≦r(Δx, Δy)≦1, and takes the value “1” in the case where the input image and the template coincide completely with each other. The template matching is a process in which Δx and Δy are scanned within the search range, that is to say, changed in the range −50≦Δx≦50 and −20≦Δy≦20, respectively, thereby to detect the position associated with the maximum normalized cross-correlation value r(Δx, Δy).
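The scan described above can be sketched as follows. This is a minimal illustration assuming 8-bit grayscale images; the function names ncc and match_template are hypothetical, and the mean-subtracted form of the correlation is one common definition, not necessarily the exact formula of the cited patent.

```python
import numpy as np

def ncc(patch, template):
    """Normalized cross-correlation; 1.0 for a perfect match."""
    p = patch.astype(float) - patch.mean()
    t = template.astype(float) - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    return (p * t).sum() / denom if denom else 0.0

def match_template(image, template, x0, y0, dx_range=50, dy_range=20):
    """Scan (dx, dy) around the previous template position (x0, y0)
    and return the offset maximizing r(dx, dy)."""
    th, tw = template.shape
    best = (-2.0, 0, 0)  # (r_max, dx, dy); r is always >= -1
    for dy in range(-dy_range, dy_range + 1):
        for dx in range(-dx_range, dx_range + 1):
            y, x = y0 + dy, x0 + dx
            if 0 <= y and 0 <= x and y + th <= image.shape[0] \
                    and x + tw <= image.shape[1]:
                r = ncc(image[y:y + th, x:x + tw], template)
                if r > best[0]:
                    best = (r, dx, dy)
    return best
```

The returned maximum r can then be compared with a threshold such as 0.7 to decide whether the object is still present.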
Next, in the coincidence degree determining step 107, the degree of coincidence r(Δx, Δy) is determined. In the case where the normalized cross-correlation value is used, and not less than 0.7, for example, it is determined that the degree of coincidence is high, and the process branches to the intruding object position correcting step 108. In the case where the normalized cross-correlation value is less than 0.7, on the other hand, it is determined that the degree of coincidence is low, and the process branches to the first image input step 102a. 
A high degree of coincidence is indicative of the presence of an image similar to the template in the input image, i.e. the presence of an intruding object in the input image. In this case, the intruding object continues to be tracked.
A low degree of coincidence, on the other hand, indicates the absence of an image similar to the template in the input image, i.e. the absence of an intruding object in the input image. In this case, the process branches to the first image input step 102a, and the process for detecting an intruding object is executed again by the difference method.
In the intruding object position correcting step 108 executed in the case where the degree of coincidence is high, the position of the intruding object is corrected to the new value (x0+Δx, y0+Δy) based on the displacement (Δx, Δy) associated with the maximum degree of coincidence. Next, in the template update step 117, an image is cut out of the input image obtained in the second image input step 105 as a new template image based on the newly determined position of the intruding object.
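Steps 108 and 117 together can be sketched as follows. The function name update_track and its parameters are illustrative only; tw and th denote the template width and height in pixels.

```python
import numpy as np

def update_track(image, x0, y0, dx, dy, tw, th):
    """Position correcting and template update sketch: shift the object
    position by the matched displacement (dx, dy), then cut out the
    image at the corrected position as the new template."""
    x1, y1 = x0 + dx, y0 + dy          # step 108: corrected position
    new_template = image[y1:y1 + th, x1:x1 + tw].copy()  # step 117
    return (x1, y1), new_template
```

Sequentially repeating this update lets the template follow the position change of the intruding object from frame to frame.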
Further, in the camera universal head control step 118, the camera universal head (i.e. the direction of the optical axis of the camera) is controlled according to the displacement between the position of the intruding object detected in the template matching step 106 and a predetermined reference position of the input image (i.e. a predetermined reference position in the imaging view field) such as the center of the input image. As an example, assume that an intruding object is detected at a position 802 shown in FIG. 6. Assuming that the center position of the intruding object coincides with the center 803 of the template, the displacement dx, dy from the center of the image is calculated.
In the case where the template center position 803 is located leftward of the center 804 of the input image by at least a predetermined amount s (dx<−s), the camera universal head is panned leftward. In the case where the template center position 803 is located rightward of the center 804 of the input image by at least a predetermined amount s (dx>s), on the other hand, the camera universal head is panned rightward. In the case where the intruding object is located at about the center of the image (−s≦dx≦s), the camera universal head is not required to be controlled. Therefore, the position where the camera universal head begins to be controlled can be designated by the predetermined amount s. The predetermined amount s is 50, for example.
Also, in the case where the center position 803 of the template is higher than the center 804 of the input image (dy<−s), the camera universal head is tilted upward, while in the case where the center position 803 of the template is lower than the center 804 of the input image (dy>s), the camera universal head is tilted downward.
As an alternative, the control speed of the pan motor and the tilt motor may be changed according to the absolute values of dx and dy (the larger the absolute value of dx or dy, the higher the control speed).
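The universal head control logic of the preceding steps can be sketched as follows, assuming the dead band s = 50 described above. The function name pan_tilt_command and the string command values are hypothetical; an actual system would drive the pan and tilt motors directly.

```python
def pan_tilt_command(cx_template, cy_template, cx_image, cy_image, s=50):
    """Dead-band universal head control sketch: pan/tilt only when the
    template center deviates from the image center by more than s pixels.
    Returns the pan and tilt commands plus |dx|, |dy|, which may be used
    to scale the motor speed in the proportional-speed variant."""
    dx = cx_template - cx_image
    dy = cy_template - cy_image
    pan = "left" if dx < -s else "right" if dx > s else "stop"
    tilt = "up" if dy < -s else "down" if dy > s else "stop"
    return pan, tilt, abs(dx), abs(dy)
```

With the object near the image center (−s ≦ dx ≦ s and −s ≦ dy ≦ s), both commands are "stop" and the universal head is left unmoved.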
Finally, in the second alarm/alarm display step 119, an alarm is sounded, for example, to inform the monitor person that an intruding object is being tracked. Alternatively, an alarm indicating that an intruding object is being tracked may be displayed on the monitor.
The method of tracking an intruding object using the template matching described above poses the problem that, in the case where the direction of the intruding object to be tracked changes (for example, where an intruding person turns to the right or looks back), the displacement between the intruding object and the position detected by the template matching increases, making accurate and stable tracking impossible.
This is due to the characteristic of the template matching in which a pattern having a high contrast in the template (the part with a large brightness value difference) is matched. Assume that the intruding object is an automotive vehicle which is first directed forward and held substantially wholly in the template. Once the vehicle changes its running direction sideways, for example, parts of the vehicle other than the front part can no longer be held in the template. Unlike during the time when the whole vehicle is held in the template, the center of the template moves from the center of the vehicle to its front part, and therefore the detected position of the intruding object is displaced.
This phenomenon is explained with reference to FIG. 7. In order to simplify the explanation, this figure shows an example where the camera universal head is not controlled. To illustrate how the template matching method of tracking becomes unable to hold the intruding object in the template, FIG. 7 shows the process executed in the case where a vehicle running along a curved road in the imaging view field is tracked as the intruding object.
Reference numerals 901, 903, 905 and 907 designate template images at time points t1−1, t1, t1+1 and t1+2, respectively, and numerals 901a, 903a, 905a and 907a the template at time points t1−1, t1, t1+1 and t1+2, respectively. Numerals 902, 904, 906 and 908 designate input images at time points t1, t1+1, t1+2 and t1+3, respectively, and numerals 902a, 904a, 906a and 908a the template positions at time points t1−1, t1, t1+1 and t1+2, respectively (the positions of the intruding object at time points t1−1, t1, t1+1 and t1+2, respectively). Also, numerals 902b, 904b, 906b and 908b designate the positions of the intruding object detected by template matching at time points t1, t1+1, t1+2 and t1+3, respectively.