1. Field of the Invention
This invention relates to a tracking apparatus and a tracking method, and more particularly to a tracking apparatus and a tracking method suitable for use, for example, with a video camera which tracks and images a predetermined imaging object.
2. Description of the Related Art
A video camera system has already been proposed in which a pan driving mechanism and a tilt driving mechanism cause a video camera to automatically track an imaging object so that the imaging object is displayed at a central location of a display screen.
In a video camera system of this type, for example, a reference measurement frame is provided at a central location of a display screen (the image photographed by the video camera) as seen in FIG. 19. Imaging is performed so that the imaging object to be tracked is included in the reference measurement frame, and the image thus obtained is stored in a memory. Thereafter, the display screen is scanned with a frame identical to the reference measurement frame (hereinafter referred to as the detection measurement frame), and the data in the detection measurement frame at each position are compared with the stored data of the reference measurement frame. The position of the detection measurement frame at which its data are most similar to the stored data, for example, the coordinates of the point at which the center of gravity of the detection measurement frame is positioned, is detected as the position of the imaging object. Then, a displacement amount is calculated, which is the distance from the position of the imaging object to the position of the reference measurement frame, the latter serving as the reference position for pull-in in the display screen (for example, the coordinates of the point at which the center of gravity of the reference measurement frame is positioned, that is, the coordinates of the center of pull-in). Finally, the video camera is panned and tilted based on the displacement amount so that the imaging object is pulled in to the central location of the display screen.
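The scanning and comparison step described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how similarity is measured, so a sum of absolute differences is assumed here, and the grayscale list-of-lists image representation and the names `sad`, `frame_center`, and `find_object` are hypothetical.

```python
def frame_center(left, top, width, height):
    """Center of a frame, used here as its center of gravity (assumption)."""
    return (left + width / 2.0, top + height / 2.0)


def sad(image, template, left, top):
    """Sum of absolute differences between the stored reference data
    (template) and the detection frame placed at (left, top).
    The similarity measure is an assumption, not taken from the source."""
    total = 0
    for r, row in enumerate(template):
        for c, ref in enumerate(row):
            total += abs(image[top + r][left + c] - ref)
    return total


def find_object(image, template):
    """Scan the detection measurement frame over the whole screen and
    return the center of the position most similar to the template."""
    t_h, t_w = len(template), len(template[0])
    i_h, i_w = len(image), len(image[0])
    best = None  # (score, left, top) of the best match so far
    for top in range(i_h - t_h + 1):
        for left in range(i_w - t_w + 1):
            score = sad(image, template, left, top)
            if best is None or score < best[0]:
                best = (score, left, top)
    _, left, top = best
    return frame_center(left, top, t_w, t_h)
```

An exhaustive scan like this visits every frame position, so its cost grows with the screen area multiplied by the frame area.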
The panning and tilting speeds in this instance are determined in the following manner. In particular, where the coordinates of the reference position are represented by $(x_{\mathrm{center}}, y_{\mathrm{center}})$ and the coordinates of the position of the imaging object are represented by $(x_{\mathrm{now}}, y_{\mathrm{now}})$, the displacement amounts ex and ey in the panning or horizontal direction and the tilting or vertical direction are respectively calculated in accordance with the following expressions:

$$ex = x_{\mathrm{center}} - x_{\mathrm{now}}, \qquad ey = y_{\mathrm{center}} - y_{\mathrm{now}} \tag{1}$$
The panning and tilting speeds, speed_pan and speed_tilt, are then respectively calculated in accordance with the following expressions:

$$\mathrm{speed\_pan} = \mathrm{a\_pan} \times f(ex), \qquad \mathrm{speed\_tilt} = \mathrm{a\_tilt} \times f(ey) \tag{2}$$
where a_pan and a_tilt are the gains when the pan driving mechanism and the tilt driving mechanism are driven, respectively, and f() is the function for calculation of a speed.
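Expressions (1) and (2) can be illustrated with a short sketch. The source does not define f(), so a simple proportional function is assumed here, and the gain values of 0.5 are arbitrary placeholders, not values from the source.

```python
def displacement(center, now):
    """Expression (1): displacement from the object position to the
    reference position, per axis."""
    ex = center[0] - now[0]
    ey = center[1] - now[1]
    return ex, ey


def f(e):
    """Speed function. The source leaves f() unspecified; a proportional
    mapping is assumed purely for illustration."""
    return e


def pan_tilt_speed(center, now, a_pan=0.5, a_tilt=0.5):
    """Expression (2): fixed gains multiplied by f of the displacement.
    The default gains are hypothetical."""
    ex, ey = displacement(center, now)
    return a_pan * f(ex), a_tilt * f(ey)
```

With a reference position of (160, 120) and an object at (200, 100), the displacements are ex = -40 and ey = 20, so this sketch yields speeds of -20.0 and 10.0. The speed depends on nothing but the displacement, since the gains are constants.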
In such a video camera system, however, the gains a_pan and a_tilt (each of which may hereinafter be referred to as gain a) are fixed values independent of the displacement amounts ex and ey (each of which may hereinafter be referred to as displacement amount e), as seen in FIG. 20. Accordingly, since the speed of panning or tilting is determined based only on the displacement amount e by expressions (2) above, the imaging object is sometimes missed if, for example, it moves in the proximity of an end of the display screen, that is, in the proximity of the screen frame as seen in FIG. 21, toward the outside of the display screen at a speed higher than the panning or tilting speed.
When the imaging object is missed in this manner, it is re-detected (prediction tracked) by a method wherein a range in which the imaging object may be present is predicted from a movement of the imaging object before it is missed and panning and/or tilting are performed so that the video camera may image within the predicted range.
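One simple way to form such a predicted range is linear extrapolation from the last observed positions. The source does not specify the prediction method, so the constant-velocity model, the square search margin, and the name `predict_range` below are all assumptions made for illustration.

```python
def predict_range(positions, frames_ahead=1, margin=10.0):
    """Predict where a missed object may be, assuming it keeps the
    velocity observed over its last two positions (an assumption).

    positions: list of (x, y) tuples observed before the object was missed,
               with at least two entries.
    Returns (left, top, right, bottom) of the predicted search range.
    """
    (x0, y0), (x1, y1) = positions[-2], positions[-1]
    vx, vy = x1 - x0, y1 - y0            # per-frame velocity estimate
    px = x1 + vx * frames_ahead          # extrapolated position
    py = y1 + vy * frames_ahead
    return (px - margin, py - margin, px + margin, py + margin)
```

For an object last seen at (100, 50) and then (110, 50), three frames later the sketch predicts a range centered on (140, 50), which continues to advance along the previous direction of movement.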
However, when a moving imaging object stops, for example, at a position behind a body such as a tree or a pole, if the range in which the imaging object may be present is predicted, using the method described above, from the movement of the imaging object before it is missed, then it is predicted that the imaging object is present ahead of the tree or the pole. Accordingly, it is difficult to find the imaging object using the method described above.
Further, where the gain and the speed of panning or tilting are determined in the manner described above, tracking of the imaging object is started immediately whenever a displacement amount e is produced. Accordingly, if the imaging object does not move but merely oscillates, panning and/or tilting are performed each time the oscillations produce a displacement amount. Consequently, the image obtained by imaging also oscillates, which gives rise to a problem in that the image lacks stability.
Further, where such a video camera system is applied, for example, to a television telephone system or a television conference system so that an image obtained by tracking an imaging object is compression coded and transmitted to a remote location, the background of the image also varies continuously when the imaging object moves continuously. This gives rise to another problem in that the coding efficiency deteriorates.
Further, as described above, the video camera system is constructed such that the position of the reference measurement frame provided at the central location of the display screen is set as the reference position for pull-in, and panning and tilting are accordingly performed so that the imaging object being tracked is pulled in to the central location of the display screen. Depending upon the situation in use, however, a user may desire to perform tracking so that the imaging object is pulled in to some other location of the display screen, and it is difficult for the video camera system described above to cope with such a case.
Further, when, for example, a certain person is determined as the object of tracking and tracking is started with the face of that person imaged in the reference measurement frame, if some other person is present in the neighborhood, the second person is sometimes recognized in error as the person to be tracked, and tracking follows the second person. In such an instance, with the video camera described above, it is necessary first to perform an operation to interrupt the tracking and then to perform manual panning and/or tilting so that the face of the first person is imaged in the reference measurement frame again. Thus, the video camera has a still further problem in that complicated operations are required.