The present invention relates to a monitor apparatus using an image pickup unit, or in particular to an object tracking method for automatically detecting an object intruding into an imaging field or image pickup field from a video signal inputted from the image pickup unit and automatically tracking the motion of the detected object and an object tracking apparatus for automatically adjusting the imaging direction (central direction of an image) in accordance with the detected motion of the object.
A video monitor apparatus using an image pickup unit such as a television camera (referred to as a TV camera) has been widely used. One monitor system using such video monitor apparatuses is a manned monitoring system, in which an intruding object such as a man or an automotive vehicle entering the monitor field is detected or tracked by a human monitor watching the image displayed on the monitor. However, apart from such a manned monitoring system, an automatic monitor system using a video monitor apparatus has been in demand, in which an intruding object is automatically detected from the image inputted from an image input unit such as a camera, the motion of the object is automatically tracked, and a predetermined announcement or alarm action can be taken.
For realizing such an automatic monitor system, the first step is to detect an intruding object in the view field by a so-called subtraction method or the like. The subtraction method compares the input image obtained by an image pickup unit with a reference background image prepared in advance (i.e. an image not including the object to be detected), determines the brightness (or intensity) difference for each pixel, and detects an area with a large difference value as an object. The part of the input image (referred to as a partial image) corresponding to the position of the intruding object detected in this way is registered as a template, so that the position associated with the maximum degree of coincidence with the template image is detected in the sequentially inputted images. This method is widely known as template matching, and is described in detail, for example, in U.S. Pat. No. 6,208,033.
Ordinarily, in the case of tracking an object (for example, an object detected by the subtraction method) using the template matching, the partial image at the position of the object detected by the matching process is sequentially updated as a new template image so that the template matching follows changes in the posture of the object. This process will now be described with reference to FIGS. 1 to 3.
FIG. 1 is a diagram useful for explaining the flow of the process of detecting an object that has intruded into the view field by the subtraction method and registering the detected object as an initial template image so that it may be used for the template matching.
In FIG. 1, numeral S01 designates an input image, numeral S02 a reference background image, numeral S03 a difference image between the input image S01 and the reference background image S02, numeral S04 a binarized image of the difference image S03, numeral S05 a subtraction processing unit, numeral S06 a binarization processing unit (Th), numeral S07 a man-like object in the input image S01, numeral S08 a man-like difference image in the difference image S03 corresponding to the man-like object S07, and numeral S09 a man-like object (man-like binarized image) in the binarized image S04 corresponding to the man-like difference image S08. Further, numeral S10 designates a circumscribed rectangle of the detected man-like object S09, numeral S11 an extraction processing unit (CL) of extracting a specified area from the input image S01, numeral S12 an extracted image constructed of the extracted area (template image), and numeral S13 a template image.
In FIG. 1, first, the input image S01 of e.g. 320×240 pixels is inputted from a camera E01 into an intruding object monitor apparatus E05. Then, in the subtraction processing unit S05, the brightness difference between the input image S01 and the reference background image S02 prepared in advance is calculated for each pixel thereby to acquire the difference image S03 in which the calculated difference of each pixel is assumed to be the brightness value thereof. At the same time, the man-like object S07 in the input image S01 appears in the difference image S03 as a man-like difference image S08.
Next, in the binarization processing unit S06, the brightness value of each pixel of the difference image S03 having the difference value less than a predetermined threshold (for example, 20) is set to “0”, while the brightness value of each pixel not less than the threshold is set to “255” (assuming that one pixel includes 8 bits in this specification) thereby to obtain the binarized image S04.
In the process, the man-like object S07 picked up in the input image S01 is detected as a man-like object S09 in the binarized image S04 (Sketch of the object detecting process using the subtraction method).
In FIG. 1, furthermore, the circumscribed rectangle S10 of the man-like object S09 detected by the subtraction method is detected. Then, in the extraction processing unit S11, an area represented by the circumscribed rectangle S10 is extracted from the input image S01. The extracted image is registered as the template image S13 in the extracted image S12 (Sketch of the process of registering the initial template image).
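The steps of FIG. 1 (subtraction S05, binarization S06, circumscribed rectangle S10 and extraction S11) can be sketched as follows. This is only an illustrative sketch under stated assumptions: images are grayscale lists of rows with 8-bit values, the brightness difference is taken as an absolute difference, and all pixels at or above the threshold are treated as one object (no labeling of separate regions). The function name `detect_and_extract` is hypothetical and does not appear in the specification.

```python
def detect_and_extract(input_img, background, threshold=20):
    """Subtraction method sketch on grayscale images (lists of rows, 0..255)."""
    h, w = len(input_img), len(input_img[0])
    # Difference image: per-pixel brightness difference (absolute value assumed).
    diff = [[abs(input_img[y][x] - background[y][x]) for x in range(w)]
            for y in range(h)]
    # Binarized image: 0 below the threshold, 255 at or above it.
    binar = [[255 if d >= threshold else 0 for d in row] for row in diff]
    # Coordinates of all detected pixels (assumed to belong to a single object).
    points = [(x, y) for y in range(h) for x in range(w) if binar[y][x]]
    if not points:
        return binar, None, None
    x0 = min(p[0] for p in points)
    x1 = max(p[0] for p in points)
    y0 = min(p[1] for p in points)
    y1 = max(p[1] for p in points)
    # Circumscribed rectangle as (left, top, width, height).
    rect = (x0, y0, x1 - x0 + 1, y1 - y0 + 1)
    # Initial template: the area of the input image inside the rectangle.
    template = [row[x0:x1 + 1] for row in input_img[y0:y1 + 1]]
    return binar, rect, template
```

When no pixel reaches the threshold, the sketch returns `None` for both the rectangle and the template, which a caller may treat as "no intruding object detected".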
FIGS. 2A and 2B are diagrams useful for explaining the prior-art flow of the process of detecting, by the template matching, where in the input image the intruding object registered in the template image is located. In FIG. 2A, numeral M01 designates the extracted image obtained by the subtraction method (corresponding to the extracted image S12 in FIG. 1). In FIG. 2B, numeral M02 designates an input image obtained by the image pickup unit E01 and numeral M03 a template image.
Numeral M05 designates one of the dotted areas in the input image M02 and represents the position of the template image M03. Numeral M06 designates one of the dotted areas in the input image M02 and represents a search area of the template matching. Further, dx and dy represent the widths of the search area (in which dx represents the horizontal direction and dy the vertical direction). The widths dx and dy are set according to the amount of the apparent motion (the motion on the image) of the object to be tracked. For example, the widths dx and dy may be set as dx=50 pix and dy=15 pix.
The template matching is a process of searching the search area M06 of the input image M02 for the portion having the maximum degree of coincidence with the template image M03. As the degree of coincidence, an index called the normalized correlation, obtained from the following equation (1), may be used.
$$
\gamma(x, y, u, v) = \frac{\displaystyle\sum_{i=0}^{W-1}\sum_{j=0}^{H-1}\left\{f(x+i,\,y+j)-\overline{f(x,y)}\right\}\left\{g(u+i,\,v+j)-\overline{g(u,v)}\right\}}{\sqrt{\displaystyle\sum_{i=0}^{W-1}\sum_{j=0}^{H-1}\left\{f(x+i,\,y+j)-\overline{f(x,y)}\right\}^{2}\,\sum_{i=0}^{W-1}\sum_{j=0}^{H-1}\left\{g(u+i,\,v+j)-\overline{g(u,v)}\right\}^{2}}} \tag{1}
$$
In the equation (1), f( ) designates the input image, g( ) the template image, (x, y) the coordinates in the search area M06 of the input image (called the matching area), and (u, v) the coordinates of the upper left (uppermost left portion) of the template image M03, in which in all figures the origin (0, 0) is located in the upper left (uppermost left) portion of the image. Further, W designates the width of the template image (horizontal length), and H the height of the template image (vertical length). Moreover, in the equation (1), $\overline{f(x,y)}$ and $\overline{g(u,v)}$ represent the average brightness value of the input image and the average brightness value of the template image, respectively, which are given by the equations (2) and (3).
$$
\overline{f(x,y)} = \frac{1}{WH}\sum_{i=0}^{W-1}\sum_{j=0}^{H-1} f(x+i,\,y+j) \tag{2}
$$

$$
\overline{g(u,v)} = \frac{1}{WH}\sum_{i=0}^{W-1}\sum_{j=0}^{H-1} g(u+i,\,v+j) \tag{3}
$$
The normalized correlation γ(x, y, u, v) represents the degree of coincidence between two brightness value distributions: that of the area of width W and height H in the input image f( ) whose upper left (uppermost left) coordinates are (x, y) (for example, M05 in FIG. 2B), and that of the area of width W and height H in the template image g( ) whose upper left (uppermost left) coordinates are (u, v) (for example, M03 in FIG. 2A). In a case where the brightness value of each pixel of the input image f( ) is equal to that of the corresponding pixel of the template image g( ), the normalized correlation is 1.0. The template matching is a process of detecting the portion of the input image f( ) having the maximum coincidence with the template image g( ).
That is, with the position (u, v) of the template image g( ) as a reference position, the position (x, y) is varied over the range u−dx≦x<u+dx, v−dy≦y<v+dy, and the position (x, y) that maximizes the normalized correlation γ(x, y, u, v) given by the equation (1) is searched for.
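The search described above can be sketched as follows. This is an illustrative sketch, not the implementation of any particular apparatus: images are grayscale lists of rows, the function names `normalized_correlation` and `search_best_match` are hypothetical, candidate positions whose W×H window would fall outside the input image are skipped, and a window of uniform brightness (zero denominator) is assigned a score of 0.0. The last two choices are assumptions, since the text does not state how these cases are handled.

```python
import math

def normalized_correlation(f, g, x, y, u, v, W, H):
    """Normalized correlation between the WxH window of f at (x, y)
    and the WxH window of g at (u, v)."""
    fa = [f[y + j][x + i] for j in range(H) for i in range(W)]
    ga = [g[v + j][u + i] for j in range(H) for i in range(W)]
    f_mean = sum(fa) / (W * H)   # average brightness of the input window
    g_mean = sum(ga) / (W * H)   # average brightness of the template
    num = sum((a - f_mean) * (b - g_mean) for a, b in zip(fa, ga))
    den = math.sqrt(sum((a - f_mean) ** 2 for a in fa) *
                    sum((b - g_mean) ** 2 for b in ga))
    # Assumption: a uniform window has no defined correlation; score it 0.0.
    return num / den if den > 0 else 0.0

def search_best_match(f, g, u, v, W, H, dx, dy):
    """Scan u-dx <= x < u+dx, v-dy <= y < v+dy and return (score, x, y)
    for the position with the maximum normalized correlation."""
    best = (float("-inf"), u, v)
    for y in range(v - dy, v + dy):
        for x in range(u - dx, u + dx):
            # Assumption: skip windows that do not fit inside the input image.
            if x < 0 or y < 0 or x + W > len(f[0]) or y + H > len(f):
                continue
            score = normalized_correlation(f, g, x, y, u, v, W, H)
            if score > best[0]:
                best = (score, x, y)
    return best
```

Because the mean brightness is subtracted and the result is normalized, the score is insensitive to a uniform gain or offset in brightness between the template and the input window.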
Apart from the normalized correlation, the degree of coincidence may also be measured by the average absolute value of the difference in brightness value between corresponding pixels of the area of width W and height H in the input image f( ) whose upper left (uppermost left) coordinates are (x, y) and the area of width W and height H in the template image g( ) whose upper left coordinates are (u, v).
In this instance, in a case where the brightness value of each pixel of the input image f( ) is equal to that of each corresponding pixel of the template image g( ), the average absolute value is zero (0). As the difference in brightness value between corresponding pixels of the input image f( ) and the template image g( ) becomes larger, the average absolute value becomes larger (which means the degree of coincidence becomes lower).
In the instance of FIG. 2B, the man-like object to be tracked is at the position of M04. The template matching process is executed to detect the area M07. Since this area M07 is detected at the position where the man-like object on the template image M03 coincides with the man-like object on the input image M02, the video monitor apparatus determines that the man-like object to be detected is inside the area M07. That is, it is understood that the intruding object is moving from the area M05 to the area M07. In this case, the movement of the intruding object may be represented by an arrow M08 connecting the center of the area M05 with the center of the area M07.
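The arrow M08 can be expressed numerically as the difference between the centers of the two areas. The following sketch assumes each area is given by its upper left coordinates together with the common width W and height H; the function name `motion_vector` is hypothetical. Since both areas have the same size, the result equals the difference of their upper left coordinates.

```python
def motion_vector(prev_xy, new_xy, W, H):
    """Displacement from the center of the previous area (e.g. M05)
    to the center of the newly detected area (e.g. M07)."""
    (x0, y0), (x1, y1) = prev_xy, new_xy
    c0 = (x0 + W / 2, y0 + H / 2)   # center of the previous area
    c1 = (x1 + W / 2, y1 + H / 2)   # center of the detected area
    return (c1[0] - c0[0], c1[1] - c0[1])
```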
Next, with reference to FIG. 3, an exemplary process of tracking an intruding object within a view field will be described, in which the template matching described with reference to FIGS. 2A and 2B is applied to the sequentially inputted images. FIG. 3 is a diagram useful for explaining the process of tracking an object by sequentially performing the conventional template matching method.
In FIG. 3, numerals T01a, T02a, T03a and T04a designate extracted images at time points t0−1, t0, t0+1 and t0+2, respectively. Numerals T01c, T02c, T03c and T04c designate template images at time points t0−1, t0, t0+1 and t0+2, respectively. Numerals T01b, T02b, T03b and T04b designate input images at time points t0, t0+1, t0+2 and t0+3, respectively, numeral T05 a template matching processing unit (MT), T06 a template image updating unit (UD). In this figure, the time point t0+n means that the time is apart from the time point t0 by n frames, in which the processing time of one frame is 100 msec, for example.
The matching processing unit T05 compares the template image of the extracted image with the input image, detects the portion of the input image having the maximum degree of coincidence with the template image, and obtains the positions T01e, T02e, T03e and T04e of the intruding object to be tracked at time points t0, t0+1, t0+2 and t0+3 (template matching).
The template image update unit T06 specifies the image of the portion of the maximum degree of coincidence detected by the matching processing unit T05 as the new position image of the intruding object and replaces the extracted image and the template image using the position image, thereby updating the template image.
Next, the template matching process and the template image updating process will be described along time points t0−1, t0, t0+1, t0+2 and t0+3 with reference to FIG. 3.
First, the template matching process is executed by using the template image T01c obtained at time point t0−1 and the input image T01b obtained at time point t0. In the first processing frame, the template image T01c is identical to the partial image S13 of the input image S01 corresponding to the position of the circumscribed rectangle of the man-like object S09 detected by the subtraction method.
The template matching processing unit T05 detects the position image T01e of the intruding object T01d by the template matching described with reference to FIGS. 2A and 2B.
The template image update unit T06 updates the extracted image from T01a to T02a by taking the input image T01b, which contains the new position image (template) T01e, as the extracted image, and also updates the template image from T01c to T02c on the basis of the position image T01e of the intruding object. By executing this process at each of the time points t0, t0+1, t0+2 and t0+3, it is understood that the intruding object within the view field moves in the sequence indicated by the position images T01e, T02e, T03e and T04e.
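The alternation of matching (T05) and template updating (T06) over successive frames can be sketched as follows. This is a simplified illustration under stated assumptions: the template is held as a standalone W×H patch rather than as an area within an extracted image, the degree of coincidence is the sum of absolute brightness differences (a stand-in for the average absolute difference, chosen only to keep the sketch short), and candidate positions outside the frame are skipped. The names `best_match` and `track` are hypothetical.

```python
def best_match(frame, template, u, v, dx, dy):
    """Return the (x, y) within u-dx <= x < u+dx, v-dy <= y < v+dy whose
    window in `frame` differs least from `template`."""
    H, W = len(template), len(template[0])
    best = None
    # Clip the search range so every window lies inside the frame.
    for y in range(max(0, v - dy), min(v + dy, len(frame) - H + 1)):
        for x in range(max(0, u - dx), min(u + dx, len(frame[0]) - W + 1)):
            cost = sum(abs(frame[y + j][x + i] - template[j][i])
                       for j in range(H) for i in range(W))
            if best is None or cost < best[0]:
                best = (cost, x, y)
    # If no candidate fits, keep the previous position.
    return (best[1], best[2]) if best else (u, v)

def track(frames, template, u, v, dx, dy):
    """Match each frame in turn, then replace the template with the
    partial image at the detected position before the next frame."""
    positions = []
    for frame in frames:
        u, v = best_match(frame, template, u, v, dx, dy)     # matching step
        H, W = len(template), len(template[0])
        template = [row[u:u + W] for row in frame[v:v + H]]  # update step
        positions.append((u, v))
    return positions
```

The returned list plays the role of the sequence of position images T01e, T02e, T03e and T04e: one detected position per input frame.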