Change detection between a pair of digital images is used for automatic detection of new blobs appearing in a given scene. A blob is a connected component that has visible local saliency. An object includes one or more blobs. Therefore, in the context of the present invention blob extraction includes extraction of parts or the entire object. In other words, the term object extraction is also covered by the term “blob extraction” as used herein.
The problem of change detection that is addressed herein is defined as follows: a pair of registered gray-level images of the same scene, with different illumination, where each image may contain blobs that are not contained in the other, is given as input. The set of all blobs that do not exist in both images is defined as the change. Prior knowledge about the image, its statistics, and the changes is not given. A typical example is demonstrated in FIG. 1. FIG. 1 shows a pair of images used as inputs to the change detection method. Each image contains a blob that is not contained in the other (e.g. a vehicle 102 in (a) and 104 in (b)). Notice that the images contain the same scenery, under different illumination. If the same blob exists in both images in different locations, it is regarded as a change, since each image contains a blob that is not included in the other. One constraint of the problem is that one would like to locate even small blobs that are composed of ca. 30 pixels.
It is important to emphasize that since non-constant (i.e. different) illumination is assumed between images, methods that are based on image subtraction will yield many false alarms. More sophisticated methods that are based on gray-level surface approximation may fail to detect changes caused by small blobs, since surface approximations tend to smooth the boundaries of the blobs and thereby reduce the ability to detect small blobs.
1. Prior Art Methods
Most of the prior art dealing with the problem of change detection does not address the situation of non-constant (different) illumination between the two images. Many of the prior art works focus on change detection in video, where the illumination in the two images is usually assumed to be identical. When change detection in moving video is motivated by compression, the goal is to detect areas of change, and not blobs of change. An area of change can include parts of a complete blob.
The known techniques for change detection can be classified into the following categories:
1.1 Pixel-level change detection.
1.2 Surface modeling.
1.3 Comparison among derivative images.
1.4 Contrast-invariant representation.
1.5 Region-based comparison of first or higher order statistics.
All these methods are region based, in contrast with the method of the present invention, which is blob based. The regions are independent of the image content. It is important to note that none of the reviewed prior art methods uses segmentation, since exact image segmentation of noisy scenes is very complicated and is still considered a difficult problem. Many of the reviewed prior art methods use polynomial approximation of the image surface. This is another drawback, since this approximation smooths the image, which leads to less accurate results than a blob-based approach.
1.1. Pixel-Level Change Detection
Change between two images or frames can be detected by comparing the differences in intensity values of corresponding pixels in the two frames. An algorithm counts the number of the changed pixels, and a camera break is declared if the percentage of the total number of pixels changed exceeds a certain threshold [R. Kasturi and R. Jain, “Dynamic Vision”, Computer Vision: Principles, Eds. R. Kasturi, R. Jain, IEEE Computer Society Press, Washington, pp. 469–480, 1991 (hereinafter KAS91); A. Nagasaka, and Y. Tanaka, “Automatic Video Indexing and Full-Video Search for Blob Appearances”, Visual Database Systems, II, Eds. E. Knuth, and L. M. Wegner, Elsevier Science Publishers B. V., IFIP, pp. 113–127, 1992 (hereinafter NAG92); H. J. Zhang, A. Kankanhalli, and S. W. Smoliar, “Automatic Partitioning of Full-Motion Video”, ACM/Springer Multimedia Systems, Vol. 1, No. 1, pp. 10–28, 1993 (hereinafter ZHA93)].
Mathematically, the difference in pixels and the threshold calculation can be represented by Eqs. 1 and 2.
$$DP_i(x,y) = \begin{cases} 1 & \text{if } |F_i(x,y) - F_{i+1}(x,y)| > t \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

$$\frac{\sum_{x,y=1}^{X,Y} DP_i(x,y)}{X \times Y} * 100 > T \tag{2}$$

In Eq. 1, Fi(x,y) is the intensity value of the pixel in frame i at the coordinates (x,y). If the difference between the corresponding pixels in the two consecutive frames is above a certain minimum intensity value t, then DPi(x,y), the difference picture, is set to one. In Eq. 2, the percentage difference between the pixels in the two frames is calculated by summing the difference picture and dividing by the total number of pixels in a frame. If this percentage is above a certain threshold T, a camera break is declared.
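As an illustration only, Eqs. 1 and 2 can be sketched in a few lines of Python. The function names, the list-of-rows frame representation, and the sample values of t and T are assumptions made for this sketch and do not come from the cited works.

```python
from typing import List

Frame = List[List[int]]  # gray-level frame stored as rows of pixel intensities


def difference_picture(frame_i: Frame, frame_next: Frame, t: int) -> List[List[int]]:
    """Eq. 1: 1 where |F_i(x,y) - F_{i+1}(x,y)| > t, otherwise 0."""
    return [
        [1 if abs(a - b) > t else 0 for a, b in zip(row_i, row_next)]
        for row_i, row_next in zip(frame_i, frame_next)
    ]


def percent_changed(dp: List[List[int]]) -> float:
    """Left-hand side of Eq. 2: percentage of pixels flagged in the difference picture."""
    total_pixels = sum(len(row) for row in dp)  # X * Y
    return 100.0 * sum(sum(row) for row in dp) / total_pixels


def is_camera_break(frame_i: Frame, frame_next: Frame, t: int, T: float) -> bool:
    """Eq. 2: declare a camera break when the changed percentage exceeds T."""
    return percent_changed(difference_picture(frame_i, frame_next, t)) > T
```

Note that a uniform brightening of a frame by more than t flags every pixel, which is the false-alarm behavior under different illumination that motivates the methods reviewed below.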
Camera movement, e.g., pan or zoom, can cause a large number of pixels to change, and hence a segment will be detected. Fast-moving blobs have the same effect. If the mean intensity values of the pixels and their connected pixels are compared [ZHA93], then the effects of the camera and blob motion are reduced.
1.2. Surface Modeling
Here the idea is to model the gray-level surface of a pair of images such that the surface of the errors between the images is negligible. [Hsu Y. Z., Nagel H. H, and Rekers G., “New likelihood test methods for change detection in image sequences”, Computer Vision Graphics Image Processing, vol. 26, pp. 73–106, 1984 (hereinafter HSU84)] model the gray-level surface by patches of a second order bivariate polynomial in the pixel coordinates. Given two corresponding regions, R1(x0,y0) in the image I(1) and R2(x0,y0) in I(2), they represent each region by a set of seven parameters: the six coefficients of the quadratic polynomial patch, and the sum of square differences between the polynomial patch and the gray-levels. Under the assumption that the approximating patch represents the gray-level surface up to uncorrected noise errors, a likelihood test between the following two hypotheses is made:
H0: R1(x0,y0) and R2(x0,y0) come from the same gray-value distribution.
H1: R1(x0,y0) and R2(x0,y0) come from different gray-value distributions.
This method is not adequate to handle changes in illumination in the pair of images, as shown in Skifstad Kurt and Jain Ramesh, “Illumination Independent Change Detection for Real World Image Sequences”, Computer Vision, Graphics, and Image Processing, Vol. 46, pp. 387–399, 1989 (hereinafter SKI89).
1.3. Comparison among Derivative Images
These methods are based on the derivative images instead of working on the original gray-level images. A version of this concept is used by [SKI89]. They partition the image into regions, and each surface in each region is approximated by polynomials. Then, the derivatives of each patch are computed. If the images of the derivatives are denoted by I(D1) and I(D2), then a threshold is used in order to create a binary image from the image of differences, I(D1)−I(D2). Areas of change are supposed to be white regions in this binary image. This method is inadequate for noisy inputs.
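The thresholding step on the difference of derivative images can be sketched as follows. This sketch deliberately swaps in a plain forward difference along x in place of the polynomial patch fitting described above, so it is not the method of [SKI89]; the function names and the threshold value are hypothetical.

```python
from typing import List

Image = List[List[float]]


def horizontal_derivative(img: Image) -> Image:
    """Forward difference along x (a stand-in for derivatives of fitted patches).

    The result has one column fewer than the input.
    """
    return [[row[c + 1] - row[c] for c in range(len(row) - 1)] for row in img]


def derivative_change_mask(img1: Image, img2: Image, threshold: float) -> List[List[int]]:
    """Binary image obtained by thresholding |I(D1) - I(D2)|; 1 marks a candidate change."""
    d1 = horizontal_derivative(img1)
    d2 = horizontal_derivative(img2)
    return [
        [1 if abs(a - b) > threshold else 0 for a, b in zip(r1, r2)]
        for r1, r2 in zip(d1, d2)
    ]
```

In this simplified form an additive brightness offset between the two images cancels in the derivatives, while pixel noise is amplified by the differencing, which is consistent with the sensitivity to noisy inputs noted above.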
1.4. Contrast-Invariant Representation
Another method that can be used to perform change detection is described in P. Monasse, F. Guichard, “Fast Computation of a Contrast-Invariant Image Representation”, IEEE Trans. on Image Processing, Vol. 9, No. 5, 860–872, 2000 (hereinafter MON00). This paper sets out a new representation of an image, which is contrast-independent. The image is decomposed into a tree of “shapes” based on connected components of level sets, which provides a full and non-redundant representation of the image. This method, which is based on invariance under change of contrast, can be used to perform some kind of change detection between images that have different illumination. However, the formulation of the solution using a level set method cannot efficiently handle many blobs at the same time.
1.5. Region Based Comparison of First or Higher Order Statistics
The input images are divided into regions, usually squares of m×m pixels. Denote by R1(x0,y0) the square in the image I(1) whose center is the pixel with coordinates (x0,y0), and similarly denote by R2(x0,y0) the corresponding square in the image I(2). The gray-levels in the region R1(x0,y0) are normalized such that the mean gray-level and the variance of the gray-levels of R1(x0,y0) are the same as the mean and variance of the gray-levels of R2(x0,y0). Then, the image I(2) is compared to the image I(1). The normalization process of this statistical method is supposed to be a variation of illumination correction.
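The normalization step can be sketched as a linear rescaling of the gray-levels of one region. The function name, the flat-list region representation, and the handling of a constant region are assumptions made for this illustration.

```python
from statistics import fmean, pstdev
from typing import List


def normalize_region(region1: List[float], region2: List[float]) -> List[float]:
    """Rescale region1 so that its mean and variance match those of region2."""
    mean1, mean2 = fmean(region1), fmean(region2)
    std1, std2 = pstdev(region1), pstdev(region2)
    if std1 == 0.0:
        # A constant region has no contrast to rescale; only the mean can be matched.
        return [mean2 for _ in region1]
    gain = std2 / std1
    return [(v - mean1) * gain + mean2 for v in region1]
```

If region2 differs from region1 only by a gain and an offset, the normalized region1 coincides with region2, so any remaining difference between the two can be attributed to something other than that linear illumination change.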
The shading model method was suggested by SKI89. Each gray-level is basically the product of two components: (1) the amount of source light incident on the scene and (2) the amount of light reflected by the blobs in the scene. The amount of source light incident on a small region of the scene is approximately uniform, but the reflected light of two adjacent blobs may be different. Denote by i(x0,y0) the amount of source light incident on point (x0,y0) in the scene, and by r(x0,y0) the amount of reflected light from the point (x0,y0) in the scene. Let I(1) and I(2) be two images with corresponding functions, i1(x,y), i2(x,y), r1(x,y) and r2(x,y). If at pixel (x0,y0) both images contain the same blob, then the following is satisfied:
$$\frac{I^{(1)}(x_0,y_0)}{I^{(2)}(x_0,y_0)} \approx \frac{i_1(x_0,y_0) \cdot r_1(x_0,y_0)}{i_2(x_0,y_0) \cdot r_2(x_0,y_0)} = \frac{i_1(x_0,y_0)}{i_2(x_0,y_0)} \tag{3}$$

since the amount of reflected light from point (x0,y0) depends on the blob itself. Let F be the image of real numbers that is the result of the division of the two images, I(1) and I(2), that is:
$$F \stackrel{\text{def}}{=} \frac{I^{(1)}}{I^{(2)}} \tag{4}$$

where F is assumed to have accuracy of real numbers. Let RF(x0,y0) be a small neighborhood around the point (x0,y0) in the image F. Then, for a point (x0,y0) that belongs to the same blob in both images, I(1) and I(2), the surface patch that is composed of the values in the region RF(x0,y0) is expected to be a smooth and slowly varying surface, since the change of the illumination in a small region is slowly varying. On the other hand, for a pixel (x0,y0) that belongs to a different blob in each image, the surface patch that is composed of the values in RF(x0,y0) is expected to be much less smooth, since the region RF(x0,y0) can include a transition from one blob to another. The method in [SKI89] proposes to examine, for each pixel (x0,y0), the variance of the values in the region RF(x0,y0). If the variance is higher than some pre-specified threshold, then the pixel is considered as belonging to a region of change. The change detection mask of this method is defined for each pixel by the following formula:
$$E\{\sigma^2\} = E\left\{\frac{1}{N}\sum_{x \in A_i}\left(\frac{I_{x1}}{I_{x2}} - \mu_i\right)^2\right\} = \frac{1}{N}\sum_{x \in A_i}\left(E\left\{\frac{I_{x1}}{I_{x2}}\right\} - E\{\mu_i\}\right)^2 > 0 \tag{5}$$

where μi is the average value of the ratio of intensities, E is the expectation, N is the size of the image, and “A” is a 5×5 region. Among all the reviewed methods, this method and the statistical method introduced in the next paragraph are the only ones that directly address the problem of change in illumination. This method is based on the assumption that the division of the images cancels the difference in the illumination between the two images, which does not always hold in practice. Moreover, the variance inside a region RF(x0,y0), whose size is not based on the image content, adds inaccuracies of its own.
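A minimal sketch of the shading model test follows: form the ratio image F of Eq. 4, then threshold the variance of F over a window around each pixel. The border clipping, the function names, and the threshold value are assumptions of this sketch; [SKI89] is not the source of these details.

```python
from typing import List

Image = List[List[float]]


def ratio_image(img1: Image, img2: Image) -> Image:
    """Eq. 4: pixel-wise ratio F = I(1) / I(2); assumes img2 has no zero pixels."""
    return [[a / b for a, b in zip(r1, r2)] for r1, r2 in zip(img1, img2)]


def local_variance(f: Image, x: int, y: int, half: int = 2) -> float:
    """Variance of F over the window centered at (x, y), clipped at the image border.

    half=2 gives the 5x5 region used in the text.
    """
    rows = range(max(0, y - half), min(len(f), y + half + 1))
    cols = range(max(0, x - half), min(len(f[0]), x + half + 1))
    values = [f[r][c] for r in rows for c in cols]
    mu = sum(values) / len(values)  # average ratio of intensities in the window
    return sum((v - mu) ** 2 for v in values) / len(values)


def shading_change_mask(img1: Image, img2: Image, threshold: float,
                        half: int = 2) -> List[List[int]]:
    """1 where the local variance of the ratio image exceeds the threshold."""
    f = ratio_image(img1, img2)
    return [
        [1 if local_variance(f, x, y, half) > threshold else 0
         for x in range(len(f[0]))]
        for y in range(len(f))
    ]
```

A purely multiplicative illumination change yields a constant ratio image and therefore an empty mask, whereas a blob present in only one image raises the local variance in every window that contains it.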
Another method of statistical model-based change detection was proposed by Til Aach, Andre Kaup and Rudolf Mester, “Statistical model-based change detection in moving video”, Signal Processing, Vol. 31, pp. 165–180, 1993 (hereinafter AKM93). Given two successive frames I(k) and I(k+1), let

$$d_k(x,y) = I_{k+1}(x,y) - I_k(x,y) \tag{6}$$

denote the image of gray-level differences between frames I(k) and I(k+1). Under the hypothesis that no changes occurred at position (x,y) (the null hypothesis H0), the corresponding difference dk(x,y) follows a zero-mean Gaussian distribution

$$p(d_k(x,y) \mid H_0) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{d_k^2(x,y)}{2\sigma^2}\right\} \tag{7}$$

where the noise variance σ² is equal to twice the variance of the camera noise, assuming that the camera noise is white. Rather than performing the significance test on the values dk(x,y), it is better to evaluate a local sum of normalized differences:
$$\Delta_k(x,y) = \sum_{(x',y') \in w(x,y)} \frac{d_k^2(x',y')}{\sigma^2} \tag{8}$$

where w(x,y) is a window of observation centered at (x,y). Under the assumption that no changes occur within the window, the normalized differences dk/σ obey a Gaussian distribution N(0,1) and are spatially uncorrelated. Thus, the local sum Δk(x,y) follows a χ² distribution with N degrees of freedom, N being the number of pixels within the window w(x,y). With the distribution p(Δk(x,y)) known, a decision rule for each pixel can be obtained by a significance test on Δk(x,y). For a specified significance level α one can compute a corresponding threshold Tα using:
$$\alpha = \Pr\{\Delta_k(x,y) > T_\alpha \mid H_0\} \tag{9}$$

The significance level α is in fact the false alarm rate associated with the statistical test. The higher the value of α, the more likely unchanged pixels are to be classified as changed. The significance test clearly depends on the noise variance σ². Thus, an accurate estimate of the noise variance is crucial for the performance of the test. To ensure that, the variance is estimated only within the background region of the current frame, to remove the influence of changed regions. The background regions are determined according to the tracked mask of the previous frame. One of the problems of this concept is the initial step, when the background regions are not yet known: it requires a heuristic method that relies strongly on a threshold for estimating the background region.
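The significance test of Eqs. 8 and 9 can be sketched as follows. To stay within the Python standard library, the threshold Tα is obtained here from the Wilson-Hilferty approximation to the χ² quantile rather than from an exact inverse distribution function; that substitution, the border clipping, and all function names are assumptions of this sketch and are not taken from [AKM93].

```python
from statistics import NormalDist
from typing import List

Image = List[List[float]]


def chi2_threshold(alpha: float, n: int) -> float:
    """Approximate T_alpha with Pr{chi2_n > T_alpha} = alpha (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - alpha)  # standard normal quantile
    c = 2.0 / (9.0 * n)
    return n * (1.0 - c + z * c ** 0.5) ** 3


def local_sum(d: Image, x: int, y: int, sigma2: float, half: int = 1):
    """Eq. 8: sum of d_k^2 / sigma^2 over the window w(x, y), clipped at the border.

    Returns the sum together with N, the number of pixels actually in the window.
    """
    rows = range(max(0, y - half), min(len(d), y + half + 1))
    cols = range(max(0, x - half), min(len(d[0]), x + half + 1))
    delta = sum(d[r][c] ** 2 for r in rows for c in cols) / sigma2
    return delta, len(rows) * len(cols)


def is_changed(d: Image, x: int, y: int, sigma2: float, alpha: float,
               half: int = 1) -> bool:
    """Reject H0 (no change) at pixel (x, y) when Delta_k exceeds T_alpha."""
    delta, n = local_sum(d, x, y, sigma2, half)
    return delta > chi2_threshold(alpha, n)
```

Here d is the difference image of Eq. 6 and sigma2 is the noise variance σ², which the caller must estimate; as the text notes, the quality of that estimate governs the quality of the decision.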
The likelihood ratio approach is suggested based on the assumption of uniform second-order statistics over a region [KAS91; N. H. Nagel, “Formulation of a blob concept by analysis of systematic time variation in the optically perceptible environment”, Computer Graphics and Image Processing, Vol. 7. pp. 149–194, 1978 (hereinafter NAG78); ZHA93]. The frames can be subdivided into blocks, and then the blocks are compared on the basis of the statistical characteristics of their intensity levels. Eq. (10) represents the formula that calculates the likelihood function. Let μi and μi+1 be the mean intensity values for a given region in two consecutive frames, and σi and σi+1 be the corresponding variances. The number of blocks whose likelihood ratio λ exceeds a certain threshold t is counted. If the number of such blocks exceeds a certain value (dependent on the number of blocks), a segment is declared. A subset of the blocks can be used to detect the difference between the images so as to expedite the process of block matching.
$$\lambda = \frac{\left[\frac{\sigma_i + \sigma_{i+1}}{2} + \left(\frac{\mu_i - \mu_{i+1}}{2}\right)^2\right]^2}{\sigma_i \times \sigma_{i+1}}, \qquad DP_i(k,l) = \begin{cases} 1 & \text{if } \lambda > t \\ 0 & \text{otherwise} \end{cases}$$

If

$$\frac{\sum_{x,y=1}^{X,Y} DP_i(x,y)}{X \times Y} * 100 > T \tag{10}$$

a camera break is declared. This approach increases the tolerance against noise associated with camera and blob movement.
It is possible that even though the two corresponding blocks are different, they can have the same density function. In such cases no change is detected.
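The likelihood ratio of Eq. (10) and the limitation just described can be illustrated with a short sketch. Blocks are represented as flat lists of intensities with non-zero variance; the function names and the values of t and T are hypothetical.

```python
from statistics import fmean, pvariance
from typing import List, Sequence

Block = List[float]


def likelihood_ratio(block_i: Block, block_next: Block) -> float:
    """Lambda of Eq. (10); sigma denotes the variance, as in the text.

    Assumes both blocks have non-zero variance (otherwise the ratio is undefined).
    """
    mu_i, mu_next = fmean(block_i), fmean(block_next)
    var_i, var_next = pvariance(block_i), pvariance(block_next)
    numerator = ((var_i + var_next) / 2.0 + ((mu_i - mu_next) / 2.0) ** 2) ** 2
    return numerator / (var_i * var_next)


def is_camera_break(blocks_i: Sequence[Block], blocks_next: Sequence[Block],
                    t: float, T: float) -> bool:
    """Flag blocks with lambda > t; declare a break when their percentage exceeds T."""
    flags = [likelihood_ratio(a, b) > t for a, b in zip(blocks_i, blocks_next)]
    return 100.0 * sum(flags) / len(flags) > T
```

Two blocks with identical statistics give λ = 1, so t must be chosen above 1. A block and a spatially rearranged copy of it, for example [1, 2, 3, 4] and [4, 3, 2, 1], also give λ = 1, which is the undetected case noted above.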
Another method based on statistical computations was suggested by Sze-Chu Liu, Chang-Wu Fu, and Shyang Chang, “Statistical change detection with moments under time-varying illumination”, IEEE Transactions On Image Processing, Vol. 7, No. 9, September 1998 (hereinafter SZE98). The proposed method consists of two parts. First, based on the defined circular shift moments, this method tries to distinguish the structural changes from those caused by illumination in a noise-free case, which is mentioned by [SKI89]. Moreover, the amount of computation in calculating higher-order circular shift moments can be reduced via a set of iterative formulae. Therefore, the time required for the computation is less than that of the shading model [SKI89]. Second, in accordance with the characteristics of the defined moments, SZE98 also proposes a statistical decision rule to cope with the effects of noise. The change detection problem can be treated as one of hypothesis testing. Critical values are determined according to the desired level of significance. This method does not perform change detection well and produces many “false alarms”.
There is thus a widely recognized need for, and it would be highly advantageous to have, an illumination-insensitive method for extracting blobs that appear in only one of a pair of images, a method that is exact, robust and fast, with a low time complexity.