Foreground/background (FG/BG) separation can be used in applications such as video surveillance, human-computer interaction, and panoramic photography, where the foreground content moves differently from the background content. For example, FG/BG separation can improve object detection, object classification, trajectory analysis, and unusual motion detection, leading to a high-level understanding of events represented in a sequence of images (video).
When robust principal component analysis (RPCA) is used for the separation, it assumes that an observed video signal $B \in \mathbb{R}^{m \times n}$ can be decomposed into a low-rank component $X \in \mathbb{R}^{m \times n}$ and a complementary sparse component $S \in \mathbb{R}^{m \times n}$. Here, $m$ is the number of pixels in each frame, i.e., each column of the matrix corresponds to one frame, and $n$ is the number of frames under inspection. Thus, FG/BG separation can be formulated as an optimization problem over $X$ and $S$:
$$(X, S) = \underset{X,S}{\arg\min}\; \|X\|_* + \lambda\|S\|_1 \quad \text{s.t.} \quad B = X + S, \tag{1}$$
where $\|\cdot\|_*$ is the nuclear norm of a matrix, $\|\cdot\|_1$ is the $\ell_1$-norm of the vectorized matrix, and $\lambda$ is a regularization parameter. Solving the RPCA problem involves computing a full or partial singular value decomposition (SVD) at every iteration.
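As a minimal sketch of the data layout and of equation (1), assuming NumPy and a hypothetical toy sequence, the following Python stacks vectorized frames as the columns of $B$ and evaluates the RPCA objective for one candidate split $B = X + S$. The helper names `frames_to_matrix` and `rpca_objective` are hypothetical, and $\lambda = 1/\sqrt{\max(m, n)}$ is a common default from the RPCA literature, assumed here rather than taken from this description.

```python
import numpy as np

def frames_to_matrix(frames):
    """Vectorize each H x W frame into one column of B (m = H*W pixels, n frames)."""
    return np.stack([f.reshape(-1) for f in frames], axis=1)

def rpca_objective(X, S, lam):
    """Objective of equation (1): ||X||_* + lam * ||vec(S)||_1."""
    nuclear = np.linalg.svd(X, compute_uv=False).sum()  # sum of singular values
    return nuclear + lam * np.abs(S).sum()

# Hypothetical toy sequence: a static 4x5 background plus one bright pixel
# that moves one column per frame (the foreground).
rng = np.random.default_rng(0)
background = rng.random((4, 5))
frames = []
for t in range(6):
    frame = background.copy()
    frame[1, t % 5] += 1.0
    frames.append(frame)

B = frames_to_matrix(frames)                              # shape (20, 6)
X = np.tile(background.reshape(-1, 1), (1, B.shape[1]))   # rank-1 background
S = B - X                                                 # so that B = X + S
lam = 1.0 / np.sqrt(max(B.shape))                         # assumed default
print(B.shape, np.linalg.matrix_rank(X), np.count_nonzero(S))
print(rpca_objective(X, S, lam))
```

Evaluating $\|X\|_*$ needs the singular values of the full $m \times n$ matrix, which is the per-iteration cost that the factorized formulation below avoids.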
To reduce the cost of these repeated SVDs, several techniques, such as Low-Rank Matrix Fitting (LMaFit), represent the low-rank component by low-rank factors and optimize over the factors instead of the full matrix. Factorizing the low-rank component gives $X = LR^T$, where $L \in \mathbb{R}^{m \times r}$, $R \in \mathbb{R}^{n \times r}$, and $r \geq \operatorname{rank}(X)$.
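The saving from the factorization can be illustrated with a short sketch, assuming NumPy and hypothetical sizes: a product $LR^T$ has rank at most $r$, and storing the factors takes $r(m + n)$ entries instead of $mn$.

```python
import numpy as np

# Hypothetical sizes: m pixels per frame, n frames, target rank r.
m, n, r = 1000, 50, 3
rng = np.random.default_rng(1)
L = rng.standard_normal((m, r))
R = rng.standard_normal((n, r))
X = L @ R.T                       # m x n low-rank component, X = L R^T

dense_entries = m * n             # entries needed to store X directly
factored_entries = r * (m + n)    # entries needed to store L and R
print(np.linalg.matrix_rank(X), dense_entries, factored_entries)  # prints: 3 50000 3150
```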
The factorization-based RPCA method can be formulated and solved using an augmented Lagrangian alternating direction method (ADM) as follows:
$$(L, R, S, Y) = \underset{L,R,S,Y}{\arg\min} \left( \frac{1}{2}\|L\|_F^2 + \frac{1}{2}\|R\|_F^2 + \lambda\|S\|_1 + \langle Y, E \rangle + \frac{\mu}{2}\|E\|_F^2 \right), \tag{2}$$
where $\|\cdot\|_F$ is the Frobenius norm of a matrix, $\lambda$ is a regularization parameter, $Y$ is the Lagrange dual variable, $\mu$ is an augmented Lagrangian parameter, and $E = B - LR^T - S$. Note that the nuclear norm $\|X\|_*$ in equation (1) is replaced by $\frac{1}{2}\|L\|_F^2 + \frac{1}{2}\|R\|_F^2$ in equation (2), where $X = LR^T$, based on the observation that
                                                                      X                                      *                    =                                                    inf                                  L                  ,                  R                                            ⁢                              1                2                            ⁢                                                                  L                                                  F                2                                      +                                          1                2                            ⁢                                                                  R                                                  F                2                                                    ,                                  ⁢                              s            .            t            .                                                  ⁢            X                    =                      LR            T                          ,                            (        3        )            where T is a transpose operator.
FIG. 3 shows pseudocode of algorithm 1 for the iterations used to solve equation (2). Note that the soft-thresholding operator used in step 5,
$$S_{\lambda/\mu}(r) = \operatorname{sign}(r)\max\left(|r| - \lambda/\mu,\, 0\right), \tag{4}$$
where $r = B - LR^T + \frac{1}{\mu}Y$, does not impose any structure on the sparse component.
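The pseudocode of FIG. 3 is not reproduced here, so the following is only a hedged sketch of one way to alternate over the variables of equation (2), not necessarily the exact steps of algorithm 1. It assumes NumPy, a fixed $\mu$, and random initialization of $L$ and $R$; the function names are hypothetical. The $L$ and $R$ updates are the closed-form minimizers of (2) with the other variables fixed, the $S$ update applies the soft-thresholding operator of equation (4), and $Y$ takes a dual ascent step on $E = B - LR^T - S$.

```python
import numpy as np

def soft_threshold(r, tau):
    """Element-wise soft-thresholding of equation (4): sign(r) * max(|r| - tau, 0)."""
    return np.sign(r) * np.maximum(np.abs(r) - tau, 0.0)

def factorized_rpca_adm(B, rank, lam, mu=1.0, iters=200, seed=0):
    """Hypothetical alternating-direction sketch for equation (2) with fixed mu."""
    m, n = B.shape
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((m, rank))
    R = rng.standard_normal((n, rank))
    S = np.zeros_like(B)
    Y = np.zeros_like(B)
    I = np.eye(rank)
    for _ in range(iters):
        C = B - S + Y / mu
        # L-step: minimizer of (2) over L is mu * C R (I + mu R^T R)^{-1}.
        L = np.linalg.solve(I + mu * (R.T @ R), mu * (R.T @ C.T)).T
        # R-step: minimizer of (2) over R is mu * C^T L (I + mu L^T L)^{-1}.
        R = np.linalg.solve(I + mu * (L.T @ L), mu * (L.T @ C)).T
        # S-step: soft-threshold r = B - L R^T + Y / mu with threshold lam / mu.
        S = soft_threshold(B - L @ R.T + Y / mu, lam / mu)
        # Y-step: dual ascent on the constraint residual E = B - L R^T - S.
        Y = Y + mu * (B - L @ R.T - S)
    return L, R, S, Y

# Usage on hypothetical data: a rank-2 matrix with one large outlier entry.
rng = np.random.default_rng(3)
B = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))
B[3, 4] += 10.0
L, R, S, Y = factorized_rpca_adm(B, rank=2, lam=0.2)
print(np.linalg.norm(B - L @ R.T - S) / np.linalg.norm(B))  # relative constraint residual
```

Because the S-step acts on each entry independently, whether an entry is kept depends only on its own magnitude, not on whether its neighbors are also foreground; this is the lack of structure noted above.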
In the art, a sparse matrix is a matrix in which most of the elements are zero; by contrast, if most of the elements are nonzero, the matrix is considered dense. The fraction of zero (nonzero) elements in a matrix is called its sparsity (density). Sparse techniques learn over-complete bases to represent data efficiently. In recent years, structured sparsity techniques have been applied to RPCA methods, mainly motivated by the observation that sparse data are often not randomly located but tend to cluster.
For example, one learning formulation, called dynamic group sparsity (DGS), uses a pruning step to select sparse components that favor local clustering. Another approach enforces group sparsity by replacing the $\ell_1$-norm in equation (1) with a mixed $\ell_{2,1}$-norm defined as
$$\|S\|_{2,1} = \sum_{g=1}^{s} w_g \|S_g\|_2, \tag{5}$$
where $S_g$ is the component corresponding to group $g$, $g = 1, \ldots, s$, and the $w_g$ are weights associated with each group. The resulting problem formulation is
$$(X, S) = \underset{X,S}{\arg\min}\; \|X\|_* + \lambda\|S\|_{2,1} \quad \text{s.t.} \quad B = X + S. \tag{6}$$
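A minimal sketch of equation (5), assuming NumPy and hypothetical non-overlapping 2×2 pixel blocks (given as tuples of slices) as the groups, with unit weights. `l21_norm` evaluates the mixed norm, and `group_soft_threshold` is the standard proximal operator of a weighted $\ell_{2,1}$-norm (a well-known result, not stated in this description); it keeps or zeros each group as a whole rather than each entry separately. Both helper names are hypothetical.

```python
import numpy as np

def l21_norm(S, groups, weights):
    """Mixed l2,1-norm of equation (5): sum over groups g of w_g * ||S_g||_2."""
    return sum(w * np.linalg.norm(S[idx]) for idx, w in zip(groups, weights))

def group_soft_threshold(S, groups, weights, tau):
    """Proximal operator of tau * ||S||_{2,1}: shrink each group's l2-norm by tau * w_g."""
    out = np.zeros_like(S)
    for idx, w in zip(groups, weights):
        norm = np.linalg.norm(S[idx])
        if norm > tau * w:                       # keep the group, scaled toward zero
            out[idx] = (1.0 - tau * w / norm) * S[idx]
        # otherwise the whole group stays zero
    return out

# Hypothetical 4x4 frame split into four non-overlapping 2x2 blocks, unit weights.
groups = [(slice(i, i + 2), slice(j, j + 2)) for i in (0, 2) for j in (0, 2)]
weights = [1.0] * len(groups)

S = np.zeros((4, 4))
S[0:2, 0:2] = 3.0      # a compact foreground blob (group norm 6)
S[2, 3] = 0.5          # an isolated small entry (group norm 0.5)

print(l21_norm(S, groups, weights))   # 6.5
print(group_soft_threshold(S, groups, weights, tau=1.0))
```

Since groups are kept or discarded as units, a small entry survives only if its group as a whole is strong enough.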
Most recent FG/BG separation approaches in the PCA family are quite effective for image sequences acquired with a stationary camera and a mostly static background. However, separation performance degrades for image sequences acquired with a moving camera, which can cause apparent motion in the background, even with limited motion jitter. In such cases, global motion compensation (MC) can align the images before an RPCA-based FG/BG separation method is applied.
With moving-camera sequences, the background motion no longer satisfies the low-rank assumption. Hence, in order to apply RPCA, global motion compensation using a homography model can be performed on the image sequence as a pre-processing step.
One approach to global motion compensation is to compute a homography model for the image sequence. In an 8-parameter homography model $h = [h_1, h_2, \ldots, h_8]^T$, a pixel $\mathbf{x}_1 = (x_1, y_1)^T$ in the current image and its corresponding pixel $\mathbf{x}_2 = (x_2, y_2)^T$ in the reference image are related by
$$x_2 = \frac{h_1 + h_3 x_1 + h_4 y_1}{1 + h_7 x_1 + h_8 y_1} \quad \text{and} \quad y_2 = \frac{h_2 + h_5 x_1 + h_6 y_1}{1 + h_7 x_1 + h_8 y_1}. \tag{7}$$
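Equation (7) maps a pixel directly from the current image to the reference image; the sketch below (pure Python, with a hypothetical helper name) applies it to one point. With $h_3 = h_6 = 1$ and $h_4 = h_5 = h_7 = h_8 = 0$, the model reduces to a translation by $(h_1, h_2)$.

```python
def apply_homography(h, x1, y1):
    """Map pixel (x1, y1) in the current image to (x2, y2) in the reference image, equation (7)."""
    h1, h2, h3, h4, h5, h6, h7, h8 = h
    denom = 1.0 + h7 * x1 + h8 * y1          # shared projective denominator
    x2 = (h1 + h3 * x1 + h4 * y1) / denom
    y2 = (h2 + h5 * x1 + h6 * y1) / denom
    return x2, y2

# A pure translation by (5, -2): h3 = h6 = 1, h4 = h5 = h7 = h8 = 0.
print(apply_homography([5.0, -2.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0], 10.0, 20.0))  # (15.0, 18.0)
```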
Given local motion information associating a pixel location $(x_1, y_1)$ in the current image with its corresponding location $(x_2, y_2)$ in a reference image, the homography model $h$ can be estimated by least-squares (LS) fitting: $b = Ah$, where $b$ is a vector formed by stacking the vectors $\mathbf{x}_2$, and the two rows of $A$ corresponding to each $\mathbf{x}_2$ are specified as
$$A = \begin{pmatrix} 1 & 0 & x_1 & y_1 & 0 & 0 & -x_1 x_2 & -y_1 x_2 \\ 0 & 1 & 0 & 0 & x_1 & y_1 & -x_1 y_2 & -y_1 y_2 \end{pmatrix}. \tag{8}$$
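The least-squares fit can be sketched with NumPy's `numpy.linalg.lstsq`, stacking the two rows of equation (8) and the two entries of $b$ for each correspondence. The helper name `estimate_homography` and the synthetic correspondences used to exercise it are assumptions for illustration.

```python
import numpy as np

def estimate_homography(pts1, pts2):
    """Least-squares estimate of h from b = A h, using the two rows of equation (8) per pair."""
    rows, b = [], []
    for (x1, y1), (x2, y2) in zip(pts1, pts2):
        rows.append([1.0, 0.0, x1, y1, 0.0, 0.0, -x1 * x2, -y1 * x2])
        rows.append([0.0, 1.0, 0.0, 0.0, x1, y1, -x1 * y2, -y1 * y2])
        b.extend([x2, y2])
    A = np.asarray(rows, dtype=float)
    h, *_ = np.linalg.lstsq(A, np.asarray(b, dtype=float), rcond=None)
    return h

# Hypothetical check: synthesize exact correspondences from a known h via equation (7).
h_true = np.array([3.0, -1.5, 1.02, 0.01, -0.02, 0.98, 1e-3, -2e-3])
pts1 = [(x, y) for x in (0.0, 5.0, 10.0, 15.0) for y in (0.0, 5.0, 10.0)]
pts2 = []
for x1, y1 in pts1:
    d = 1.0 + h_true[6] * x1 + h_true[7] * y1
    pts2.append(((h_true[0] + h_true[2] * x1 + h_true[3] * y1) / d,
                 (h_true[1] + h_true[4] * x1 + h_true[5] * y1) / d))
h_est = estimate_homography(pts1, pts2)
print(np.allclose(h_est, h_true))
```

Each correspondence contributes two equations for the eight unknowns, so at least four correspondences, no three of them collinear, are needed for a unique solution.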
Image sequences with corresponding depth maps are now common, especially with the rapid growth of depth sensors such as Microsoft Kinect™ and advances in algorithms that estimate depth from stereo images. Jointly using depth and color data produces better separation results than color data alone. A depth-enhanced separation method can also better handle illumination changes, shadows, reflections, and camouflage.