In many fields of image processing, such as stereo image processing, tracking, image sensing, remote sensing, and image registration, region-based matching is adopted as a basic process for obtaining a displacement between images.
Region-based matching is characterized in that the size and shape of the focused area can be set freely, the focused area can be offset with respect to the focused pixel, and the computation is straightforward and thus easy to implement directly. When the displacement between images is determined with subpixel resolution by region-based matching, a method is generally used in which similarity evaluation values obtained discretely, in pixel units, are used to estimate the subpixel displacement.
In order to determine the correspondence between images, an evaluation value called similarity is used. The similarity is a function that takes a relative positional relationship between the images as input and outputs the degree of similarity between the images at that positional relationship. The similarity value is determined only at discrete positions in pixel units. Thus, when the corresponding position between images is determined based only on the similarity, the result is limited to pixel resolution. Interpolation based on the similarity values is therefore implemented to determine the correspondence between images with subpixel resolution.
Traditionally, in order to increase position resolution, subpixel estimation is often implemented by fitting an interpolation function to the similarity evaluation values. However, when the translation between images is determined with subpixel resolution, the interpolation is fitted separately in the horizontal direction and in the vertical direction. A problem therefore arises that sufficient accuracy cannot be obtained.
In short, in traditional region-based matching using digital images, in order to improve the estimation resolution of displacement, subpixel estimation has been implemented in which the similarity evaluation values are used separately in the horizontal and vertical directions of the image.
Furthermore, traditionally, when a gradient-based method is used to determine the displacement between images, the amount of subpixel displacement can be obtained from the first computation, and the horizontal and vertical directions are treated together. However, when the displacement between images is one pixel or more, the image needs to be scaled down, which is generally implemented together with a hierarchical change of the image scale. Since the gradient-based method is implemented as recursive computation, it has the defects that the computation time is difficult to estimate and that it is difficult to implement on hardware.
Moreover, in a traditional method in which an image is discrete Fourier transformed, the product with the complex conjugate is computed, and an inverse discrete Fourier transform is applied, the size of the focused area needs to be 2^n, requiring various additional techniques. This method is often used in the field of fluid measurement because of its characteristic that the amount of computation can be reduced; in the vision field, however, it is rarely used. In addition, since it relies on the assumption that the measurement object is a plane, the method has a defect that it is difficult to apply to a measurement object with surface irregularity.
Traditionally, an image is interpolated and the displacement is estimated separately in the horizontal direction and in the vertical direction, as described in detail below. However, a traditional method like this has a problem that an estimation error is generated, as shown in FIG. 1.
Here, the traditional method, in which the image is treated as independent in the horizontal and vertical directions and subpixel estimation of displacement is implemented separately in each direction, is called the "one-dimensional estimation method".
In order to describe the problems of the traditional one-dimensional estimation method, first, consider the problem of determining the horizontal displacement between images. Suppose the true displacement between images is (d_s, d_t), and the rotation angle of the anisotropic similarity is θ_g. In the one-dimensional estimation method, the discrete similarity values R(−1,0), R(0,0), and R(1,0) (squares in FIG. 1) are used, and the estimate \hat{d}_s (a black circle in FIG. 1) is obtained by the following Equation 1.
\hat{d}_s = \frac{R(-1,0) - R(1,0)}{2R(-1,0) - 4R(0,0) + 2R(1,0)}    (Equation 1)
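As an illustrative sketch (the function name and test values are ours, not the document's), the parabola fit of Equation 1 takes just a few lines of Python:

```python
def parabola_subpixel(r_m1, r_0, r_p1):
    """Subpixel offset of the extremum from three similarity samples
    R(-1,0), R(0,0), R(1,0), per Equation 1 (parabola fitting).
    Assumes the extremum lies strictly between the outer samples."""
    return (r_m1 - r_p1) / (2.0 * r_m1 - 4.0 * r_0 + 2.0 * r_p1)

# For an exactly parabolic similarity R(t) = (t - d)^2 the fit is exact:
d = 0.3
est = parabola_subpixel((-1 - d) ** 2, d ** 2, (1 - d) ** 2)  # est == 0.3
```

For a truly parabolic one-dimensional similarity the estimate is exact; the estimation error discussed below arises because a real two-dimensional similarity is anisotropic and rotated, not because the one-dimensional fit itself is inaccurate.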
Even though the position of maximum similarity on the line t = 0 can be correctly estimated by the above Equation 1, as is apparent from FIG. 1, a large estimation error is contained relative to the true horizontal displacement d_s between the images (the horizontal component of the black triangle in FIG. 1). More specifically, a horizontal estimation error \hat{d}_s − d_s ≠ 0 is generated when all the following conditions hold.
The vertical displacement satisfies d_t ≠ 0.
The two-dimensional similarity has anisotropy.
The rotation angle of the anisotropic two-dimensional similarity satisfies θ_g ≠ 0.
Almost all real images satisfy these conditions.
The same applies in the vertical direction. For example, when the SSD self-similarity of the corner area of the image shown in FIG. 2(a) is determined, the result is as shown in FIG. 2(b). Since the self-similarity has anisotropy and θ_g ≠ 0, a subpixel estimation error can be generated by the traditional one-dimensional estimation method. Moreover, an estimation error can also be generated in subpixel estimation using the texture image shown in FIG. 3(a).
Furthermore, in some fields of computer vision, constraint conditions such as the epipolar constraint are used to limit the search area to one dimension, so in some cases high precision matching of one-dimensional signals is sufficient. However, there are many applications that require high precision two-dimensional displacement from two-dimensional images, for example, motion estimation, region segmentation by motion estimation, target tracking, mosaicing, remote sensing, and super-resolution.
In region-based matching for images, in the case where the displacement between images can be expressed only by translation, many methods have traditionally been proposed and put to practical use. For example, a typical subpixel estimation method that combines region-based matching with similarity interpolation estimates the subpixel displacement between images as follows.
When the images between which a corresponding position is to be determined are f(i,j) and g(i,j), the SAD between the images with respect to an integer displacement (t_x, t_y) is computed as follows.
R_{SAD}(t_x, t_y) = \sum_{(i,j) \in W} \left| f(i,j) - g(i - t_x, j - t_y) \right|    (Equation 2)
Here SAD represents the sum of absolute values of the luminance differences (Sum of Absolute Differences), and W represents the focused area used for correspondence. SAD becomes zero when the images match completely, and takes a greater value as the mismatch increases. Since SAD expresses an evaluation value of dissimilarity, it is a non-similarity measure.
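A minimal sketch of Equation 2 in Python, with images as nested lists and boundary handling omitted (the names and test data are ours):

```python
def sad(f, g, tx, ty, window):
    """R_SAD of Equation 2: sum of absolute luminance differences over the
    focused area W for integer displacement (tx, ty). A sketch; the caller
    must ensure the shifted indices stay inside g."""
    return sum(abs(f[i][j] - g[i - tx][j - ty]) for (i, j) in window)

# Identical images at zero displacement match perfectly (SAD = 0).
img = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
w = [(1, 1), (1, 2), (2, 1), (2, 2)]
```

A nonzero displacement against the same image yields a positive dissimilarity, e.g. `sad(img, img, 1, 1, w)` sums four differences of 4 each.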
Instead of SAD, the following SSD is often used.
R_{SSD}(t_x, t_y) = \sum_{(i,j) \in W} \left( f(i,j) - g(i - t_x, j - t_y) \right)^2    (Equation 3)
Here SSD represents the sum of squared luminance differences (Sum of Squared Differences). SSD also becomes zero when the images match completely, and takes a greater value as the mismatch increases. Since SSD also expresses an evaluation value of dissimilarity, it is a non-similarity measure.
Instead of SAD and SSD, the following correlation coefficient (ZNCC: Zero-mean Normalized Cross-Correlation) is sometimes used. ZNCC regards the pixel densities in the focused area as statistical data and computes the statistical correlation value between focused areas.
R_{ZNCC}(t_x, t_y) = \frac{\sum_{(i,j) \in W} \bigl( f(i,j) - \bar{f} \bigr) \bigl( g(i - t_x, j - t_y) - \bar{g} \bigr)}{\sqrt{\sum_{(i,j) \in W} \bigl( f(i,j) - \bar{f} \bigr)^2 \times \sum_{(i,j) \in W} \bigl( g(i - t_x, j - t_y) - \bar{g} \bigr)^2}}    (Equation 4)
Here \bar{f} and \bar{g} are the mean densities of the respective areas. R_{ZNCC} takes values from −1 to 1. It becomes 1 when the images match completely, and takes a smaller value as the mismatch increases. The characteristic of ZNCC is that it yields almost the same similarity even when there are variations in density and contrast between the images.
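Equation 4 can likewise be sketched in a few lines (names are ours; the zero-variance case is not guarded against):

```python
import math

def zncc(f, g, tx, ty, window):
    """R_ZNCC of Equation 4 for images given as nested lists: subtract the
    window means, then normalize the cross-correlation by both standard
    deviations so the result lies in [-1, 1]."""
    fv = [f[i][j] for (i, j) in window]
    gv = [g[i - tx][j - ty] for (i, j) in window]
    fm, gm = sum(fv) / len(fv), sum(gv) / len(gv)
    num = sum((a - fm) * (b - gm) for a, b in zip(fv, gv))
    den = math.sqrt(sum((a - fm) ** 2 for a in fv)
                    * sum((b - gm) ** 2 for b in gv))
    return num / den

# Doubling the contrast of g leaves ZNCC at 1, illustrating its
# insensitivity to density and contrast variations mentioned above.
f_img = [[1, 2], [3, 4]]
g_img = [[2, 4], [6, 8]]        # g = 2 * f, a pure contrast change
w = [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Under the same contrast change, SAD and SSD would report a large dissimilarity even at the correct displacement, which is why ZNCC is preferred when illumination differs between the images.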
Strictly, only ZNCC is a similarity measure, since it represents an evaluation value of similarity, whereas SAD and SSD represent non-similarity. For ease of description, however, SAD, SSD, and ZNCC are hereinafter denoted collectively as "similarity".
By searching for the integer displacement (\hat{t}_x, \hat{t}_y) at which the similarity takes its minimum value (for SAD and SSD) or its maximum value (for ZNCC), the integer displacement between the images can be obtained. After the integer displacement is obtained, the subpixel displacement is estimated as follows.
\hat{d}_x = \frac{R(\hat{t}_x - 1) - R(\hat{t}_x + 1)}{2R(\hat{t}_x - 1) - 4R(\hat{t}_x) + 2R(\hat{t}_x + 1)}, \quad
\hat{d}_y = \frac{R(\hat{t}_y - 1) - R(\hat{t}_y + 1)}{2R(\hat{t}_y - 1) - 4R(\hat{t}_y) + 2R(\hat{t}_y + 1)}    (Equation 5)
With the use of the above Equation 5, the subpixel displacement finally obtained between the images is (\hat{t}_x + \hat{d}_x, \hat{t}_y + \hat{d}_y).
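The whole two-stage procedure, an integer SSD search followed by the separable refinement of Equation 5, can be sketched as follows. Images are passed as callables purely to keep the sketch short and free of boundary handling; the names and the synthetic Gaussian test image are ours:

```python
import math

def match_subpixel(f, g, search, window):
    """Integer SSD search (Equation 3) followed by the separable parabola
    refinement of Equation 5. `search` is the set of candidate integer
    displacements; `window` is the focused area W."""
    def ssd(tx, ty):
        return sum((f(i, j) - g(i - tx, j - ty)) ** 2 for (i, j) in window)

    tx, ty = min(search, key=lambda t: ssd(*t))   # integer-resolution match

    def peak(r_m1, r_0, r_p1):
        # One term of Equation 5: parabola fit around the integer minimum.
        return (r_m1 - r_p1) / (2 * r_m1 - 4 * r_0 + 2 * r_p1)

    dx = peak(ssd(tx - 1, ty), ssd(tx, ty), ssd(tx + 1, ty))
    dy = peak(ssd(tx, ty - 1), ssd(tx, ty), ssd(tx, ty + 1))
    return tx + dx, ty + dy

# A smooth synthetic image (Gaussian bump) and a copy whose matching
# displacement under Equation 3's sign convention is (2, 1):
f = lambda i, j: math.exp(-((i - 5) ** 2 + (j - 5) ** 2) / 8.0)
g = lambda i, j: f(i + 2, j + 1)
window = [(i, j) for i in range(2, 9) for j in range(2, 9)]
search = [(a, b) for a in range(-4, 5) for b in range(-4, 5)]
est = match_subpixel(f, g, search, window)
```

On this symmetric, purely translated pair the estimate recovers (2.0, 1.0); the one-dimensional estimation error criticized above appears only once the similarity becomes anisotropic and rotated.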
A traditional method like this has the problem that a correct correspondence cannot be obtained when there is rotation or scale variation between the images. Even when there is only a slight rotation angle between the images, the two images are judged to be different, causing the problems that the corresponding position is not found or that a correspondence is detected at a wrong position.
On the other hand, when the correspondence between images cannot be expressed only by translation, that is, when there is rotation or scale variation between the images, it is necessary to estimate the rotation and scale variation at the same time. Traditionally, an interpolation process is used to apply translation, rotation, and scale variation to one of the images (the template image) before matching; each parameter is varied while the similarity is computed, and the parameters at which the similarity takes its minimum or maximum value are searched for. Next, this parameter search algorithm will be described.
When f(i,j) is the template image and g(i,j) is the input image, the similarity R_{SSD} is computed as below, where the translation (t_x, t_y), the rotation angle θ, and the scale variation s are the parameters.
R_{SSD}(t_x, t_y, \theta, s) = \sum_{(i,j) \in W} \left( f(i,j) - \tilde{g}(i, j, t_x, t_y, \theta, s) \right)^2    (Equation 6)
Here \tilde{g}(i, j, t_x, t_y, \theta, s) represents an interpolated value of the image g(i,j) at position (i,j) when the translation (t_x, t_y), the rotation angle θ, and the scale variation s are given. When the interpolation image is generated, bilinear interpolation (Bi-Linear interpolation), bicubic interpolation (Bi-Cubic interpolation), etc. are used. In addition, SAD or ZNCC may be used instead of SSD.
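One possible construction of \tilde{g} via bilinear interpolation is sketched below. The exact transform convention (rotation and scale about the origin, row/column axis ordering) is our assumption for illustration, not something the document specifies:

```python
import math

def bilinear(g, y, x):
    """Bilinear interpolation of image g (nested lists) at a real-valued
    position (y, x); assumes (y, x) lies inside the image."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * g[y0][x0]
            + (1 - dy) * dx * g[y0][x0 + 1]
            + dy * (1 - dx) * g[y0 + 1][x0]
            + dy * dx * g[y0 + 1][x0 + 1])

def g_tilde(g, i, j, tx, ty, theta, s):
    """One possible g~(i,j,tx,ty,theta,s): inverse-map the output pixel
    (i, j) through translation, rotation, and scale, then sample g
    bilinearly. Transform convention is an assumption of this sketch."""
    c, sn = math.cos(theta), math.sin(theta)
    u, v = i - ty, j - tx              # undo the translation (row, col)
    y = (c * u + sn * v) / s           # undo rotation and scale
    x = (-sn * u + c * v) / s
    return bilinear(g, y, x)
```

With the identity parameters (t_x, t_y, θ, s) = (0, 0, 0, 1), the sampled value reduces to the original pixel, as Equation 6 requires.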
The initial parameters (t_x, t_y, θ, s)^(0) are determined by some method, R_{SSD}(t_x, t_y, θ, s)^(0) is computed, and updated parameters (t_x, t_y, θ, s)^(1) that give a smaller R_{SSD} are searched for. The parameters are updated repeatedly until the result no longer changes. For this, a numerical solution method is used, such as the Newton-Raphson method, the steepest-descent method, or the Levenberg-Marquardt method.
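As an illustration of the iterate-until-convergence update, here is the simplest of the schemes named above, steepest descent with a forward-difference numerical gradient (a sketch under our own naming; the quadratic cost below stands in for R_{SSD} of Equation 6, and practical systems usually prefer Newton-Raphson or Levenberg-Marquardt):

```python
def steepest_descent(cost, params, lr=0.1, eps=1e-6, iters=500):
    """Repeatedly step each parameter against a forward-difference
    estimate of the cost gradient until the iteration budget is spent.
    A sketch: no line search, no convergence test."""
    p = list(params)
    for _ in range(iters):
        base = cost(p)
        grad = []
        for k in range(len(p)):
            q = list(p)
            q[k] += eps
            grad.append((cost(q) - base) / eps)
        p = [pk - lr * gk for pk, gk in zip(p, grad)]
    return p

# Stand-in cost with minimum at (1, -2); the loop converges toward it.
quad = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2
est = steepest_descent(quad, [0.0, 0.0])
```

Note how this sketch exhibits exactly the drawbacks the text describes: the number of iterations needed depends on the cost surface and the initial values, so the computation time cannot be bounded in advance.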
For the initial parameters (t_x, t_y, θ, s)^(0), supposing that the correspondence between images can be expressed only by translation, (\hat{t}_x, \hat{t}_y, 0, 1)^(0) is used, where (\hat{t}_x, \hat{t}_y) is estimated by typical region-based matching.
In multiparameter optimization like this, there is a problem that a correct result cannot be obtained when the initial values are not suitable. Furthermore, there is also a problem that solutions cannot be obtained because of the influence of image patterns, noise, and optical distortion due to the lens. Moreover, because recursive computation is used, the computation time cannot be estimated; this makes hardware implementation of the algorithm extremely difficult, since the total response time of the completed system cannot be estimated correctly. In regard to computation time, the image must be interpolated at each step of the recursive computation, and since the image interpolation operation has a large computational volume, a long computation time is required. Furthermore, in regard to estimation accuracy, since the properties of the interpolated image differ depending on the image interpolation method, there is a serious problem in that the adopted interpolation method causes variations in the final parameter estimation accuracy and in the properties of the estimation error.
The present invention has been made in view of the above circumstances. An object of the invention is to solve the problems of the above traditional methods by considering the N-dimensional similarity, and to provide a multiparameter high precision concurrent estimation method and a multiparameter high precision concurrent estimation program for image subpixel matching that can estimate the correspondence parameters between images stably, with high precision, concurrently, and at high speed with a small amount of computation.