1. Field of the Invention
The present invention relates to a method which extracts a 3-dimensional shape of an object from a sequence of pictures such as moving pictures that are taken by use of a digital video camera or the like.
2. Description of the Related Art
One of the important research subjects in the field of computer vision is how to find a 3-dimensional shape of an object from moving pictures or a sequence of still pictures, which are taken by using a digital video camera, a digital still camera, or the like. This technology has utility in various application fields such as robot vision, automatic cruising vehicle, mechanic data entry via a video camera, image coding, 3-dimensional modeling, etc., and is an important topic of today in these application fields.
In order to extract 3-dimensional information from a sequence of 2-dimensional images, a scheme called Structure from Motion obtains an estimate of a shape from depth information, which is obtained from motion information. Namely, camera movement is obtained first, and, then, distances of object features from the camera center are obtained to generate an estimate of the object shape. Since feature points show very small positional shifts from one frame to another in moving pictures, however, it is almost impossible to identify the motion as either a parallel motion or a rotational motion. Because of this, solutions of the depth information may become infeasible solutions, resulting in unsuccessful reconstruction of shape information. When a time sequence is obtained at large sampling intervals, on the other hand, feature points show large movement between frames. In this case, however, reliability in feature point matching decreases.
In order to obtain stable solutions, Tomasi and Kanade presented a factorization method, which calculates motion and shape concurrently (C. Tomasi and T. Kanade, xe2x80x9cShape and motion from image stream under orthography: A factorization method,xe2x80x9d International Journal of Computer Vision, vol.9, 1992, pp. 137-254, the contents of which are hereby incorporated by reference). This method employs linear matrix representation based on a linear projection model, and uses singular value decomposition, which is robust against numerical errors. This method can obtain quite stable solutions, which is a feature contrasting this method from other schemes.
Further, Poelman and Kanade presented another factorization method based on a paraperspective projection model, which more closely approximates the perspective projection of an actual camera system than the linear projection model, and maintains linear-matrix formalization of the problem to be solved (C. J. Poelman and T. Kanade, xe2x80x9cA paraperspective factorization method for shape and motion recovery,xe2x80x9d IEEE transaction on Pattern Analysis and Machine Intelligence, vol.19, no.3, pp.206-218, the contents of which are hereby incorporated by reference).
In the following, the paraperspective projection model and the factorization method based thereon will be described.
The paraperspective projection model takes into account both a scaling effect and a positioning effect of the perspective projection while maintaining benefits of linearity of the linear projection system. The scaling effect refers to the fact that the closer an object to a viewpoint, the larger the object appears. The positioning effect refers to the fact that an object positioned near an edge of a picture frame appears at a different angle from an object positioned near a projection center. According to the paraperspective projection model, a projection of an object onto an image plane is obtained through the following steps:
1) define an imaginary plane parallel to the image plane and including a center of gravity of the object;
2) obtain projections of object points onto the imaginary plane by tracing projections parallel to a line connecting between a camera center and the center of gravity; and
3) obtain projections of the object points from the imaginary plane onto the image plane via a perspective projection model.
FIG. 1 is an illustrative drawing for explaining the paraperspective projection model.
In FIG. 1, an image plane 2 is provided at a focal distance from a camera center 1. A center of gravity C is obtained with respect to a set of object feature points, pictures of which are taken by the camera. Some of the object feature points are shown in the figure as solid squares. An imaginary plane 3 is parallel to the image plane 2, and includes the center of gravity C. An origin of world coordinates is positioned at the center of gravity C, and 3-dimensional coordinates of a feature point p is represented by sp xcex5R3.
In an image frame f that is taken out of an image sequence, the camera center 1 has world coordinates tf. Further, 2-dimensional local coordinates on the image plane 2 have base vectors if, jf xcex5R3 (∥if∥=∥jf∥=1, ifxc3x97jf=0), and an optical axis of the camera is represented by a base vector kf=ifxc3x97jf xcex5R3. In the image frame f, a 2-dimensional local coordinate system xcexa3f=(Of; jf, if) is defined, where an origin Of is an intersecting point between the vector kf and the image plane 2.
In the paraperspective projection model, a projection of the feature point p onto the image plane 2 is obtained through the following two steps, as previously described. At the first step, the feature point p is projected onto the imaginary plane 3. This projection is made in parallel to a line that passes through the camera center 1 and the center of gravity C. At the second step, the projection of the feature point on the imaginary plane 3 is further projected onto the image plane 2 via perspective projection. The projection of the feature point p onto the image plane 2 has coordinates (ufp, vfp) in the 2-dimensional local coordinate system xcexa3f=(Of; if, jf). Here, the focal distance of the camera is assumed to be 1. The coordinates (ufp, vfp) are represented as:
ufp=mfxc2x7sp+xf
vfp=nfxc2x7sp+yfxe2x80x83xe2x80x83(1)
where
zf=(xe2x88x92tf)xc2x7kf
xf=(xe2x88x92tf)xc2x7if/zf, yf=(xe2x88x92tf)xc2x7jf/zfxe2x80x83xe2x80x83(2)
mf=(ifxe2x88x92xfkf)/zf, nf=(jfxe2x88x92yfkf)/zf
Here, zf is a distance from the camera center 1 to the imaginary plane 3, and (xp, yp) is a point where the projection of the center of gravity C is positioned on the image plane 2 via perspective projection. Further, coordinates (Ufp, Vfp), which represent the projection of the feature point p onto the image plane 2 as obtained directly through perspective projection, are represented as:
Ufp=ifxc2x7(spxe2x88x92tf)/zfp, Vfp=jfxc2x7(spxe2x88x92tf)/zfp
zfp=kfxc2x7(spxe2x88x92tf)xe2x80x83xe2x80x83(3)
When a Taylor expansion of the coordinates (Ufp, Vfp) around zf is taken into consideration, it can be seen that the paraperspective projection model is a first-order approximation of the perspective projection model under the assumption of:
|sp|2/zf2≅0xe2x80x83xe2x80x83(4)
In what follows, the factorization method will be described. In the factorization method, P feature points are tracked through F image frames. Then, the 2-dimensional local coordinates (ufp, vfp) of the P feature points (p=1, 2, . . . , P) over the F frames (f=1, 2, . . . , F) on the image plane 2 are obtained as a 2Fx P matrix:                     W        =                  [                                                                      u                  11                                                            ⋯                                                              u                                      1                    ⁢                    p                                                                                                      ⋮                                                              u                  fp                                                            ⋮                                                                                      u                  F1                                                            ⋯                                                              u                  Fp                                                                                                      v                  11                                                            ⋯                                                              v                                      1                    ⁢                    p                                                                                                      ⋮                                                              v                  fp                                                            ⋮                                                                                      v                  F1                                                            ⋯                                                              v                  Fp                                                              ]                                    (        5        )            
Hereinafter, the matrix W is referred to as a tracking matrix. An upper half of the tracking matrix represents x coordinates ufp of the feature points, and a lower half represents y coordinates vfp of the feature points. Each column of the tracking matrix shows coordinates of a single feature point tracked over the F frames, and each row of the tracking matrix represents x or y coordinates of all the feature points in a given frame.
Then, an average xf of x coordinates of all the feature points is obtained with respect to each frame, and an average yf of y coordinates is also obtained in the same manner.                                                         1              p                        ⁢                                                            ∑                  p                                                  p                  =                  1                                            ⁢                              u                fp                                              =                      x            f                          ,                  xe2x80x83                ⁢                                            1              p                        ⁢                                                            ∑                  p                                                  p                  =                  1                                            ⁢                              v                fp                                              =                      y            f                                              (        6        )            
The averages xf and yf are subtracted from each corresponding element of the tracking matrix. A resulting matrix W* is hereinafter referred to as a measurement matrix.                               W          *                =                  W          -                                    [                                                                                          x                      1                                                                                                            ⋮                                                                                                              x                      f                                                                                                                                  y                      1                                                                                                            ⋮                                                                                                              y                      f                                                                                  ]                        ⁢                          xe2x80x83                        [                          1              ⁢                              xe2x80x83                            ⁢              …              ⁢                              xe2x80x83                            ⁢              1                        ]                                              (        7        )            
The measurement matrix is characterized by the number of levels that is three at most even when the number P of feature points and the number F of frames are increased. Thus, the measurement matrix can be decomposed as:
W*(2FxP)=R(2Fx3)S(3xP)xe2x80x83xe2x80x83(8)
A comparison of this equation with the equation (1) reveals that 2F-x-3 matrix R represents the camera""s position vectors (mf, nf) (f=1, 2, . . . , F), and that 3-x-P matrix S represents position vectors sp of feature points (p=1, 2, . . . , P).
In general, the measurement matrix are not free from noises, which may make the number of levels of the matrix more than three. Even in such a case, as the matrix is decomposed through singular value decomposition such as to retain the three largest singular values, an optimum decomposition is guaranteed in terms of minimization of square errors. By the same token, a measurement matrix obtained via the paraperspective projection model can be decomposed into a camera position matrix and a feature-point shape matrix. Such decomposition of a measurement matrix is called xe2x80x9cfactorizationxe2x80x9d.
In the following, a basic algorithm of factorization of a measurement matrix will be described. For the purpose of factorization, singular value decomposition of a matrix is utilized. By using singular value decomposition, the measurement matrix is decomposed into three matrixes as:
W*(2FxP)=U(2FxP)xcexa3(PxP)V(PxP)xe2x80x83xe2x80x83(9)
Here, U is a 2F-x-P orthogonal matrix, and xcexa3 is a P-x-P diagonal matrix comprised of singular values ("sgr"1, "sgr"2, . . . , "sgr"p) of the measurement matrix. Further, V is a P-x-P orthogonal matrix. If the number of levels of the measurement matrix is three, the singular values "sgr"4 and thereafter will be close to zero. Based on the assumption that the singular values "sgr"4 and thereafter are zero, the measurement matrix is decomposed as:
W*(2FxP)=Û(2Fx3){circumflex over (xcexa3)}(3x3){circumflex over (V)}(3xP)xe2x80x83xe2x80x83(10)
By using representations:
Û={circumflex over (R)}, {circumflex over (xcexa3)}{circumflex over (V)}=Ŝxe2x80x83xe2x80x83(11)
the decomposition of the measurement matrix is written as:
W*={circumflex over (R)}Ŝxe2x80x83xe2x80x83(12)
Unfortunately, decomposition of the equation (12) is not unique. As a matter of fact, use of an arbitrary unitary matrix Q proves that an infinite number of solutions exist as follows.
W*={circumflex over (R)}Ŝ={circumflex over (R)}QQxe2x88x921Ŝ=({circumflex over (R)}Q)(Qxe2x88x921Ŝ)={tilde over (R)}{tilde over (S)}={tilde over (W)}*xe2x80x83xe2x80x83(13)
In light of this, constraints as follows are introduced so as to find the matrix Q that satisfies these constrains.                                                                                           "LeftDoubleBracketingBar"                                      m                    f                                    "RightDoubleBracketingBar"                                2                                            1                +                                  x                  f                  2                                                      +                                                            "LeftDoubleBracketingBar"                                      n                    f                                    "RightDoubleBracketingBar"                                2                                            1                +                                  y                  f                  2                                                              =                      (                          1                              z                f                2                                      )                          ⁢                  
                ⁢                  (                                    f              =              1                        ,            2            ,            …            ⁢                          xe2x80x83                        ,            F                    )                                    (        14        )                                                                    m              f                        ·                          n              f                                =                                                                      x                  f                                ⁢                                  y                  f                                            2                        ⁢                          (                                                                                          "LeftDoubleBracketingBar"                                              m                        f                                            "RightDoubleBracketingBar"                                        2                                                        1                    +                                          x                      f                      2                                                                      +                                                                            "LeftDoubleBracketingBar"                                              n                        f                                            "RightDoubleBracketingBar"                                        2                                                        1                    +                                          y                      f                      2                                                                                  )                                      ⁢                  
                ⁢                  (                                    f              =              1                        ,            2            ,            …            ⁢                          xe2x80x83                        ,            F                    )                                    (        15        )            xe2x80x83∥mf∥=1xe2x80x83xe2x80x83(16)
Then, the matrix Q is used as:
R={circumflex over (R)}Q, S=Qxe2x88x921Ŝxe2x80x83xe2x80x83(17)
to find a unique way to decompose the measurement matrix as follows.
Ŵ=RSxe2x80x83xe2x80x83(18)
Here, the 2F-x-3 matrix R represents the camera""s position, and the 3-x-P matrix S represents 3-dimensional coordinates of the feature points. A direction of the camera (if, jf, kf) (f=1, 2, . . . , F) is obtained from the matrix R (i.e., (mf, nf): f=1, 2, . . . , F) and the coordinates (xf, yf) that are obtained from the equations (6). Further, zf is obtained from the equation (14), and the camera position tf is obtained from the equation (2).
Whether the linear projection model or the paraperspective projection model is used, the factorization method is based on the assumption that all the feature points are trackable through the entire image sequence. If some feature points found in the first frame are lost in subsequent frames, or new feature points are introduced halfway through the image sequence, this basis assumption is violated. When a camera rotates all around an object, for example, features appearing in the first frame are inevitably occluded in subsequent frames, so that the above assumption is not satisfied for a long sequence of object images. As a result, a tracking matrix tends to have many missing measurements (missing matrix elements), which need to be interpolated by estimates.
Further, the factorization method as described above have two different solutions, one corresponding to a convex shape and the other corresponding to a concave shape. Distinction between a convex shape and a concave shape, therefore, cannot be made.
Accordingly, there is a need for a method which can obtain a highly accurate 3-dimensional shape of an object from an image sequence even when a tracking matrix obtained from the sequence lacks some of the elements thereof.
Further, there is a need for a method which can efficiently estimate missing measurements of the tracking matrix.
Moreover, there is a need for a method which can determine a 3-dimensional shape of an object including a determination of whether a concave surface is observed or a convex surface is observed.
Accordingly, it is a general object of the present invention to provide a method which can satisfy the needs described above.
It is another and more specific object of the present invention to provide a method which can obtain a highly accurate 3-dimensional shape of an object from an image sequence even when a tracking matrix obtained from the sequence lacks some of the elements thereof.
In order to achieve the above objects according to the present invention, a method of obtaining a 3-dimensional shape of an object from a sequence of image frames includes the steps of a) generating a tracking matrix which has matrix elements representing coordinates of feature points of the object tracked through the sequence, and has each row representing a corresponding image frame and each column representing a corresponding feature point, wherein some of the matrix elements are missing, b) generating an estimation matrix as a sub-matrix of the tracking matrix by selecting rows and by selecting a column of a given feature point and columns of a predetermined number of feature points closest to the given feature point, such that the estimation matrix has matrix elements thereof missing only for the given feature point in a single image frame, c) calculating estimates of the missing matrix elements of the estimation matrix, d) repeating the steps b) and c) to obtain estimates of remaining missing matrix elements of the tracking matrix, and e) obtaining a 3-dimensional shape of the object from the tracking matrix having the missing matrix elements, thereof estimated.
In the method described above, the estimation matrix used for estimating the missing matrix elements is generated by selecting the feature points that are close to the feature point to be estimated. The feature points selected in such a manner insure conditions that are required for the paraperspective projection model to closely approximate the actual perspective projection system. This guarantees that obtained estimates of the missing matrix elements are highly accurate, which results in generation of an accurate object shape.
It is an other object of the present invention to provide a method which can efficiently estimate missing measurements of the tracking matrix.
In order to achieve the above object according to the present invention, the method as described above is such that the step c) includes repeating estimation of the missing matrix elements of the estimation matrix until the estimation is successful while size of the estimation matrix is increased at each attempt of the estimation of the missing matrix elements of the estimation matrix.
In the method described above, the estimation matrix generated for the estimation purpose has a size no greater than that which is necessary to obtain estimates, so that efficient estimation of the missing measurements is guaranteed.
It is still another object of the present invention to provide a method which can determine a 3-dimensional shape of an object including a determination of whether a concave surface is observed or a convex surface is observed.
In order to achieve the above object according to the present invention, the method as described above further includes the steps of generating the sequence of image frames by taking pictures of the object while the object rotates relative to a camera view, obtaining motion of the feature points from the tracking matrix having the missing matrix elements thereof estimated, and reversing a convex surface to a concave surface or reversing a concave surface to a convex surface with regard to said 3-dimensional shape of the object if the motion of the feature points is in a direction opposite to a rotational direction of the object.
In the method described above, confusion between convex surfaces and concave surfaces is resolved by comparing the estimated motion of the object with the actual motion of the object.