In recent years, computer graphics (CG) that uses 3-dimensional shape data to generate an image has made progress, and an image close to the actual image can be generated (refer to, e.g., Non-Patent Document 1). In the case where, for example, a CG technique disclosed in Non-Patent Document 1 is used, 3-dimensional shape data (also referred to as “3-dimensional shape”) of an individual object and color information (texture) of the surface thereof are prepared. Then, the position/posture with respect to a camera and position, type, and intensity of illumination which exists in a shooting environment are specified, whereby an image close to the actual image can be generated.
In general, when an image is generated of an object for which no blueprint exists, such as a human face, a special measurement apparatus is needed to obtain the 3-dimensional shape and texture of the individual object. In view of this, a method has been developed in which a predetermined deformation model is prepared in advance and the 3-dimensional shape and texture are generated based on the deformation model. More specifically, a certain number of similar objects, such as the faces of various persons, are measured so as to prepare a generalized deformation model. Then, appropriate numerical parameters are given to the prepared generalized deformation model, whereby the 3-dimensional shape and texture of an individual object that has not been measured are generated.
An example of a technique that uses a predetermined deformation model to estimate the 3-dimensional shape and texture is disclosed in, e.g., Non-Patent Document 2. In the technique disclosed in Non-Patent Document 2, the 3-dimensional shape and texture of the faces of a large number of persons are measured in advance using a 3-dimensional shape measurement apparatus and, based on the measured data sets, a generalized 3-dimensional face model is generated from which the 3-dimensional shape and texture of an unmeasured face of a given person can be formed. More specifically, the generalized face model stores the average values of the 3-dimensional shape and of the texture of the faces of a large number of persons, and a basis vector set representing the change amounts from these average values. Then, values obtained by multiplying the basis vector set by coefficient parameters are added to the respective average values, whereby the 3-dimensional shape and texture of the generalized face model approximate those of the face of a given person. That is, simply by specifying the shape parameters and the texture parameters, the 3-dimensional shape and texture of the face of a given person can be obtained.
By combining the generalized face model disclosed in Non-Patent Document 2 and CG technique disclosed in Non-Patent Document 1, a desired image for an object can be reproduced. More specifically, by specifying parameters of the 3-dimensional shape, texture, position/posture, and illumination condition, a desired image can be reproduced.
Further, use of the technique disclosed in Non-Patent Document 2 also allows the 3-dimensional shape and texture of a given individual object to be estimated from a single input image of that object. FIG. 11 is a block diagram showing an example of a configuration of a 3-dimensional shape estimation system that estimates the 3-dimensional shape of an object based on a generalized face model. As shown in FIG. 11, a 3-dimensional shape estimation system based on a conventional generalized face model includes a generalized face model generation means 900, an illumination condition initial value input means 901, a position/posture initial value input means 902, a face model generation means 903, a perspective transformation means 904, a shade and shadow generation means 905, and a parameter update means 906. The conventional 3-dimensional shape estimation system having the above configuration operates as follows.
The generalized face model generation means 900 previously generates a generalized face model (also referred to as “generalized 3-dimensional face model”). Further, the generalized face model generation means 900 outputs the generated generalized 3-dimensional face model to the face model generation means 903.
The illumination condition initial value input means 901 inputs, as an initial value, the approximate value of an illumination parameter representing a state of the illumination at the photographing time of an input image showing an object whose 3-dimensional shape is to be estimated. For example, the illumination condition initial value input means 901 sets the direction of one infinite point light source as (lθ, lφ) and inputs, as an illumination parameter, a 3-dimensional direction vector l of the point light source, intensity L thereof, and a value a representing the intensity of diffusion light.
The following is an example of an input method using the illumination condition initial value input means 901. First, an input image and a 3-dimensional image generated using the CG technique disclosed in Non-Patent Document 1 are displayed side-by-side on a screen of a computer terminal. Then, while viewing the screen of the computer terminal, a user (e.g., operator) manually makes an adjustment such that the shade and shadow of the 3-dimensional image becomes as similar as possible to those of the input image. As described above, the illumination parameter can manually be input by the user.
The position/posture initial value input means 902 inputs, as an initial value, the approximate value of a parameter (referred to as “position/posture parameter”) representing the position and posture of an object shown in an input image. For example, assuming that the position of an object is represented by 3-dimensional coordinate vector o and direction of the object is represented by rotation angles γ, θ, and φ about x, y, and z axes, the position/posture initial value input means 902 inputs, as the position/posture parameter, a matrix R representing the rotation of each of the rotation angles γ, θ, and φ.
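As a rough illustration of how a rotation matrix R might be assembled from the three rotation angles, the following Python/NumPy sketch composes elementary rotations about the x, y, and z axes. The function name `rotation_matrix` and, in particular, the composition order (x first, then y, then z) are assumptions made for this sketch; the description above does not specify the order.

```python
import numpy as np

def rotation_matrix(gamma, theta, phi):
    """Rotation by gamma about x, theta about y, phi about z (angles in radians).

    Assumption: rotations are applied in the order x, then y, then z.
    """
    cx, sx = np.cos(gamma), np.sin(gamma)
    cy, sy = np.cos(theta), np.sin(theta)
    cz, sz = np.cos(phi), np.sin(phi)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    return Rz @ Ry @ Rx

# A quarter turn about the z axis carries the x axis onto the y axis.
R = rotation_matrix(0.0, 0.0, np.pi / 2)
rotated_x_axis = R @ np.array([1.0, 0.0, 0.0])
```

Any other fixed composition order would serve equally well, provided the same order is used wherever R is evaluated.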
The following is an example of an input method using the position/posture initial value input means 902. First, an input image and a 3-dimensional image generated using the CG technique disclosed in Non-Patent Document 1 are displayed side-by-side on a screen of a computer terminal. Then, while viewing the screen of the computer terminal, a user manually makes an adjustment such that the direction of the 3-dimensional image becomes as similar as possible to that of the input image. As described above, the position/posture parameter can manually be input by the user. Alternatively, the following input method can be adopted. The user selects a characteristic part of an object and inputs the position of the selected part on the image. Then, the user selects the 3-dimensional coordinate data of the part corresponding to the input position from the previously stored 3-dimensional coordinate data of respective parts and uses it to calculate the position/posture parameter of the object.
The face model generation means 903 stores a generalized 3-dimensional face model previously calculated by the generalized face model generation means 900. Further, the face model generation means 903 receives a 3-dimensional shape parameter and a texture parameter as inputs and uses the generalized 3-dimensional face model to calculate and output the individual 3-dimensional shape and individual texture data of the person described by the input parameters.
The generalized 3-dimensional face model consists of average values $\{\bar{S}, \bar{T}\}$ obtained from the 3-dimensional shapes $\{S'_i\}$ and texture data $\{T'_i\}$ of a large number (e.g., about 200) of persons, and a basis vector set $\{S_j, T_j\}$ representing deviation from the average values. The generalized face model generation means 900 first creates a matrix (covariance matrix) S in which the 3-dimensional shape vectors, from which the average value has been subtracted, are arranged, using the following expression (1).
[Numeral 1]
$$S = [(S'_1 - \bar{S}), (S'_2 - \bar{S}), \ldots, (S'_n - \bar{S})] \qquad \text{expression (1)}$$
Further, the generalized face model generation means 900 calculates the eigenvalues and eigenvectors of the covariance matrix S and stores a set of $n_p$ (e.g., 100) basis vectors $\{S_j\}$ in descending order of the calculated eigenvalues $\sigma_{S,i}^2$. This is equivalent to the generalized face model generation means 900 performing the approximation shown in the following expression (2), where $\Sigma_s$ is a diagonal matrix whose diagonal components are the $n_p$ eigenvalues and $V_s$ is the matrix in which the basis vectors $\{S_j\}$ are arranged.
[Numeral 2]
$$S = U_s \Sigma_s V_s^T \qquad \text{expression (2)}$$
Further, the generalized face model generation means 900 performs the same calculation for the texture as for the 3-dimensional shape data, thereby obtaining and storing about 100 eigenvalues $\sigma_{T,i}^2$ and basis vectors $\{T_j\}$.
According to the calculations described above, a generalized 3-dimensional face model can be obtained from the basis vector and eigenvalue data that the generalized face model generation means 900 has calculated.
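As a rough illustration of the model generation described above, the following Python/NumPy sketch derives an average vector, a basis vector set, and per-component variances from measured vectors by means of a singular value decomposition. The function name `build_generalized_model`, the random stand-in data, the row-wise arrangement of the samples, and the division of the squared singular values by the number of samples are all assumptions made for this sketch.

```python
import numpy as np

def build_generalized_model(samples, n_p):
    """samples: (n, d) array with one measured shape (or texture) vector per row."""
    mean = samples.mean(axis=0)
    centered = samples - mean  # rows are S'_i minus the average, cf. expression (1)
    # Thin SVD: centered = U @ diag(sing) @ vt, cf. expression (2).
    # NumPy returns the singular values in descending order.
    _, sing, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_p]  # rows are the n_p leading basis vectors
    variances = sing[:n_p] ** 2 / samples.shape[0]  # assumed eigenvalue scaling
    return mean, basis, variances

rng = np.random.default_rng(0)
samples = rng.normal(size=(20, 30))  # stand-in for 20 measured faces of 10 points each
mean, basis, variances = build_generalized_model(samples, n_p=5)
```

The texture basis could be obtained by calling the same function on the texture vectors.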
Assuming that the vector S in which the 3-dimensional coordinate values $(x_i, y_i, z_i)$ of the respective points i on the surface of the face are arranged is $[x_0, y_0, z_0, x_1, y_1, z_1, \ldots, x_n, y_n, z_n]^T$, and that the vector T in which the luminance values $t_i$ of the texture are arranged is $[t_0, t_1, \ldots, t_n]$, the 3-dimensional shape data and texture of the face of a given person are represented by the following expression (3).
[Numeral 3]
$$S = \bar{S} + \sum_j s_j S_j, \qquad T = \bar{T} + \sum_j t_j T_j \qquad \text{expression (3)}$$
In the above expression (3), $\{s_j\}$ is a shape parameter and $\{t_j\}$ is a texture parameter. That is, the face model generation means 903 receives the shape parameter $\{s_j\}$ and texture parameter $\{t_j\}$ as inputs and uses the expression (3) to calculate and output an individual 3-dimensional shape S and individual texture T. As described later, the 3-dimensional shape estimation system shown in FIG. 11 repeatedly updates the respective parameters and outputs the parameter values satisfying a predetermined convergence condition as a 3-dimensional shape estimation result. In the first iteration of this repeated update processing, the face model generation means 903 uses 0 as the initial values of the shape and texture parameters to calculate the individual 3-dimensional shape S and individual texture T.
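The synthesis in expression (3) can be sketched in a few lines of Python/NumPy. The function name `synthesize` and the toy two-point model below are hypothetical and serve only to show that all-zero parameters reproduce the average, as used in the first iteration.

```python
import numpy as np

def synthesize(mean, basis, params):
    """Expression (3): mean + sum_j params[j] * basis[j] (basis vectors stored as rows)."""
    return mean + params @ basis

# Toy model with two surface points (x0, y0, z0, x1, y1, z1) and two basis vectors.
mean_shape = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 1.0])
shape_basis = np.array([[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                        [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]])

initial = synthesize(mean_shape, shape_basis, np.zeros(2))  # equals the average shape
shifted = synthesize(mean_shape, shape_basis, np.array([0.5, -0.25]))
```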
The perspective transformation means 904 receives as an input the individual 3-dimensional shape S, individual texture T, and position/posture parameters of the object and calculates a correspondence table representing to which pixels constituting an image to be reproduced respective data points in a 3-dimensional shape correspond. Assuming that the focal length of a camera photographing an input image is f and center of the input image is (cu, cv), the perspective transformation means 904 calculates the image coordinate (ui, vi) of a point i on the 3-dimensional face model using the following expression (4).
[Numeral 4]
$$u_i = c_u + f\frac{X}{Z}, \qquad v_i = c_v + f\frac{Y}{Z}, \qquad \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = R \begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix} + o \qquad \text{expression (4)}$$
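The projection in expression (4) can be sketched as follows in Python/NumPy, taking the rotation matrix R and the 3-dimensional position vector o of the object as the position/posture parameters. The function name `project` and the numerical values are assumptions chosen for illustration; points with Z equal to zero are not handled.

```python
import numpy as np

def project(points, R, o, f, cu, cv):
    """points: (n, 3) model coordinates. Returns (n, 2) image coordinates and (n,) depths Z."""
    cam = points @ R.T + o  # [X, Y, Z]^T = R [x_i, y_i, z_i]^T + o for every point
    X, Y, Z = cam[:, 0], cam[:, 1], cam[:, 2]
    u = cu + f * X / Z
    v = cv + f * Y / Z
    return np.stack([u, v], axis=1), Z

points = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 0.0]])
uv, depth = project(points, np.eye(3), np.array([0.0, 0.0, 10.0]),
                    f=100.0, cu=320.0, cv=240.0)
```

The returned depths are the Z values that hidden surface removal would compare.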
The perspective transformation means 904 divides the entire surface of the 3-dimensional shape into triangular polygons and calculates (ui, vi) for the respective apexes of the polygons. After that, the perspective transformation means 904 fills the interior of the polygons with points and performs hidden surface removal processing using the value of Z. The hidden surface removal processing, which is a standard method of a CG technique, can be carried out by using various graphics library programs. As a result of the hidden surface removal processing, the perspective transformation means 904 can obtain the correspondence table representing correspondence between each pixel i constituting the reproduced image and polygon j.
The shade and shadow generation means 905 receives as an input the individual 3-dimensional shape, correspondence table, and individual texture and calculates the luminance values of respective pixels constituting the reproduced image. The shade and shadow generation means 905 can perform shadow generation processing by using, e.g., a standard method of the CG technique described in Non-Patent Document 1. First, the shade and shadow generation means 905 calculates a normal vector ni for each data point i on the individual 3-dimensional shape S. Assuming that, on the object surface, the data points of the two apexes other than the point i in the same triangular polygon are (j, k) in counterclockwise order, and that the 3-dimensional vectors representing the 3-dimensional coordinates of the points i, j, and k are pi, pj, and pk, the shade and shadow generation means 905 can calculate the normal vector ni using the following expression (5).
[Numeral 5]
$$n_i = \frac{(p_j - p_i) \times (p_k - p_i)}{\left| (p_j - p_i) \times (p_k - p_i) \right|} \qquad \text{expression (5)}$$
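The normal vector calculation of expression (5) can be sketched as follows in Python/NumPy. The function name `triangle_normal` is hypothetical, and the sketch does not guard against degenerate triangles, for which the cross product has zero length.

```python
import numpy as np

def triangle_normal(p_i, p_j, p_k):
    """Unit normal at point i of a triangle whose other apexes j, k are in counterclockwise order."""
    n = np.cross(p_j - p_i, p_k - p_i)
    return n / np.linalg.norm(n)

# A counterclockwise triangle in the z = 0 plane has its normal along +z.
normal = triangle_normal(np.array([0.0, 0.0, 0.0]),
                         np.array([1.0, 0.0, 0.0]),
                         np.array([0.0, 1.0, 0.0]))
```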
Further, after calculating the normal vectors of all data points, the shade and shadow generation means 905 uses the Phong reflection model to calculate the luminance value $I_i$ of a pixel according to the following expression (6) (refer to Non-Patent Document 1).
$$I_i = (a + L\, n_j \cdot l)\, T(j) + c_j k L (r_j \cdot v_j)^{\nu}, \qquad r_j = 2 (n_j \cdot l)\, n_j - l \qquad \text{expression (6)}$$
In the above expression (6), $T(j)$ is the texture luminance value of the polygon j corresponding to the pixel i, as indicated by the correspondence table; k is a specular reflection constant (fixed value); $\nu$ is a specular reflection characteristic (fixed value); and $c_j$ is a value, 0 or 1, indicating whether the polygon j faces the light source direction l or is covered by shadow. The shade and shadow generation means 905 determines cast shadows by calculation using a standard CG method such as ray tracing.
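A minimal Python/NumPy sketch of the luminance calculation of expression (6) for a single polygon is given below. The function name `phong_luminance` and the numerical values are hypothetical. Two points are assumptions of this sketch and are not stated in the description above: v is taken to be a unit vector toward the viewpoint, and the specular base is clamped at zero so that a non-integer exponent cannot produce an invalid value.

```python
import numpy as np

def phong_luminance(n, l, v, texture, a, L, k, nu, c):
    """Luminance for one polygon: n, l, v are unit 3-vectors, c is 0 (shadowed) or 1."""
    n_dot_l = np.dot(n, l)
    r = 2.0 * n_dot_l * n - l  # reflection vector
    r_dot_v = max(np.dot(r, v), 0.0)  # clamp: a guard added in this sketch only
    return (a + L * n_dot_l) * texture + c * k * L * r_dot_v ** nu

n = np.array([0.0, 0.0, 1.0])
lit = phong_luminance(n, n, n, texture=0.5, a=0.1, L=1.0, k=0.2, nu=10.0, c=1)
shadowed = phong_luminance(n, n, n, texture=0.5, a=0.1, L=1.0, k=0.2, nu=10.0, c=0)
```

With the light, normal, and view direction aligned, the first term contributes (0.1 + 1.0) x 0.5 and the specular term contributes 0.2 when the polygon is not in shadow.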
According to the above processing, the shade and shadow generation means 905 calculates the luminance value for each pixel constituting a reproduced image and, based on the calculation result, generates a reproduced image. In the following, a vector in which the luminance values of pixels each having a corresponding shape data point are arranged is set as Imodel=(I0, I1, . . . , In), and a vector in which the luminance values of the input image corresponding to respective pixels are arranged is set as Iinput.
The parameter update means 906 compares the input image and reproduced image generated by the shade and shadow generation means 905 and changes parameters of the position/posture, illumination, 3-dimensional shape, and texture such that the reproduced image becomes close to the input image. The 3-dimensional shape estimation system repeatedly executes the processing using the face model generation means 903, perspective transformation means 904, shade and shadow generation means 905, and parameter update means 906 so as to find optimum parameter values.
The parameter update means 906 first calculates a cost function $E_I$ concerning the similarity between images using the following expression (7).
$$E_I = \left| I_{\mathrm{model}} - I_{\mathrm{input}} \right|^2 \qquad \text{expression (7)}$$
Then, the parameter update means 906 calculates a cost function E with respect to the shape parameter and texture parameter according to the prior probability distribution of the model using the following expression (8).
[Numeral 6]
$$E = \frac{1}{\sigma_I^2} E_I + \sum_i \frac{s_i^2}{\sigma_{S,i}^2} + \sum_i \frac{t_i^2}{\sigma_{T,i}^2} \qquad \text{expression (8)}$$
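The cost functions of expressions (7) and (8) can be sketched as follows in Python/NumPy. The function name `cost` and the numerical values are hypothetical. The sketch assumes a Gaussian prior, in which each squared parameter is divided by the corresponding eigenvalue, and treats the image noise deviation (here `sigma_I`) as a given constant.

```python
import numpy as np

def cost(I_model, I_input, s, t, var_s, var_t, sigma_I):
    """Image error weighted by 1 / sigma_I^2 plus prior terms on shape and texture parameters."""
    E_I = np.sum((I_model - I_input) ** 2)  # expression (7)
    prior = np.sum(s ** 2 / var_s) + np.sum(t ** 2 / var_t)  # assumed Gaussian prior
    return E_I / sigma_I ** 2 + prior

E = cost(I_model=np.array([1.0, 2.0]), I_input=np.array([1.0, 4.0]),
         s=np.array([1.0]), t=np.array([2.0]),
         var_s=np.array([2.0]), var_t=np.array([4.0]), sigma_I=2.0)
```

In this toy case the image term is 4 / 4 and the prior terms are 1 / 2 and 4 / 4.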
The parameter update means 906 searches for such parameters as to minimize the cost function E using a probabilistic descent method. To this end, the parameter update means 906 calculates a differential value J of E for each of shape, texture, illumination, and position/posture parameters using the following expression (9).
[Numeral 7]
$$J = \left[ \left\{ \frac{dE}{ds_i} \right\}, \left\{ \frac{dE}{dt_i} \right\}, \frac{dE}{dl_\theta}, \frac{dE}{dl_\phi}, \frac{dE}{dL}, \frac{dE}{da}, \frac{dE}{dp}, \frac{dE}{d\theta}, \frac{dE}{d\gamma}, \frac{dE}{d\phi} \right] \qquad \text{expression (9)}$$
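The description above does not state how the derivatives in expression (9) are evaluated. One simple possibility, shown here purely as an assumption, is central finite differences over a vector in which all parameters are arranged. The function name `numerical_gradient`, the step size `eps`, and the quadratic stand-in cost are hypothetical.

```python
import numpy as np

def numerical_gradient(E, alpha, eps=1e-6):
    """Approximate dE/d(alpha[idx]) for every parameter by central differences."""
    J = np.zeros_like(alpha, dtype=float)
    for idx in range(alpha.size):
        step = np.zeros_like(alpha, dtype=float)
        step[idx] = eps
        J[idx] = (E(alpha + step) - E(alpha - step)) / (2.0 * eps)
    return J

# Stand-in cost: E(alpha) = sum(alpha^2), whose exact gradient is 2 * alpha.
alpha = np.array([1.0, -2.0])
J = numerical_gradient(lambda x: np.sum(x ** 2), alpha)
```

Each evaluation of E here would correspond to regenerating the reproduced image with perturbed parameters, so an analytic derivative may be preferable in practice.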
Assuming that a vector in which all parameters are arranged is $\alpha = [s_i, t_i, l_\theta, l_\phi, L, a, p, \theta, \gamma, \phi]$, the parameter update means 906 updates the current parameters to new values $\alpha^*$ using the following expression (10).
$$\alpha^* = \alpha - (J^T J)^{-1} J E \qquad \text{expression (10)}$$
When the update amount of each of the calculated parameters is smaller than a previously set threshold value, the parameter update means 906 determines that the repetitive processing has converged and ends the processing. Further, in the case where the error between the input image and the reproduced image generated using the updated parameters does not become smaller than the error between the input image and the reproduced image generated using the parameters before the update, the parameter update means 906 likewise determines that the repetitive processing has converged and ends the processing. The 3-dimensional shape estimation system then outputs, as a 3-dimensional shape estimation result, the individual 3-dimensional shape at the time when the reproduced image closest to the input image was obtained in the course of the repetitive processing.
Non-Patent Document 1: Mason Woo, Jackie Neider, Tom Davis, “OpenGL Programming Guide (Second Edition)”, Addison-Wesley Publishers Japan Ltd, pp. 169-195.
Non-Patent Document 2: Volker Blanz, Thomas Vetter, “Face Recognition Based on Fitting a 3-dimensional Morphable Model”, PAMI, 2003, vol. 25, No. 9, pp. 1063-1074.