1. Field of the Invention
The present invention relates to a technique for determining a position and an orientation of an object.
2. Description of the Related Art
Recently, a significant amount of research has been done on mixed reality which aims at a seamless synthesis of a real space and a virtual space. Mixed reality may be shown on an image display device by a video see-through method in which an image of a virtual space (for example, virtual objects drawn by computer graphics or textual information) is superimposed on an image of a real space shot by an imaging device, such as a video camera. The virtual space is generated depending on the position and orientation of the imaging device.
In addition, mixed reality may also be shown on an optical see-through display attached to the head of a viewer by an optical see-through method in which an image of a virtual space generated depending on the position and orientation of a viewing point of the viewer is displayed.
New applications of these image display devices, distinct from known applications of virtual reality, are expected, such as surgical assist systems in which the internal state of a patient is superimposed on the patient's body and mixed reality games in which a player fights a virtual enemy in a virtual space.
In both of these new fields of application, there is a demand for accurate registration between the real and virtual spaces, and various experiments have been performed for this purpose. In the video see-through method, accurate determination of position and orientation of the imaging device in a scene (that is, in a world coordinate system) leads to the accurate registration in the mixed reality. Similarly, in the optical see-through method, accurate determination of position and orientation of the viewing point or the display in a scene leads to the accurate registration in the mixed reality.
In the video see-through method, the position and orientation of the imaging device in the scene are generally determined by arranging or setting a plurality of indices, i.e. artificial markers or natural features, in the scene and detecting the coordinates of projections of the indices on an image shot by the imaging device.
In addition, an inertial sensor may be attached to the imaging device, and the position and orientation of the imaging device estimated on the basis of the result of measurement by the sensor may be used in the process of detecting the indices. The result of estimation may also be used as initial values for calculating the position and orientation based on the image or as rough position and orientation in the case in which the indices are not detected. Accordingly, more stable registration can be performed compared to the case in which only the image information is used (refer to, for example, Hirofumi Fujii, Masayuki Kanbara, Hidehiko Iwasa, Haruo Takemura, and Naokazu Yokoya, A Registration Method Using Stereo Cameras with an Inertial Sensor for Augmented Reality, IEICE Technical Report, PRMU 99-192, vol. 99, no. 574, pp. 1-8).
In the optical see-through method, the position and orientation of a target object (that is, the viewer's head or the display) are generally determined by attaching an imaging device (and an inertial sensor) to the target object, determining the position and orientation of the imaging device by the above-described method, and calculating the position and orientation of the target object on the basis of the position and orientation of the imaging device.
A known position/orientation determination apparatus which determines a position and an orientation of an imaging device will be described below with reference to FIG. 1. FIG. 1 shows the structure of the known position/orientation determination apparatus. As shown in FIG. 1, the known position/orientation determination apparatus 100 includes an image input unit 160, a data memory 170, an index detector 110, an orientation sensor unit 140, an orientation prediction unit 150, and a position/orientation calculator 120, and is connected to an imaging device 130.
In addition, as indices to be shot by the imaging device 130 (hereafter simply called indices), a plurality of indices Qk (k=1 . . . K1) are arranged at a plurality of positions in a real space. The positions of the indices Qk in a world coordinate system (a coordinate system defined by an origin positioned at a certain point in the real space and X, Y, and Z axes which are perpendicular to each other) are known in advance. The indices Qk can be arranged such that at least three of them are always observed on an image obtained by the imaging device 130 when the imaging device 130 is positioned within a target area in which the position and orientation are to be determined. In the example shown in FIG. 1, four indices Q1, Q2, Q3, and Q4 are arranged, and three of them, Q1, Q3, and Q4, are disposed in the field of view of the imaging device 130.
The indices Qk may be circular markers of different colors or feature points, such as natural features, having different texture features. Any type of indices may be used as long as image coordinates of projections of the indices on the image are detectable and the indices can be individually distinguished from each other.
The image output by the imaging device 130 (hereafter called a shot image) is input to the position/orientation determination apparatus 100.
The image input unit 160 converts the shot image input to the position/orientation determination apparatus 100 into digital data, stores the data in the data memory 170, and outputs the time at which the image has been input to the orientation prediction unit 150.
The orientation sensor unit 140 is attached to the imaging device 130. The orientation sensor unit 140 measures its own current orientation and outputs the measured orientation to the orientation prediction unit 150. The orientation sensor unit 140 is based on a gyro sensor; for example, TISS-5-40 produced by Tokimec Inc. or InertiaCube2 produced by InterSense Inc. may be used. Each of these orientation sensor units generates a drift error which accumulates with time. Therefore, the measured orientation includes an error and differs from the actual orientation.
The orientation prediction unit 150 receives a calculated orientation of the imaging device 130 (output from the position/orientation calculator 120) at the time corresponding to the previous frame (hereafter called time τ−1) from the data memory 170. In the case in which the imaging device 130 outputs NTSC signals, time τ−1 is 33.3 msec before if the position and orientation are to be determined for each frame, and is 16.7 msec before if the position and orientation are to be determined for each field. In addition, the orientation prediction unit 150 also receives the measured orientation at the time corresponding to the current frame (hereafter called time τ) from the orientation sensor unit 140, predicts the orientation of the imaging device 130 at time τ, and outputs the predicted orientation to the data memory 170.
The index detector 110 receives the shot image from the data memory 170 and detects the image coordinates of the indices Qk included in the image. For example, in the case in which the indices Qk are markers of different colors, areas corresponding to the colors of the markers in the shot image are detected, and coordinates of the center points of the detected areas are defined as detected coordinates. In addition, in the case in which the indices Qk are feature points having different texture features, template matching using template images of the indices is performed on the shot image to detect the positions of the indices. In this case, the template images of the indices are stored in advance as known information. The index detector 110 may also receive the calculated position of the imaging device 130 (output from the position/orientation calculator 120) at time τ and the predicted orientation (output from the orientation prediction unit 150) from the data memory 170. In such a case, these values are used for predicting the positions of the indices on the image and limiting the search ranges, so that the index detection process can be performed with less calculation load and the risk of false detection or misidentification of the indices can be reduced.
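By way of illustration only, the color-marker case described above, including the optional limitation of the search range, may be sketched in Python with NumPy as follows. The function name `detect_color_marker`, the tolerance `tol`, the `window` parameter, and the synthetic test image are all assumptions made for this sketch and are not taken from the known apparatus.

```python
import numpy as np

def detect_color_marker(image, color, tol=30, window=None):
    """Return the (x, y) centroid of pixels whose RGB value lies within
    `tol` of `color`, or None when no pixel matches.

    `window` = (x0, y0, x1, y1) optionally limits the search range, for
    example to a region predicted from the previous position and the
    predicted orientation (hypothetical parameter).
    """
    height, width = image.shape[:2]
    x0, y0, x1, y1 = window if window is not None else (0, 0, width, height)
    # Cast to int so that the subtraction cannot wrap around for uint8 images.
    region = image[y0:y1, x0:x1].astype(int)
    mask = np.all(np.abs(region - np.asarray(color)) <= tol, axis=-1)
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    # The center point of the detected area is used as the detected coordinates.
    return (x0 + xs.mean(), y0 + ys.mean())

# Synthetic 100x100 image with one red 5x5 marker (rows 20..24, columns 40..44).
image = np.zeros((100, 100, 3), dtype=np.uint8)
image[20:25, 40:45] = (255, 0, 0)

full = detect_color_marker(image, (255, 0, 0))
windowed = detect_color_marker(image, (255, 0, 0), window=(30, 10, 60, 40))
missed = detect_color_marker(image, (255, 0, 0), window=(60, 60, 100, 100))
```

Restricting the window reduces the number of pixels examined and excludes similarly colored regions elsewhere in the image, which corresponds to the reduced calculation load and reduced risk of misidentification mentioned above.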
Then, the index detector 110 outputs image coordinates uQkn and identifiers kn of the detected indices Qkn to the data memory 170. Here, n (n=1 . . . N) is a serial number assigned to each detected index, and N is the total number of detected indices. For example, in the case shown in FIG. 1, N is 3, and identifiers k1=1, k2=3, and k3=4 and the corresponding image coordinates uQk1, uQk2, and uQk3 are output.
The position/orientation calculator 120 receives the predicted orientation at time τ and a set of the image coordinates uQkn and world coordinates xWQkn of each index Qkn detected by the index detector 110 from the data memory 170. Then, the position/orientation calculator 120 calculates the position and orientation of the imaging device 130 on the basis of the relationships between the indices, using the predicted orientation and the calculated position at time τ−1 as initial values. The position and orientation thus calculated are output to the data memory 170 and to an external device via an interface (I/F) (not shown).
The data memory 170 stores the image input from the image input unit 160, the predicted orientation input from the orientation prediction unit 150, the calculated position and orientation input from the position/orientation calculator 120, the image coordinates and identifiers of the indices input from the index detector 110, and the world coordinates of the indices which are known in advance, and inputs or outputs these data as necessary.
A process performed by the orientation prediction unit 150 included in the known apparatus will be described below with reference to a flowchart shown in FIG. 2.
Although there are various ways to express orientation, a 3-by-3 rotation matrix R is used in this example.
In Step S2000, the orientation prediction unit 150 determines whether or not a trigger (generated when a new image is input) is input from the image input unit 160. If the trigger is input (yes in Step S2000), the process proceeds to Step S2010. If the trigger is not input, Step S2000 is repeated.
In Step S2010, the orientation prediction unit 150 receives a measured orientation R# from the orientation sensor unit 140 (# represents that the data is obtained as a result of measurement by the sensor), and sets this orientation as the measured orientation R#τ at time τ.
In Step S2020, the orientation prediction unit 150 receives a calculated orientation Rτ−1 at time τ−1 from the data memory 170.
In Step S2030, the orientation prediction unit 150 calculates a relative orientation change ΔR# of the imaging device 130 between time τ−1 and time τ as follows:
$$\Delta R^{\#} = \left(R^{\#}_{\tau-1} \cdot R_{SC}\right)^{-1} \cdot R^{\#}_{\tau} \cdot R_{SC} \tag{1}$$
Here, RSC represents a 3-by-3 matrix which transforms an orientation in a camera coordinate system (coordinate system in which the position and orientation of the imaging device 130 are expressed) to that in a sensor coordinate system (coordinate system in which the position and orientation of the orientation sensor unit 140 are expressed). RSC is set in advance as known data based on the fixed relationship between the orientation of the orientation sensor unit 140 and that of the imaging device 130.
In Step S2040, the orientation prediction unit 150 calculates a predicted orientation R*τ at time τ by adding the orientation change ΔR# to the calculated orientation Rτ−1 at time τ−1 as follows:
$$R^{*}_{\tau} = R_{\tau-1} \cdot \Delta R^{\#} \tag{2}$$
Then, the orientation prediction unit 150 outputs the predicted orientation R*τ to the data memory 170.
In Step S2050, the orientation prediction unit 150 determines whether or not to finish the process. If the process is to be continued (no in Step S2050), the measured orientation R#τ at time τ is stored as the measured orientation R#τ−1 of the previous cycle, and the process returns to Step S2000.
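By way of illustration only, the prediction of Equations 1 and 2 may be sketched in Python with NumPy as follows. The function names, the numerical values, and in particular the demonstration's drift model (a sensor offset that stays constant over one frame interval) are assumptions made for this sketch and are not stated in the description above.

```python
import numpy as np

def rot_x(a):
    """Rotation by angle a (radians) about the X axis (demo helper)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_z(a):
    """Rotation by angle a (radians) about the Z axis (demo helper)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def predict_orientation(R_prev, R_meas_prev, R_meas_now, R_sc):
    """Predict the orientation at time tau from the orientation calculated
    at time tau-1 and two successive sensor measurements."""
    # Equation 1: relative change between the two measurements.
    # The inverse of a rotation matrix is its transpose.
    delta_R = (R_meas_prev @ R_sc).T @ (R_meas_now @ R_sc)
    # Equation 2: apply the change to the previously calculated orientation.
    return R_prev @ delta_R

# Hypothetical fixed sensor/camera relationship and accumulated drift.
R_sc = rot_x(0.2)
drift = rot_z(0.1) @ rot_x(-0.05)

# Hypothetical true camera orientations at time tau-1 and time tau.
R_true_prev = rot_z(0.5)
R_true_now = rot_z(0.5) @ rot_x(0.03)

# Simulated measurements chosen so that R_meas @ R_sc equals drift @ R_true.
R_meas_prev = drift @ R_true_prev @ R_sc.T
R_meas_now = drift @ R_true_now @ R_sc.T

R_pred = predict_orientation(R_true_prev, R_meas_prev, R_meas_now, R_sc)
```

Under the assumed model, the drift term appears in both measurements and cancels in Equation 1, so only the change during one frame interval is taken from the sensor while the absolute orientation comes from the previous calculated result.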
Next, a process performed by the position/orientation calculator 120 in the known apparatus is described below with reference to a flowchart shown in FIG. 3. In the known structure, the position and orientation of the imaging device 130 are calculated by iterative solution of nonlinear equations.
In the position/orientation calculator 120, the position and orientation of the imaging device 130 to be calculated are internally expressed by three-element vectors t=[x y z]T and ω=[ξ ψ ζ]T. Accordingly, unknown parameters to be determined are expressed by a six-element state vector s=[x y z ξ ψ ζ]T.
Although there are various ways to express an orientation with three elements, a three-element vector which defines a rotating angle with the magnitude thereof and a rotating-axis direction with the direction thereof is used in this example. In addition, the orientation ω may also be expressed using a 3-by-3 rotation matrix R as follows:
$$
R(\omega) =
\begin{bmatrix}
\dfrac{\xi^{2}}{\theta^{2}}(1-\cos\theta)+\cos\theta &
\dfrac{\xi\psi}{\theta^{2}}(1-\cos\theta)-\dfrac{\zeta}{\theta}\sin\theta &
\dfrac{\xi\zeta}{\theta^{2}}(1-\cos\theta)+\dfrac{\psi}{\theta}\sin\theta \\[2ex]
\dfrac{\psi\xi}{\theta^{2}}(1-\cos\theta)+\dfrac{\zeta}{\theta}\sin\theta &
\dfrac{\psi^{2}}{\theta^{2}}(1-\cos\theta)+\cos\theta &
\dfrac{\psi\zeta}{\theta^{2}}(1-\cos\theta)-\dfrac{\xi}{\theta}\sin\theta \\[2ex]
\dfrac{\zeta\xi}{\theta^{2}}(1-\cos\theta)-\dfrac{\psi}{\theta}\sin\theta &
\dfrac{\zeta\psi}{\theta^{2}}(1-\cos\theta)+\dfrac{\xi}{\theta}\sin\theta &
\dfrac{\zeta^{2}}{\theta^{2}}(1-\cos\theta)+\cos\theta
\end{bmatrix}
\tag{3}
$$
where $\theta=\sqrt{\xi^{2}+\psi^{2}+\zeta^{2}}$. Thus, ω and R may be uniquely transformed into each other. A method for transforming R into ω is commonly known, and detailed explanations thereof are thus omitted herein.
In Step S3000, the position/orientation calculator 120 receives the predicted orientation R*τ of the imaging device 130 at time τ, and obtains a three-element vector ω*τ (=[ξ*τ ψ*τ ζ*τ]T).
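By way of illustration only, the transformation of Equation 3 and the reverse transformation used in Step S3000 may be sketched in Python with NumPy as follows. The function names are assumptions, and the reverse transformation shown is one commonly used form that is valid only for rotation angles strictly between 0 and π; it is not necessarily the method used in the known apparatus.

```python
import numpy as np

def rotation_from_vector(omega):
    """Equation 3: 3-by-3 rotation matrix from a three-element vector whose
    magnitude is the rotation angle and whose direction is the rotation axis."""
    omega = np.asarray(omega, dtype=float)
    theta = np.linalg.norm(omega)
    if theta < 1e-12:
        # Equation 3 divides by theta; the identity is used as the limit
        # for a zero rotation (assumption of this sketch).
        return np.eye(3)
    n = omega / theta  # unit axis [xi, psi, zeta] / theta
    K = np.array([[0.0, -n[2], n[1]],
                  [n[2], 0.0, -n[0]],
                  [-n[1], n[0], 0.0]])
    return (np.cos(theta) * np.eye(3)
            + (1.0 - np.cos(theta)) * np.outer(n, n)
            + np.sin(theta) * K)

def vector_from_rotation(R):
    """Reverse transformation, valid for 0 < theta < pi (assumed method)."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros(3)
    # The antisymmetric part of R equals sin(theta) times the axis cross matrix.
    axis = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]]) / (2.0 * np.sin(theta))
    return theta * axis

# A quarter turn about the Z axis.
R_quarter = rotation_from_vector([0.0, 0.0, np.pi / 2.0])

# Round trip for an arbitrary hypothetical orientation.
omega_in = np.array([0.1, -0.2, 0.3])
omega_out = vector_from_rotation(rotation_from_vector(omega_in))
```

The three terms of the returned sum reproduce the cosine, (1 − cos θ), and sine terms of each element of Equation 3.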
In Step S3005, the position/orientation calculator 120 calculates an initial state vector s=[xτ−1 yτ−1 zτ−1 ξ*τ ψ*τ ζ*τ]T by combining the predicted orientation ω*τ and the calculated vector tτ−1 at time τ−1.
In Step S3010, the position/orientation calculator 120 receives the image coordinates of the indices detected by the index detector 110 from the shot image input at time τ and the world coordinates of the detected indices from the data memory 170.
In Step S3020, the position/orientation calculator 120 determines whether or not the input information regarding the indices is enough to estimate the position and orientation, and divides the process in accordance with the result of determination. More specifically, if the number of input indices is three or more, the process proceeds to Step S3030. If the number of input indices is less than three, the process proceeds to Step S3090.
In Step S3030, the position/orientation calculator 120 calculates estimated image coordinates uQkn* for each index Qkn. Calculation of the estimated image coordinates uQkn* is performed by a function of world coordinates xWQkn of each index Qkn stored in advance as known information and the current state vector s as follows:
$$u^{Q_{k_n}*} = F_{c}\left(x_{W}^{Q_{k_n}},\, s\right) \tag{4}$$
The function Fc( ) includes the following equation for obtaining camera coordinates xCQkn (coordinates in a coordinate system defined by an origin positioned at a certain point on a camera and X, Y, and Z axes which are perpendicular to each other) of each index from xWQkn and s:
$$
x_{C}^{Q_{k_n}} =
\begin{bmatrix} x_{C}^{Q_{k_n}} \\ y_{C}^{Q_{k_n}} \\ z_{C}^{Q_{k_n}} \end{bmatrix}
= R(\omega)^{-1}\left(x_{W}^{Q_{k_n}} - t\right)
\tag{5}
$$
and the following equation for obtaining the image coordinates uQkn* from the camera coordinates xCQkn:
$$
u^{Q_{k_n}*} =
\begin{bmatrix} u_{x}^{Q_{k_n}*} & u_{y}^{Q_{k_n}*} \end{bmatrix}^{T}
=
\begin{bmatrix}
-f_{x}^{C}\,\dfrac{x_{C}^{Q_{k_n}}}{z_{C}^{Q_{k_n}}} &
-f_{y}^{C}\,\dfrac{y_{C}^{Q_{k_n}}}{z_{C}^{Q_{k_n}}}
\end{bmatrix}^{T}
\tag{6}
$$
Here, fCx and fCy are focal lengths of the imaging device 130 along the x axis and the y axis, respectively, and are set in advance as known values.
In Step S3040, the position/orientation calculator 120 calculates errors ΔuQkn between the estimated image coordinates uQkn* and the measured image coordinates uQkn for each index Qkn as follows:
$$\Delta u^{Q_{k_n}} = u^{Q_{k_n}} - u^{Q_{k_n}*} \tag{7}$$
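By way of illustration only, Steps S3030 and S3040 (Equations 4 to 7) may be sketched in Python with NumPy as follows. The function names, the focal lengths of 800, the index position, and the measured coordinates are hypothetical values chosen for this sketch.

```python
import numpy as np

def rotation_from_vector(omega):
    """Equation 3: rotation matrix from a three-element orientation vector."""
    omega = np.asarray(omega, dtype=float)
    theta = np.linalg.norm(omega)
    if theta < 1e-12:
        return np.eye(3)  # limit for a zero rotation (assumption)
    n = omega / theta
    K = np.array([[0.0, -n[2], n[1]],
                  [n[2], 0.0, -n[0]],
                  [-n[1], n[0], 0.0]])
    return (np.cos(theta) * np.eye(3)
            + (1.0 - np.cos(theta)) * np.outer(n, n)
            + np.sin(theta) * K)

def project(x_world, s, fx, fy):
    """Equation 4: estimated image coordinates of an index for state s."""
    t, omega = s[:3], s[3:]
    # Equation 5: world coordinates to camera coordinates.
    # The inverse of a rotation matrix is its transpose.
    x_cam = rotation_from_vector(omega).T @ (x_world - t)
    # Equation 6: camera coordinates to image coordinates.
    return np.array([-fx * x_cam[0] / x_cam[2],
                     -fy * x_cam[1] / x_cam[2]])

# Camera at the world origin with zero rotation (hypothetical state).
s = np.zeros(6)
x_world = np.array([1.0, 2.0, -10.0])    # hypothetical index position
u_est = project(x_world, s, 800.0, 800.0)

# Equation 7: error between measured and estimated image coordinates.
u_meas = np.array([82.0, 159.0])         # hypothetical detected coordinates
delta_u = u_meas - u_est
```

With the sign convention of Equation 6, an index in front of the camera has a negative z coordinate in the camera coordinate system, which is why the example index is placed at z = −10.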
In Step S3050, the position/orientation calculator 120 calculates an image Jacobian JusQkn (=∂u/∂s) with respect to the state vector s for each index Qkn. The image Jacobian JusQkn is a 2-row by 6-column Jacobian matrix having elements obtained by partial differentiation of function Fc( ) in Equation 4 with respect to the elements of the state vector s. More specifically, first, a 2-row by 3-column Jacobian matrix JuxQkn (=∂u/∂x) having elements obtained by partial differentiation of the right side of Equation 6 with respect to the elements of the camera coordinates xCQkn is calculated. In addition, a 3-row by 6-column Jacobian matrix JxsQkn (=∂x/∂s) having elements obtained by partial differentiation of the right side of Equation 5 with respect to the elements of the state vector s is also calculated. Then, the image Jacobian JusQkn is calculated as follows:
$$J_{us}^{Q_{k_n}} = J_{ux}^{Q_{k_n}} \cdot J_{xs}^{Q_{k_n}} \tag{8}$$
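By way of illustration only, the chain rule of Equation 8 may be sketched in Python with NumPy as follows. In this sketch the 2-by-3 matrix is written analytically from Equation 6, whereas the 3-by-6 matrix is approximated by central differences for brevity; that numerical approximation, the function names, and all numerical values are assumptions and are not taken from the known apparatus.

```python
import numpy as np

def rotation_from_vector(omega):
    """Equation 3: rotation matrix from a three-element orientation vector."""
    omega = np.asarray(omega, dtype=float)
    theta = np.linalg.norm(omega)
    if theta < 1e-12:
        return np.eye(3)  # limit for a zero rotation (assumption)
    n = omega / theta
    K = np.array([[0.0, -n[2], n[1]],
                  [n[2], 0.0, -n[0]],
                  [-n[1], n[0], 0.0]])
    return (np.cos(theta) * np.eye(3)
            + (1.0 - np.cos(theta)) * np.outer(n, n)
            + np.sin(theta) * K)

def camera_coords(x_world, s):
    """Equation 5."""
    return rotation_from_vector(s[3:]).T @ (x_world - s[:3])

def project_camera(x_cam, fx, fy):
    """Equation 6."""
    return np.array([-fx * x_cam[0] / x_cam[2], -fy * x_cam[1] / x_cam[2]])

def jacobian_ux(x_cam, fx, fy):
    """2-by-3 matrix: partial derivatives of Equation 6 (analytic)."""
    x, y, z = x_cam
    return np.array([[-fx / z, 0.0, fx * x / z**2],
                     [0.0, -fy / z, fy * y / z**2]])

def numeric_jacobian(func, v, eps=1e-6):
    """Central-difference Jacobian of func at v (approximation)."""
    v = np.asarray(v, dtype=float)
    columns = []
    for i in range(v.size):
        step = np.zeros_like(v)
        step[i] = eps
        columns.append((func(v + step) - func(v - step)) / (2.0 * eps))
    return np.stack(columns, axis=1)

def image_jacobian(x_world, s, fx, fy):
    """Equation 8: 2-by-6 image Jacobian as the product of the two factors."""
    J_ux = jacobian_ux(camera_coords(x_world, s), fx, fy)
    J_xs = numeric_jacobian(lambda v: camera_coords(x_world, v), s)
    return J_ux @ J_xs

x_world = np.array([1.0, 2.0, -10.0])                 # hypothetical index
s = np.array([0.1, -0.2, 0.3, 0.05, -0.03, 0.02])     # hypothetical state
J_us = image_jacobian(x_world, s, 800.0, 800.0)

# Cross-check: differentiate the whole of Equation 4 directly.
J_direct = numeric_jacobian(
    lambda v: project_camera(camera_coords(x_world, v), 800.0, 800.0), s)
```

The cross-check differentiates the composed function in one step, so agreement between `J_us` and `J_direct` indicates that the two factors have been multiplied in the order given by Equation 8.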
In Step S3060, the position/orientation calculator 120 calculates a correction vector Δs for the state vector s on the basis of the errors ΔuQkn and the image Jacobians JusQkn calculated in Steps S3040 and S3050, respectively. More specifically, first, a 2N-dimensional error vector U is obtained by arranging the errors ΔuQkn vertically as follows:
\[ U = \begin{bmatrix} \Delta u^{Q_{k_1}} \\ \vdots \\ \Delta u^{Q_{k_N}} \end{bmatrix} \qquad (9) \]
In addition, a 2N-row by 6-column matrix is obtained by arranging the image Jacobians JusQkn vertically as follows:
\[ \Theta = \begin{bmatrix} J_{us}^{Q_{k_1}} \\ \vdots \\ J_{us}^{Q_{k_N}} \end{bmatrix} \qquad (10) \]
Then, the correction vector Δs is calculated using the pseudo-inverse matrix Θ′ of Θ as follows:
\[ \Delta s = \Theta' U \qquad (11) \]
Since N is 3 in the example shown in FIG. 1, the error vector U is a 6-dimensional vector, and Θ is a 6-row by 6-column matrix.
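The stacking of Equations 9 and 10 and the pseudo-inverse step of Equation 11 can be sketched as follows. This is only an illustration under the assumption that each error is supplied as a 2-element array and each image Jacobian as a 2-row by 6-column array; the function name `correction_vector` is hypothetical.

```python
import numpy as np

def correction_vector(errors, jacobians):
    """Stack per-index errors and Jacobians, then apply the pseudo-inverse."""
    # Equation 9: 2N-dimensional error vector U.
    U = np.concatenate([np.asarray(e, dtype=float).reshape(2) for e in errors])
    # Equation 10: 2N-row by 6-column matrix Theta.
    Theta = np.vstack([np.asarray(J, dtype=float).reshape(2, 6) for J in jacobians])
    # Equation 11: correction vector from the pseudo-inverse of Theta.
    return np.linalg.pinv(Theta) @ U
```

When N is 3 and Θ is invertible, the pseudo-inverse coincides with the ordinary inverse; for N greater than 3 it yields the least-squares solution.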
In Step S3070, the position/orientation calculator 120 corrects the state vector s using the correction vector Δs calculated in Step S3060, and sets the corrected state vector as the new estimated state vector s as follows:
\[ s + \Delta s \rightarrow s \qquad (12) \]
In Step S3080, the position/orientation calculator 120 determines whether or not the calculation has converged using a certain criterion, for example, whether or not the error vector U is smaller than a predetermined threshold or whether or not the correction vector Δs is smaller than a predetermined threshold. If the calculation has not converged, the process returns to Step S3030, and Steps S3030 through S3080 are repeated using the corrected state vector s.
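The iteration of Steps S3030 through S3080 can be summarized in a short sketch. It assumes a caller-supplied function, here called `residual_and_jacobian` (a hypothetical name), that returns the error vector U and the matrix Θ for the current state vector. The threshold `tol` on the norm of Δs and the iteration limit `max_iter` are illustrative choices and are not specified in the method described above.

```python
import numpy as np

def refine_state(s0, residual_and_jacobian, tol=1e-8, max_iter=50):
    """Repeat the correction step until the correction vector is small."""
    s = np.asarray(s0, dtype=float).copy()
    for _ in range(max_iter):
        U, Theta = residual_and_jacobian(s)      # Steps S3030 to S3050
        delta_s = np.linalg.pinv(Theta) @ U      # Step S3060, Equation 11
        s = s + delta_s                          # Step S3070, Equation 12
        if np.linalg.norm(delta_s) < tol:        # Step S3080, one possible criterion
            return s, True
    # max_iter is a safeguard assumed here; it signals non-convergence.
    return s, False
```

The second return value indicates whether the convergence criterion was met, which lets the caller decide how to handle a frame for which no stable solution was found.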
If it is determined in Step S3080 that the calculation has converged, the process proceeds to Step S3090 and the position/orientation calculator 120 outputs the obtained state vector s as information of the position and orientation of the imaging device 130 at time τ (that is, sτ). The information of the position and orientation may be s itself. Alternatively, a set of the 3-by-3 matrix R representing the orientation and the 3-dimensional vector t representing the position calculated from s may also be output.
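As an illustration of the alternative output form, the following sketch converts a state vector into R and t. It assumes, purely for the purpose of the example, that s holds the position in its first three elements and the orientation in its last three elements as a rotation vector whose direction is the rotation axis and whose magnitude is the rotation angle; the actual parameterization is the one defined for s earlier in this description. The function name `state_to_pose` is hypothetical.

```python
import numpy as np

def state_to_pose(s):
    """Split s into position t and rotation matrix R (axis-angle assumed)."""
    s = np.asarray(s, dtype=float)
    t = s[:3].copy()
    omega = s[3:6]
    theta = np.linalg.norm(omega)
    if theta < 1e-12:
        # Negligible rotation: the orientation is the identity.
        return np.eye(3), t
    k = omega / theta
    # Skew-symmetric cross-product matrix of the unit axis k.
    K = np.array([
        [0.0, -k[2], k[1]],
        [k[2], 0.0, -k[0]],
        [-k[1], k[0], 0.0],
    ])
    # Rodrigues' rotation formula.
    R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
    return R, t
```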
In Step S3100, the position/orientation calculator 120 determines whether or not to finish the process. If the process is to be continued, the process returns to Step S3000 and input data corresponding to the next frame (time τ+1) and the following frames are subjected to a similar process.
The above-described method is commonly used for determining positions and orientations of imaging devices. In addition, this method is also used to determine positions and orientations of arbitrary target objects (for example, an optical see-through head mounted display (HMD)). In this case, an imaging device is attached to the target object and the position and orientation of the imaging device are determined by the above-described method. Then, the position and orientation of the target object are obtained from the known relationship between the position and orientation of the imaging device and those of the target object.
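The last step, obtaining the pose of the target object from the pose of the attached imaging device, can be sketched as a composition of rigid transformations. The sketch assumes that (R_wc, t_wc) is the orientation and position of the imaging device in the world coordinate system and that (R_ct, t_ct) is the known, pre-calibrated orientation and position of the target object in the coordinate system of the imaging device. These names and conventions are assumptions made for illustration.

```python
import numpy as np

def to_homogeneous(R, t):
    """Build a 4x4 rigid transformation from a rotation R and a translation t."""
    M = np.eye(4)
    M[:3, :3] = np.asarray(R, dtype=float)
    M[:3, 3] = np.asarray(t, dtype=float)
    return M

def target_pose_in_world(R_wc, t_wc, R_ct, t_ct):
    """Compose the camera pose in the world with the target pose in the camera frame."""
    M_wt = to_homogeneous(R_wc, t_wc) @ to_homogeneous(R_ct, t_ct)
    return M_wt[:3, :3], M_wt[:3, 3]
```

For example, if the imaging device is rotated by 90 degrees about the world z axis and the target object is offset along the x axis of the imaging device, the offset appears along the world y axis in the result.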
In the above-described known method, the information obtained from the orientation sensor unit is used only as auxiliary information for predicting the coordinates of the indices or calculating the initial values in the registration process based on the indices detected from the image, and the final estimated position and orientation are determined only from the image information. Therefore, if the input image does not include enough image information to perform stable registration, for example, when the indices are concentrated in a relatively narrow area of the image, when only three indices are detected, or when errors occur in the index detection, there is a risk that a solution with sufficient accuracy and stability cannot be obtained. In addition, when only two or fewer indices are observed, it is impossible to obtain a solution. Although these problems can be prevented by uniformly setting many indices in the scene, in such a case it becomes difficult to distinguish the indices from one another, and the appearance of the real space is degraded.