1. Field of the Invention
The present invention relates to an information processing technique for enabling an information processing system to eliminate erroneous detection of an index contained in an image captured by an imaging apparatus.
2. Description of the Related Art
Conventional Technique 1
Measurement of the position and orientation of a camera or another image capturing unit (hereinafter, collectively referred to as a “camera”) capturing a physical space is required, for example, in a mixed reality system that can combine a physical space with a virtual space and display a mixed image.
As discussed in Japanese Patent Application Laid-Open No. 11-084307, Japanese Patent Application Laid-Open No. 2000-041173, or A. State, G. Hirota, D. T. Chen, B. Garrett, and M. Livingston: “Superior augmented reality registration by integrating landmark tracking and magnetic tracking,” Proc. SIGGRAPH '96, pp. 429-438, July 1996, there is a conventional method for correcting measurement errors of a sensor that can measure the position and orientation of a camera by using markers or feature points (hereinafter, collectively referred to as an “index”) having known positions and disposed in the physical space.
These methods are characterized in that the position and orientation of a camera can be estimated based on sensing data of a position and orientation sensor attached to the camera and information relating to indices captured by the camera. The indices used in these methods are, for example, color regions defining a centroid or concentric circles. In general, a predetermined number of indices are provided in a physical space so that two or more indices can be simultaneously captured by the camera.
When the camera captures an image including indices, each index appearing in the captured image must be identified as one of the indices disposed in the physical space. One method for identifying each index uses the relationship between the coordinates of each index detected from the image and the image coordinates obtained by projecting the known position of the index in the physical space onto the image, based on measurement values of a position and orientation sensor.
Conventional Technique 2
As discussed in Kato, Billinghurst, Asano, and Tachibana: “An Augmented Reality System and its Calibration based on Marker Tracking,” Transactions of the Virtual Reality Society of Japan, vol. 4, no. 4, pp. 607-616, December 1999, or X. Zhang, S. Fronz, and N. Navab: “Visual marker detection and decoding in AR systems: A comparative study,” Proc. of International Symposium on Mixed and Augmented Reality (ISMAR'02), 2002, there is a conventional method for estimating the position and orientation of a camera based only on the indices captured by the camera, without relying on information obtained from a position and orientation sensor.
For example, if a square index is used, the position and orientation of a camera can be estimated from the coordinates of the four vertices of the square. However, the square is rotationally symmetric (under every rotation of 90°) about an axis passing through its center (i.e., the crossing point of the diagonal lines) and perpendicular to the square surface. Thus, the up-and-down or right-and-left direction of the index cannot be identified from the coordinates of the vertices alone.
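The ambiguity caused by this rotational symmetry can be illustrated with a minimal sketch. The vertex coordinates and the helper `rotate90` are hypothetical and chosen only for illustration; they are not taken from the cited references.

```python
def rotate90(point):
    """Rotate a 2-D point by 90 degrees counterclockwise about the origin."""
    x, y = point
    return (-y, x)

# Hypothetical square index centered at the origin, vertices listed counterclockwise.
square = [(1, 1), (-1, 1), (-1, -1), (1, -1)]

rotated = [rotate90(p) for p in square]
same_vertex_set = set(rotated) == set(square)  # vertices land on vertices
same_labeling = rotated == square              # but each vertex has moved

# Collect every distinct vertex ordering reachable by repeated 90-degree turns.
orderings = []
current = square
for _ in range(4):
    orderings.append(tuple(current))
    current = [rotate90(p) for p in current]
candidate_orientations = len(set(orderings))
```

In this sketch the rotated square occupies exactly the same four vertex positions, yet four distinct vertex orderings are possible, so vertex coordinates alone leave four candidate orientations.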
To solve this problem, a square index can include an image feature that defines its direction. Furthermore, when plural indices are employed, each index must be identified based only on an image captured by a camera. Thus, an index can include graphic information, such as a unique pattern or symbol, that differs for each index.
Conventional Technique 3
The image display apparatus configured to present a mixed reality as described in the conventional technique 1 can be realized by a video see-through head mounted display. The video see-through head mounted display (i.e., a display unit mountable on the head of a user) can display a mixed image including an image of a virtual space (e.g., a virtual object created by computer graphics or text information) superimposed on an image of a physical space captured by a camera, based on the position and orientation of an imaging apparatus.
In this case, for the purpose of letting other observers see the same scene, a display device can be positioned in a physical space to display an image of the physical space captured by the camera, or a mixed image including the virtual space image superimposed on the image of the physical space, which is currently observed by a user of the head mounted display.
In the method for estimating the position and orientation of a camera according to the conventional technique 1, each index can be a small circular sheet having a specific color. In this case, the information of each index is 3-dimensional position information (i.e., coordinates) and the color.
The method for identifying an index can include the steps of projecting a 3-dimensional position of the index onto an image surface of a camera by utilizing measured values of a position and orientation sensor, detecting the color of the index from the image, and calculating a centroid position from the image. Furthermore, the method can include the steps of comparing the image coordinates of a projected index with the centroid position calculated from the image, and identifying the closest one as the true index.
According to the method utilizing a square marker or another graphic index as described in the conventional technique 2, discrimination of each marker depends entirely on the limited information obtainable from an image. Thus, each index must include distinctive symbol information or template information.
FIG. 10 shows examples of practical square markers used in the above-described conventional technique 2, which are discussed in Kato, Billinghurst, Asano, and Tachibana: “An Augmented Reality System and its Calibration based on Marker Tracking,” Transactions of the Virtual Reality Society of Japan, vol. 4, no. 4, pp. 607-616, December 1999, or X. Zhang, S. Fronz, and N. Navab: “Visual marker detection and decoding in AR systems: A comparative study,” Proc. of International Symposium on Mixed and Augmented Reality (ISMAR'02), 2002.
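The identification steps described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: a simple pinhole camera model without lens distortion, a world-to-camera pose (`rotation`, `translation`) standing in for the sensor measurement, and color detection and centroid calculation already performed. The function names `project` and `identify` and all numeric values are hypothetical.

```python
import math

def project(point_world, rotation, translation, focal, center):
    """Project a 3-D world point into pixel coordinates (pinhole model).

    rotation (3x3, list of rows) and translation (3-vector) map world
    coordinates to camera coordinates; they stand in for the sensor output.
    """
    cam = [
        sum(rotation[r][c] * point_world[c] for c in range(3)) + translation[r]
        for r in range(3)
    ]
    if cam[2] <= 0:
        return None  # point is behind the camera, so it cannot be imaged
    return (focal * cam[0] / cam[2] + center[0],
            focal * cam[1] / cam[2] + center[1])

def identify(centroids, known_indices, rotation, translation, focal, center):
    """Map each known index name to the list position of the detected
    centroid closest to that index's projected image coordinates."""
    matches = {}
    for name, position in known_indices.items():
        uv = project(position, rotation, translation, focal, center)
        if uv is None or not centroids:
            continue
        matches[name] = min(
            range(len(centroids)),
            key=lambda i: math.hypot(centroids[i][0] - uv[0],
                                     centroids[i][1] - uv[1]),
        )
    return matches

# Hypothetical example: camera at the world origin looking along +Z.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
known = {"A": (0.0, 0.0, 2.0), "B": (0.4, 0.0, 2.0)}
detected = [(418.0, 241.0), (322.0, 239.0)]  # centroids found in the image
matches = identify(detected, known, identity, (0, 0, 0), 500.0, (320.0, 240.0))
```

Because proximity is the only criterion in this sketch, any detected region of matching color that lies near a projected position would be accepted in the same way as a true index.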
In any of these conventional techniques, if an object similar to an index in color or shape is present in the physical space and is included in an image captured by a camera, the system may erroneously detect the object as a true index.
For example, in the above-described conventional technique 3, the display device can display an image of a physical space captured by a camera (or a mixed image including a virtual space image superimposed on the image of the physical space). In this case, if the display device is disposed in the physical space, the camera will capture a display screen of the display device. As a result, an image of an index displayed on the display screen will be erroneously detected as a true index disposed in the physical space.