1. Field of the Invention
The present invention relates to an information processing technique applicable to a system that must identify an index contained in an image captured by an imaging apparatus.
2. Description of the Related Art
Conventional Technique 1
Measuring the position and orientation of a camera or another capture unit that captures a physical space (hereinafter collectively referred to as a “camera”) is required, for example, in a mixed reality system that combines a physical space and a virtual space to display a combined image.
As discussed in Japanese Patent Application Laid-Open No. 11-084307, Japanese Patent Application Laid-Open No. 2000-041173, or A. State, G. Hirota, D. T. Chen, B. Garrett, and M. Livingston: “Superior augmented reality registration by integrating landmark tracking and magnetic tracking,” Proc. SIGGRAPH '96, pp. 429-438, July 1996, there is a conventional method for correcting the measurement errors of a sensor capable of measuring the position and orientation of a camera, by using a marker or a feature point (hereinafter collectively referred to as an “index”) disposed at a known position in the physical space.
Conventional Technique 2
As discussed in Kato, Billinghurst, Asano, and Tachibana: “Marker tracking based augmented reality system and related calibration,” The Virtual Reality Society of Japan, monograph, vol. 4, no. 4, pp. 607-616, December 1999, or X. Zhang, S. Fronz, and N. Navab: “Visual marker detection and decoding in AR systems: A comparative study,” Proc. of International Symposium on Mixed and Augmented Reality (ISMAR'02), 2002, there is a conventional method for estimating the position and orientation of a camera based only on an index captured by the camera, without relying on information obtained from a position and orientation sensor.
For example, if a square index is used, the position and orientation of a camera can be estimated from the coordinates of the four vertices of the square. However, a square has fourfold rotational symmetry: it maps onto itself under every 90° rotation about the axis that passes through its center (i.e., the intersection of its diagonals) perpendicular to its surface. Thus, the vertex coordinates alone cannot determine the up-down or left-right orientation of the index. To solve this problem, a square index can include an image feature that defines its orientation. Furthermore, when plural indices are employed, each index must be discriminated using only the image captured by the camera. Thus, each index can carry graphic information, such as a unique pattern or symbol, that differs from index to index.
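The symmetry argument above can be checked numerically. The following Python sketch is illustrative only: the helper names (`rotate`, `same_point_set`, `symmetric_rotations`) and the use of a single off-center dot as the orientation feature are assumptions, not details taken from the cited references. It lists the multiples of 90° that map the vertices of a square, together with any orientation features, onto themselves about the center of the square.

```python
import math

Point = tuple[float, float]


def rotate(p: Point, center: Point, angle_deg: float) -> Point:
    """Rotate point p counter-clockwise about center by angle_deg degrees."""
    a = math.radians(angle_deg)
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (center[0] + dx * math.cos(a) - dy * math.sin(a),
            center[1] + dx * math.sin(a) + dy * math.cos(a))


def same_point_set(a: list[Point], b: list[Point], tol: float = 1e-9) -> bool:
    """True if both lists have the same length and every point of a has a match in b."""
    return len(a) == len(b) and all(
        any(math.dist(p, q) < tol for q in b) for p in a)


def symmetric_rotations(square: list[Point],
                        features: tuple[Point, ...] = ()) -> list[int]:
    """Multiples of 90 degrees that leave the square plus its features unchanged.

    The rotation center is the centroid of the four square vertices only, so an
    orientation feature does not shift the center.
    """
    cx = sum(x for x, _ in square) / 4
    cy = sum(y for _, y in square) / 4
    pts = list(square) + list(features)
    return [ang for ang in (0, 90, 180, 270)
            if same_point_set([rotate(p, (cx, cy), ang) for p in pts], pts)]
```

With only the four vertices, all four rotations are indistinguishable; adding one asymmetric feature leaves only the identity rotation, which is why an orientation-defining image feature resolves the ambiguity.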
Conventional Technique 3
Furthermore, there is a conventional method for estimating the position and orientation of an object body by using plural capture units, each having a known position and orientation in a physical space, that capture images of plural indices, each having a known position on the object body. According to this conventional technique, a light-emitting diode (LED) is used as each index, and the light emission timing of each LED is controlled so that each detected index can be identified.
In the method for estimating the position and orientation of a camera according to the conventional technique 1, each index can be a small circular sheet of a specific color. In this case, the information describing each index consists of its three-dimensional position (i.e., coordinates) and its color.
The method for identifying an index can include the steps of projecting the three-dimensional position of the index onto the image plane of the camera by using values measured by a position and orientation sensor, detecting regions having the color of the index in the image, and calculating the centroid position of each detected region. The method can further include the steps of comparing the projected position of the index with the centroid positions calculated from the image, and identifying the region closest to the projected position as the true index.
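The identification procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of the cited references: it assumes an ideal pinhole camera without lens distortion, a sensor-measured pose (R, t) that maps world coordinates to camera coordinates, an image stored as nested lists of RGB tuples, and a Euclidean RGB tolerance for color matching. All function names are hypothetical.

```python
import math
from collections import deque


def project(X, R, t, fx, fy, cx, cy):
    """Project world point X to pixel coordinates (u, v) with a pinhole model.

    (R, t) is the camera pose from the position and orientation sensor,
    assumed here to satisfy X_cam = R X + t.
    """
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)


def color_region_centroids(image, target, tol):
    """Centroids (u, v) of 4-connected regions whose RGB value is within tol of target."""
    h, w = len(image), len(image[0])

    def matches(u, v):
        return math.dist(image[v][u], target) <= tol

    seen = [[False] * w for _ in range(h)]
    centroids = []
    for v0 in range(h):
        for u0 in range(w):
            if seen[v0][u0] or not matches(u0, v0):
                continue
            # Breadth-first flood fill accumulating pixel coordinates of one region.
            seen[v0][u0] = True
            queue = deque([(u0, v0)])
            su = sv = n = 0
            while queue:
                u, v = queue.popleft()
                su, sv, n = su + u, sv + v, n + 1
                for nu, nv in ((u + 1, v), (u - 1, v), (u, v + 1), (u, v - 1)):
                    if 0 <= nu < w and 0 <= nv < h and not seen[nv][nu] and matches(nu, nv):
                        seen[nv][nu] = True
                        queue.append((nu, nv))
            centroids.append((su / n, sv / n))
    return centroids


def identify_index(projected, centroids):
    """Treat the detected centroid nearest the projected position as the true index."""
    return min(centroids, key=lambda c: math.dist(c, projected), default=None)
```

For example, if a stray pixel with a color close to the index color is also present in the image, it is detected as a separate color region but is rejected because its centroid lies farther from the projected position than that of the genuine index region.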
However, when an index is detected from an image by the above-described color region detection method, a region or object that resembles the index, for example one whose color is similar to the color of the index, may be erroneously detected as the index in the image of the physical space captured by the camera.
To solve the above-described problem, there is a conventional method that uses a composite index composed of concentrically arranged patterns of different colors. The method includes the steps of detecting color regions, checking the combination of detected colors, and identifying a region having the correct color combination as the true index. Compared to a method using a single-color index, this method can lower the possibility of erroneously detecting part of the background image as an index.
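A minimal Python sketch of the color combination check follows. It assumes the color regions have already been detected and labeled with color names, and it treats two regions as concentric when their centroids lie within a pixel tolerance `center_tol` whose default value is an arbitrary assumption. The `ColorRegion` type, its fields, and the function name are hypothetical, chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ColorRegion:
    color: str                      # label from an assumed color classifier
    centroid: tuple[float, float]   # (u, v) in pixels
    radius: float                   # approximate outer radius in pixels


def find_concentric_indices(regions, inner_color, outer_color, center_tol=1.5):
    """Return (inner, outer) pairs that form the expected concentric color combination.

    A pair is accepted only if the colors match the expected inner/outer order,
    the outer region is larger, and the two centroids coincide within center_tol.
    Regions that cannot be paired this way are rejected as background.
    """
    matches = []
    for inner in regions:
        if inner.color != inner_color:
            continue
        for outer in regions:
            if (outer.color == outer_color
                    and outer.radius > inner.radius
                    and math.dist(inner.centroid, outer.centroid) <= center_tol):
                matches.append((inner, outer))
    return matches
```

An isolated region of either color, such as a background patch that happens to share the inner color, has no partner at the same center and is therefore not reported as an index.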
However, to detect an index accurately and stably by the color region detection method, the color of each index must be easily and reliably recognizable. Furthermore, when an index is composed of concentrically arranged color patterns, the index must appear sufficiently large in the captured image for the concentric color patterns to be reliably detected. In other words, an excessively large index, which may spoil the appearance of the physical space, must be placed in the physical space. However, placing such a large index in a limited physical space may not be permitted, or may impair the visibility of the physical space.
According to the method utilizing a square marker or another graphic index as described in the conventional technique 2, the discrimination of each marker depends entirely on the limited information obtainable from an image. Thus, each index must carry distinctive symbol information or template information.
FIG. 9 shows examples of practical square markers used in the above-described conventional technique 2, as discussed in Kato, Billinghurst, Asano, and Tachibana: “Marker tracking based augmented reality system and related calibration,” The Virtual Reality Society of Japan, monograph, vol. 4, no. 4, pp. 607-616, December 1999, or X. Zhang, S. Fronz, and N. Navab: “Visual marker detection and decoding in AR systems: A comparative study,” Proc. of International Symposium on Mixed and Augmented Reality (ISMAR'02), 2002.
However, such a complicated index cannot be stably detected from a captured image unless it occupies a sufficiently large area of the captured image. In other words, either a relatively wide region of the physical space must be reserved for the index, or the camera must be sufficiently close to the index. Thus, such an index imposes strict constraints on its installation and arrangement.
Furthermore, there is a conventional method that uses an LED or a comparable light-emitting element, or a retroreflector, as a smaller distinctive index. However, this method causes erroneous detection if a light-emitting element or reflector similar to the index is present in the captured scene.
For example, as shown in FIG. 10, a light-emitting element 203 may be present in the background space of an index 205 mounted on a physical object 204. The index 205 is, for example, a luminous ball that emits infrared light. The light-emitting element 203 is an electric bulb that also emits infrared light.
FIG. 11 shows an image captured by an infrared camera 101 through a visible-light cut (i.e., infrared-transmitting) filter 202. The image includes bright circular regions corresponding to the light-emitting element 203 and the index 205.
In this case, if processing for detecting a bright region corresponding to the index 205 is applied to the image, not only a circular region 301 corresponding to the index 205 but also a circular region 302 corresponding to the light-emitting element 203 will be detected as regions having the brightness of the index 205. It is then difficult to determine which of the two candidate regions 301 and 302 corresponds to the index 205. Accordingly, if an index and a similar object that cannot be distinguished from it by brightness are both present in the physical space captured by the camera, the index will be erroneously recognized.
Furthermore, there is a conventional method that uses, as indices, a plurality of light-emitting elements (or reflectors) disposed at predetermined positions in a mutually fixed relationship, and discriminates individual indices based on their detected positional relationship. However, if one or more of the light-emitting elements (or reflectors) are concealed, the indices cannot be discriminated. Furthermore, disposing a large number of indices in a limited physical space may impair the visibility of the physical space.
On the other hand, the conventional technique 3 requires a light emission timing control mechanism or a high-speed camera, because the light emission timing must be controlled in a time-division manner to discriminate individual indices. This increases the cost.
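The ambiguity can be reproduced with a simple threshold-and-label detector. The Python sketch below is illustrative only: the image is a hypothetical grayscale array in which two equally bright blobs stand in for the regions 301 and 302, and `bright_regions` is an assumed helper, not a component of any cited system.

```python
def bright_regions(gray, threshold):
    """Label 4-connected pixel regions with intensity >= threshold.

    Returns a list of (centroid, mean_intensity) tuples in raster-scan order.
    """
    h, w = len(gray), len(gray[0])
    seen = set()
    regions = []
    for v0 in range(h):
        for u0 in range(w):
            if (u0, v0) in seen or gray[v0][u0] < threshold:
                continue
            # Depth-first flood fill collecting all pixels of one bright region.
            seen.add((u0, v0))
            stack, pixels = [(u0, v0)], []
            while stack:
                u, v = stack.pop()
                pixels.append((u, v))
                for nu, nv in ((u + 1, v), (u - 1, v), (u, v + 1), (u, v - 1)):
                    if (0 <= nu < w and 0 <= nv < h and (nu, nv) not in seen
                            and gray[nv][nu] >= threshold):
                        seen.add((nu, nv))
                        stack.append((nu, nv))
            n = len(pixels)
            centroid = (sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n)
            mean = sum(gray[v][u] for u, v in pixels) / n
            regions.append((centroid, mean))
    return regions
```

When the index and the bulb saturate the infrared image to the same level, both regions pass the threshold with the same mean intensity, so nothing in the brightness measurement itself indicates which one is the index 205.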
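The time-division discrimination of the conventional technique 3 can be illustrated by assigning each LED a distinct on/off pattern over several camera frames synchronized with the emission control, and decoding the pattern observed for each detected blob. The Python sketch below uses a hypothetical binary coding scheme, not the scheme of any particular system, and assumes that blobs are already tracked across frames. It excludes the all-off pattern (an LED that never lights cannot be observed) and, as an additional assumption, the all-on pattern, so that a constantly lit background source decodes to no index.

```python
def assign_codes(led_ids, n_frames):
    """Give each LED a distinct n_frames-long on/off pattern.

    Codes 1 .. 2**n_frames - 2 are used, skipping all-off (0) and all-on.
    """
    capacity = 2 ** n_frames - 2
    if len(led_ids) > capacity:
        raise ValueError("not enough frames to give every LED a unique code")
    return {led: tuple(bool((k >> bit) & 1) for bit in range(n_frames))
            for k, led in enumerate(led_ids, start=1)}


def decode_blink_ids(observations, codes):
    """Map each tracked blob to the LED whose code matches its on/off sequence.

    observations: blob id -> list of bools, one per synchronized frame.
    Returns blob id -> LED id, or None when no assigned code matches.
    """
    lookup = {code: led for led, code in codes.items()}
    return {blob: lookup.get(tuple(seq)) for blob, seq in observations.items()}


def frames_needed(n_leds):
    """Smallest frame count giving n_leds distinct codes under this scheme."""
    n = 2
    while 2 ** n - 2 < n_leds:
        n += 1
    return n
```

Under this scheme, about ceil(log2(m + 2)) frames are needed for m LEDs, so the identification latency grows with the number of indices unless the camera frame rate is raised, which is consistent with the need for a high-speed camera and an emission timing control mechanism.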