1. Field of the Invention
The present invention relates to a position and orientation measuring apparatus and position and orientation measuring method, a mixed-reality system, and a computer program.
2. Description of the Related Art
The position and orientation measurement of an image sensing unit such as a camera (to be simply referred to as a “camera” hereinafter as needed) used to capture an image of a physical space is required in a mixed-reality (MR) system that merges and displays the physical space and a virtual space.
As a method of measuring the position and orientation of the camera in the physical space, a method has been proposed that captures, using the camera, an image of a plurality of indices whose three-dimensional (3D) positions are known, and calculates the position and orientation of the camera from the positions of the projected points of the indices in the captured image (see U.S. Pat. No. 6,993,450).
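The relationship between known 3D index positions and their projected points can be illustrated with a minimal sketch. It assumes an ideal pinhole camera without lens distortion; the function names, focal length, image center, and index coordinates are hypothetical values chosen for illustration and are not taken from the cited reference. A pose estimate is consistent with the captured image when the reprojection error it produces is small.

```python
import numpy as np

def project_points(points_3d, rotation, translation, focal_length, principal_point):
    """Project world-frame 3D points to pixel coordinates (ideal pinhole model)."""
    # Transform world coordinates into the camera frame: X_c = R @ X_w + t.
    cam = points_3d @ rotation.T + translation
    # Perspective division, then scale by focal length and shift to the image center.
    return focal_length * cam[:, :2] / cam[:, 2:3] + principal_point

def reprojection_error(points_3d, observed_2d, rotation, translation,
                       focal_length, principal_point):
    """Root-mean-square pixel distance between observed and predicted projections."""
    predicted = project_points(points_3d, rotation, translation,
                               focal_length, principal_point)
    return float(np.sqrt(np.mean(np.sum((predicted - observed_2d) ** 2, axis=1))))

# Hypothetical setup: four indices on a plane with known 3D positions (meters).
indices_3d = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
focal = 800.0                        # assumed focal length in pixels
center = np.array([320.0, 240.0])    # assumed principal point
true_R = np.eye(3)
true_t = np.array([-0.5, -0.5, 4.0])

# Simulated detections of the indices in the captured image.
observed = project_points(indices_3d, true_R, true_t, focal, center)

# The true pose explains the detections; a pose shifted by 0.1 m does not.
error_true = reprojection_error(indices_3d, observed, true_R, true_t, focal, center)
shifted_t = true_t + np.array([0.1, 0.0, 0.0])
error_shifted = reprojection_error(indices_3d, observed, true_R, shifted_t, focal, center)
```

A pose solver would search for the rotation and translation that minimize this error; the sketch only evaluates the error for given candidate poses.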
In order to detect the indices from the captured image by image processing, the features of the indices must be separated from the features obtained from the background and other object images. For this purpose, indices whose colors differ significantly from their surroundings are used in practice, so that the region onto which each index is projected in the frame can be detected from the captured image.
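Color-based index detection of this kind can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of any cited reference: the function name, the Euclidean RGB distance, and the tolerance value are hypothetical choices. The second call shows that the same fixed threshold no longer finds the index when the image is uniformly darkened.

```python
import numpy as np

def detect_index_centroid(image, index_color, tolerance):
    """Return the (row, col) centroid of pixels close to index_color, or None."""
    # Per-pixel Euclidean distance in RGB space to the reference index color.
    diff = image.astype(float) - np.asarray(index_color, dtype=float)
    mask = np.sqrt(np.sum(diff ** 2, axis=2)) <= tolerance
    if not mask.any():
        return None
    rows, cols = np.nonzero(mask)
    return float(rows.mean()), float(cols.mean())

# Synthetic frame: gray background with one saturated red index.
image = np.full((20, 20, 3), 128, dtype=np.uint8)
image[5:9, 10:14] = (255, 0, 0)

centroid = detect_index_centroid(image, (255, 0, 0), 30.0)

# Same scene at half brightness; the fixed color threshold now misses the index.
dimmed = (image * 0.5).astype(np.uint8)
centroid_dimmed = detect_index_centroid(dimmed, (255, 0, 0), 30.0)
```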
In order to allow measurement of the position and orientation of the camera over a broad range, a plurality of indices must be set in the physical space that the camera faces. However, it is difficult to set a large number of indices in urban and communal facilities. Furthermore, the method of extracting indices based on their colors and saturations is susceptible to changes in ambient illumination, and is hard to use outdoors.
In the field of computer vision, a method has been proposed that detects geometric features included in the captured image of the physical space and measures the position and orientation of a camera using a large number of detection results. The Harris operator is known as a typical geometric feature detection method. The Harris operator detects the positions of edge components which form a corner in an image (see C. Harris and M. Stephens. “A combined corner and edge detector,” Proceedings of the 4th Alvey Vision Conference, pp. 147-151, 1988). A method of trying hypotheses with a plurality of candidate correspondences based on the detected geometric features, and adopting the hypothesis with the fewest errors (RANSAC), has also been proposed (see M. A. Fischler, R. C. Bolles. “Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography,” Communications of the ACM. Vol. 24, pp. 381-395, 1981).
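A minimal sketch of a Harris-style corner response is given below for illustration. It is a simplification under stated assumptions: a box window replaces the Gaussian weighting of the original operator, the constant k = 0.05 is an assumed value in the commonly used range, and the helper names are hypothetical. The response is positive where edge components in two directions meet, negative along a single edge, and zero in flat regions.

```python
import numpy as np

def box_sum(values, radius):
    """Sum values over a (2*radius+1) x (2*radius+1) window with zero padding."""
    padded = np.pad(values, radius, mode="constant")
    out = np.zeros(values.shape, dtype=float)
    size = 2 * radius + 1
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + values.shape[0], dx:dx + values.shape[1]]
    return out

def harris_response(gray, k=0.05, radius=1):
    """Corner response det(M) - k * trace(M)^2 of the local gradient matrix M."""
    gray = gray.astype(float)
    # np.gradient returns derivatives along rows (y) first, then columns (x).
    iy, ix = np.gradient(gray)
    sxx = box_sum(ix * ix, radius)
    syy = box_sum(iy * iy, radius)
    sxy = box_sum(ix * iy, radius)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2

# Synthetic image: one bright quadrant whose corner lies at row 8, column 8.
image = np.zeros((20, 20))
image[8:, 8:] = 1.0

response = harris_response(image)
corner = tuple(int(v) for v in np.unravel_index(np.argmax(response), response.shape))
```

In this synthetic image the strongest response is found at the corner of the bright quadrant, while points along its straight edges receive a negative response.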
On the other hand, assuming use in urban areas, communal facilities and the like, there are many posters such as signs indicating destinations and the current location, advertisements of merchandise, and the like. Characters normally appear on these posters. Also, the layout, color scheme, size, and the like of the characters are appropriately set to attract public attention.
It is easy for a person to detect a character region from the background. However, preliminary learning is required for a machine to recognize whether an object is a character. For this purpose, a character recognition technique has been proposed (see Japanese Patent No. 02643960), and this technique is now sufficiently widespread in industry to be practical for the recognition of printed characters. Also, detection of a character region from an image can be implemented by additionally using an OCR technique.
Many methods for detecting a character region from the captured image of the camera, and exploiting character information in navigation and the like, have been examined. These methods aim at reading a character string. If there is only one character string, the camera is assumed to be in the vicinity of a poster, since it can capture the image of the poster. These methods do not consider the acquisition of the position and orientation of the camera. However, unless a single poster is captured as a closeup, the image sensing frame of the camera includes a plurality of posters. Therefore, in order to accurately obtain the relationship with other posted contents or character strings, the position and orientation of the camera must be acquired.
In consideration of an application to navigation for movement within premises or the like, it is a common practice to point to the next direction to go, and information associated with the position and orientation of the camera is indispensable for accurate navigation. Furthermore, since GPS service is not available underground or inside buildings, it is difficult to directly apply the car navigation mechanism to the position and orientation estimation of a camera which is carried around by a person moving within premises over a broad range underground or inside buildings.
Note that the position and orientation estimation of a camera using its captured image is superior in terms of cost owing to the versatility of the camera as an input apparatus. Meanwhile, in order to allow movement over a broad range and to implement the position and orientation measurement using the captured image, indices whose 3D coordinate positions are known must be captured. However, in an area where setting up indices requires prior approval, such as urban areas, communal facilities and the like, it is difficult to set a large number of such indices. Therefore, in consideration of use over a broad range without limiting the range of use, only features which already exist in the area can be used.
From the aforementioned perspective, methods of detecting geometric features of a structure using image processing, and using regions around the detected features as indices have already been proposed. These methods often use the aforementioned Harris corner detection.
However, with a detection method that reacts to a large, unspecified number of regions included in an image, such as the corner detection method, many feature points are detected from periodic structures on the outer walls of buildings in a place with many buildings. Since it is difficult to establish correspondence between the many detected feature points and registered features, a method of selecting a better result by trying many hypotheses is adopted.
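Selecting a result by trying many hypotheses can be illustrated with a RANSAC-style sketch. For brevity it assumes that detected and registered feature points are related by a pure 2D translation, which is a deliberately simplified stand-in for a full position and orientation model; the function name, threshold, iteration count, and synthetic data are hypothetical. Each hypothesis is generated from one randomly chosen candidate correspondence and scored by the number of correspondences that agree with it.

```python
import numpy as np

def ransac_translation(src, dst, threshold, iterations, rng):
    """Estimate t with dst ~ src + t from correspondences that contain outliers."""
    best_t = None
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iterations):
        # Minimal sample: a single correspondence defines one translation hypothesis.
        i = rng.integers(len(src))
        t = dst[i] - src[i]
        # Consensus set: correspondences explained by this hypothesis.
        residuals = np.linalg.norm(src + t - dst, axis=1)
        inliers = residuals < threshold
        if inliers.sum() > best_inliers.sum():
            best_t, best_inliers = t, inliers
    if best_t is not None:
        # Refine the winning hypothesis on its whole consensus set.
        best_t = (dst[best_inliers] - src[best_inliers]).mean(axis=0)
    return best_t, best_inliers

rng = np.random.default_rng(0)
src = rng.uniform(0.0, 100.0, size=(30, 2))   # registered feature positions
dst = src + np.array([5.0, -3.0])             # correct correspondences

# Ten wrong correspondences, each displaced by a different large offset.
wrong_offsets = 20.0 + 3.0 * np.arange(10)
dst[20:] = src[20:] + wrong_offsets[:, None]

translation, inlier_mask = ransac_translation(src, dst, threshold=1.0,
                                              iterations=50, rng=rng)
```

The number of hypotheses that must be tried grows quickly as the fraction of wrong correspondences increases, which is the cost incurred when periodic structures yield many ambiguous feature points.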
Furthermore, with the above detection method, since the shapes of the edges of an object to be detected change greatly depending on the image sensing position of the camera, a plurality of features to be detected must be registered for different orientations. For this reason, this method is effective only for distant outdoor scenery.
On the other hand, if the physical space includes features that can be used as indices and can be reliably detected by image processing, the position and orientation measurement of the camera in a place where indices cannot be set in advance, or over a broad range with movement, can be attained by preferentially exploiting such features.
In an area where people come and go, there are many objects and surfaces bearing characters. For example, there are advertising characters in towns and posters indicating exit directions in communal facilities such as stations and the like. Furthermore, even in facilities such as companies, schools, and the like, there are many posters using characters.
However, a method for obtaining the position and orientation of a camera over a broad range using characters already present in the physical space has not been proposed yet.