Car navigation systems have become widespread in recent years because they offer many functions at low prices. Along with the advances of ITS (Intelligent Transport Systems), a car navigation display is expected to function as a terminal which presents, in addition to navigation information, various kinds of information that assist the driver.
However, with the conventional car navigation system, the driver must temporarily turn his or her eyes to the interior of the vehicle to observe the display. Hence, while observing the display, the driver can only recognize traffic by peripheral vision, and must temporarily shift attention away from the scene in front of the vehicle.
As a technique that can solve this problem, an HUD (Head Up Display) is known. The HUD is a device for projecting and displaying an image on the front windshield. With this device, the driver can acquire navigation information without turning his or her eyes from the real scene.
With the conventional car navigation system, the driver must associate navigation information (normally superposed on a map) presented on the display with the real scene by himself or herself, and such information is not easily recognized intuitively.
To solve this problem, a device called “On-the-Scene HUD” has been conventionally proposed. The On-the-Scene HUD presents navigation information on the front windshield of a vehicle, which is considered as a see-through display, so that the navigation information that the driver wants is superimposed at an appropriate position on the real scene (for example, refer to J. Fukano, S. Okabayashi, & M. Sakata, “Automotive Head-Up Displays for Navigation Use”, The 14th International Technical Conference on Enhanced Safety of Vehicles, No. 94-S2-0-02, pp. 306-314, 1994 (non-patent reference 1)).
As described in non-patent reference 1, the driver can quickly recognize navigation information using the On-the-Scene HUD. However, this reference does not describe any practical registration method required to present navigation information to be superimposed at an appropriate position on the real scene.
In general, a technique for superimposing predetermined information at a predetermined position on the real scene is called a mixed reality technique. In a general arrangement of a mixed reality presentation apparatus, an observer can simultaneously observe an image displayed on a display and a real scene observed via the display by wearing a see-through type HMD (Head Mounted Display) on his or her head. At this time, in order to superimpose a predetermined image at a predetermined position on the real scene, the viewpoint position and orientation of the observer in the real scene must be measured, and an image must be generated accordingly.
The viewpoint position and orientation of the observer can be measured by various methods. A method of attaching a magnetic sensor or ultrasonic sensor to the HMD is normally used. However, since a magnetic or ultrasonic wave source must be arranged in the real scene, the movable range of the observer is limited. Also, such a method cannot obtain sufficiently high measurement precision.
On the other hand, as a method that can measure the viewpoint position and orientation with high precision without any restrictions on the measurement range, a method of detecting an index in the real scene from video data sensed by a video camera attached to an HMD, and measuring the viewpoint position and orientation on the basis of the detected index has been proposed. For example, in a conventional system, the image coordinate position of the index extracted from the image is used as an input to an Extended Kalman Filter, and the viewpoint position and orientation are estimated as state variables of the filter (for example, refer to Yasuyoshi Yokokoji, Yoshihiko Sugawara, & Tsuneo Yoshikawa, “Accurate Image Overlay on HMD using Vision and Accelerometers”, Transactions of the Virtual Reality Society of Japan, Vol. 4, No. 4, pp. 589-598, 1999 (non-patent reference 2)).
However, the registration method in the aforementioned mixed reality presentation apparatus using the HMD cannot realize registration in the On-the-Scene HUD. This is because the viewpoint of the observer is fixed with respect to the display in the HMD, while the relative positional relationship between the viewpoint of the observer (driver) and the display is not fixed in the HUD.
A conventional mixed reality presentation apparatus in which the positional relationship between the display and the viewpoint is not fixed must measure the position and orientation of the display screen in the real scene and the position of the observer with respect to the display screen in order to attain registration (for example, refer to Japanese Patent Laid-Open No. 2000-276613 (patent reference 1)).
On the other hand, as for measurement of the position and azimuth of a vehicle, a vehicle measurement apparatus based on a GPS (Global Positioning System) and inertial navigation is used in conventional car navigation systems.
A device described in patent reference 1 is premised on use of a display unit held by a hand, and assumes use of a magnetic sensor or ultrasonic sensor to measure the position and orientation of the display unit. Hence, this patent reference does not describe any method of measuring a vehicle which moves over a broad range.
Since the method described in non-patent reference 2 is that for measuring the head position of a person who can take an arbitrary position and orientation, its solution has a high degree of freedom, and a wrong estimated value is often output.
Furthermore, the method described in non-patent reference 2 executes a detection process for all indices which may be included in a view volume of the video camera. Hence, the detection process is executed even for an index which is not observed on an image since it is occluded behind another object in the real scene. This causes a detection error of that index, and the estimated values of a wrong position and orientation are output.
The vehicle measurement apparatus based on the GPS and inertial navigation has poor precision, and can hardly be used in applications such as travel direction indication at a crossing, advance indication of a lane direction, and the like, in which accurate registration with the real scene is indispensable.
The present invention has been made in consideration of the above problems, and has as its object to provide a technique for superimposing navigation information in a vehicle at an appropriate position on a real scene.
In a mixed reality system which displays a real space and virtual space together, the position and orientation of an image sensing unit (to be also referred to as a camera hereinafter) used to sense the real space must be measured. As a prior art associated with such a technique, a method is available which corrects a measurement error of a position/orientation sensor that measures the position and orientation of a camera, using a marker whose position is known and which is arranged in the real space, or a feature point in the real space whose position is known (the marker and feature point will be generally referred to as an index hereinafter) (for example, refer to Japanese Patent Laid-Open No. 11-084307 (patent reference 2), Japanese Patent Laid-Open No. 2000-041173 (patent reference 3), and Japanese Patent Application No. 2000-354230 (patent reference 4)).
In the prior arts associated with this method, although they use different calculation principles, means, and steps, the position and orientation of a camera are obtained on the basis of information obtained from a position/orientation sensor of six degrees of freedom, which is used to measure the position and orientation of the camera, information of indices whose positions are known and which are arranged in a real space, and information obtained by capturing these indices by the camera.
In these methods, the following determination means is used as one of means for determining which of the indices arranged in the real space corresponds to an index detected from an image. The determination means compares the coordinate position of the index detected from the image with the coordinate position of each index on the image plane, which is obtained by projection based on the position and orientation measurement values, and determines the indices which have the smallest distance as those which correspond to each other.
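The determination means described above can be sketched as follows. This is only a minimal illustration and not the implementation of any of the cited references: it assumes an ideal pinhole camera that rotates only about the vertical axis, and the names `project` and `match_index`, the focal length, the image center, and the index positions are all hypothetical.

```python
import math


def project(point, cam_pos, yaw, focal=800.0, cx=320.0, cy=240.0):
    """Project a 3D point onto the image plane of an assumed pinhole camera.

    The camera sits at cam_pos and is rotated by yaw (radians) about the
    vertical y axis; with yaw = 0 it looks along +z. Returns None when the
    point lies behind the camera.
    """
    dx = point[0] - cam_pos[0]
    dy = point[1] - cam_pos[1]
    dz = point[2] - cam_pos[2]
    c, s = math.cos(yaw), math.sin(yaw)
    # Rotate the world-space offset into the camera frame.
    xc = c * dx - s * dz
    zc = s * dx + c * dz
    if zc <= 0.0:
        return None
    return (cx + focal * xc / zc, cy - focal * dy / zc)


def match_index(detected, known_indices, cam_pos, yaw):
    """Return (id, distance) of the known index whose projection, computed
    from the measured camera pose, lies closest to the detected image point."""
    best_id, best_dist = None, math.inf
    for index_id, position in known_indices.items():
        projected = project(position, cam_pos, yaw)
        if projected is None:
            continue
        dist = math.hypot(projected[0] - detected[0],
                          projected[1] - detected[1])
        if dist < best_dist:
            best_id, best_dist = index_id, dist
    return best_id, best_dist


# Hypothetical index positions (meters) and a measured pose at the origin.
indices = {"A": (0.0, 0.0, 5.0), "B": (1.0, 0.0, 5.0)}
best_id, best_dist = match_index((330.0, 242.0), indices, (0.0, 0.0, 0.0), 0.0)
```

Note that this sketch compares distances on the image plane only; it makes no use of whether an index can actually be seen from the camera.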
Assume that the “indices used to correct measurement errors of the position/orientation sensor” are arranged on the side surfaces of a tower-like object in various directions with respect to a real space where the tower-like object is present, as shown in FIG. 11.
Referring to FIG. 11, reference numeral 5201 denotes a camera used to sense such real space; 5202, a three-dimensional (3D) position/orientation sensor used to roughly measure the position and orientation of the camera 5201; 5203 and 5204, indices used to correct measurement errors of the 3D position/orientation sensor 5202; and 5205, a tower-like object where the indices 5203 and 5204 are arranged. Furthermore, the camera 5201 is movable around the tower-like object 5205, and may sense all side surfaces of the tower-like object 5205.
In such a case, the tower-like object 5205 may be sensed from a position nearly perpendicular to a given side surface of the tower-like object 5205, as shown in FIG. 12. In FIG. 12, reference numeral 5300 denotes an image (image sensing frame) sensed by the camera 5201; and 5301, the index 5203 (FIG. 11) which appears in the image. Reference numeral 5302 denotes a coordinate position on the image sensing frame, which is obtained by projecting and calculating the 3D position of the index 5203 onto the image plane of the sensed image 5300 on the basis of the position and orientation of the camera 5201 measured by the 3D position/orientation sensor 5202; and 5303, a coordinate position on the image sensing frame, which is obtained by projecting and calculating the 3D position of the index 5204 onto the image plane of the sensed image 5300.
If the position and orientation measurement values of the 3D position/orientation sensor 5202 were free from any errors, the coordinate position 5302 and the coordinate position of the image 5301 would indicate the same coordinate position. However, in practice, since the position and orientation measurement values include an error, they do not indicate the same position. In the prior art, as means for determining which of the actually arranged indices corresponds to an index (5301 in FIG. 12) detected from a sensed image, the distance between the coordinate value of the index detected from the image and that of each index on the screen, which is calculated based on the position and orientation measurement values, is used. For this reason, in the example shown in FIG. 12, since the distance between the coordinate position 5303 and the coordinate position of the image 5301 is smaller than that between the coordinate position 5302 and the coordinate position of the image 5301 due to the influence of sensor errors, 5301 and 5303 are determined to be corresponding indices. Since 5303 is a point obtained by projecting the index 5204 arranged on the back side of the tower-like object 5205, this result is a wrong correspondence.
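The wrong correspondence in FIG. 12 can be reproduced with a small numerical sketch. All values below are hypothetical and chosen only for illustration: the scene is reduced to a top-down view in which only the horizontal image coordinate is considered, the camera looks along +z from z = 0, and the sensor error is modeled as a 0.10 m lateral offset of the measured camera position.

```python
def project_u(index_xz, cam_x, focal=800.0, cu=320.0):
    """Horizontal image coordinate of an index at (x, z), seen by an assumed
    pinhole camera located at (cam_x, 0) and looking along +z."""
    x, z = index_xz
    return cu + focal * (x - cam_x) / z


# Hypothetical layout: index 5203 on the front face of the tower-like
# object, index 5204 on its back face (hidden from the camera).
front_5203 = (0.10, 4.5)
back_5204 = (0.00, 5.5)

true_cam_x = 0.0        # where the camera really is
measured_cam_x = -0.10  # what the position/orientation sensor reports

# Only the front index is visible, so the detected image (5301) is the
# projection of index 5203 from the true camera position.
detected_u = project_u(front_5203, true_cam_x)

# Projections computed from the erroneous sensor measurement
# (corresponding to 5302 and 5303 in FIG. 12).
proj_5302 = project_u(front_5203, measured_cam_x)
proj_5303 = project_u(back_5204, measured_cam_x)

dist_front = abs(proj_5302 - detected_u)
dist_back = abs(proj_5303 - detected_u)

# Determination by smallest image distance alone.
chosen = "5203" if dist_front < dist_back else "5204"
print(f"front: {dist_front:.2f} px, back: {dist_back:.2f} px, chosen: {chosen}")
```

With these assumed numbers, the projection of the hidden index 5204 falls closer to the detected image than the projection of the visible index 5203, so a determination based on distance alone selects the index on the back side.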
It is, therefore, another object of the present invention to accurately identify which of indices arranged in a real space corresponds to an index detected from an image.