1. Field of the Invention
The present invention relates to a tool and method for producing an augmented image by combining computer-generated virtual-images with a real-world view, and more particularly to a tool and method for using the autocalibration of scene features to produce the augmented image.
2. General Background and State of the Art
An Augmented Reality (AR) is a view of the world that contains a mixture of real and computer-generated (CG) objects. Computer generated objects can include text, images, video, 3-dimensional models, or animations. Augmented reality is distinguished from simple overlays by the fact that the combined real and computer generated objects are observed and behave as if they were in some defined spatial relationship to each other. For example, an augmented reality scene may contain a computer generated picture on a real wall, or a real picture that appears to be hanging on a virtual wall. In some cases the real world may be completely occluded by the computer generated objects, and in others the computer generated objects may not be visible in a particular view of the real world.
As the viewing position and orientation (also known as view pose) changes, the real and computer generated objects shift together to preserve the viewer's sense that their spatial relationships are maintained. For example, a computer generated cup positioned to appear on a real table-top will maintain the appearance of being on the table from as many viewing directions and positions as possible. To maintain the real and computer generated object relationships as the viewing pose changes, the computer generated system must have information about the view pose to produce an appropriate view of the computer generated object to merge with the real world. The process whereby this view pose information is obtained is known as Pose Tracking (PT).
Pose tracking is performed with measurement systems using a variety of sensors and signals including ultrasound, optical beacons, or inertial technologies. Relevant to this patent application are the methods using images from still or video cameras to determine viewing pose. Many approaches are known for computing where a camera must be (i.e., its pose) given a particular image or series of images.
Pose tracking with cameras often relies on the detection and tracking of features and their correspondences to calibrated positions or coordinates. These terms are further defined below:
Features are any identifiable parts of a scene that can be located in one or more images. Examples include points, corners, edges, lines, and curves. Regions with numerous intensity variations are called texture regions. Examples of texture regions include a text character, lines of text on a page, foliage on trees, and a photograph on a wall.
Feature detection is performed by a computer analysis of an image. The detection process searches for a particular type of feature or texture region and computes its 2D coordinates within the image.
Features are tracked between images by computer analysis of two images containing the same features. For example, as a camera pans to the right, the features in the first image appear shifted to the left in the second image. Feature tracking computes the change in the 2D position of a feature from one image to the next. Tracking matches, or corresponds, the features in one image to those in another image.
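The tracking step described above can be illustrated with a minimal sketch. It assumes features have already been detected as 2D points in both images and matches each one to its nearest neighbor in the next image. The function name `track_features` and the `max_shift` threshold are hypothetical, and nearest-neighbor matching is only one simple way of corresponding features, not a method prescribed by this description.

```python
from math import hypot

def track_features(prev_pts, next_pts, max_shift=20.0):
    """Match each feature in prev_pts to its nearest neighbor in next_pts.

    Returns a dict mapping an index in prev_pts to a tuple
    (index in next_pts, (dx, dy)), where (dx, dy) is the 2D displacement.
    A feature with no neighbor within max_shift pixels is treated as lost
    and is absent from the result. This simple version does not prevent
    two features from matching the same neighbor.
    """
    matches = {}
    for i, (x0, y0) in enumerate(prev_pts):
        best_j, best_d = None, max_shift
        for j, (x1, y1) in enumerate(next_pts):
            d = hypot(x1 - x0, y1 - y0)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            x1, y1 = next_pts[best_j]
            matches[i] = (best_j, (x1 - x0, y1 - y0))
    return matches

# The camera pans right, so features shift left by 5 pixels.
prev_pts = [(100.0, 50.0), (200.0, 80.0), (310.0, 40.0)]
next_pts = [(195.0, 80.0), (95.0, 50.0)]  # the third feature left the image
tracks = track_features(prev_pts, next_pts)
```

In this example the first two features are corresponded across the two images with a displacement of (-5, 0), while the third has no match and would be reported as a lost track.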
Correspondences identify matching image features or their coordinates. There are two often-used forms of correspondences. 2D correspondences identify the same features detected in two or more images. 3D correspondences identify an image feature and its known 3D coordinates. Features with 3D correspondences are called calibrated features.
Features can also be corresponded by recognizing their association with color, shape, or texture regions. For example, consider a series of images showing a single blue dot among many red dots on a white wall. In any image, the detected blue dot feature can be corresponded to the same blue dot appearing in another image (2D correspondence) because of its unique color. In any image the detected blue dot can also be corresponded to its known 3D coordinate (3D correspondence) because of its unique color. Just as color can distinguish between otherwise similar features, shape or texture regions can distinguish otherwise similar features. For example, a blue triangle can be distinguished from blue dots. In another example, a blue dot with a letter "T" in it is distinguishable from a blue dot with a letter "W". Some features have recognizable attributes. Some augmented reality tools have recognition capabilities.
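The blue-dot example can be expressed as a short sketch. It assumes each detection carries a color and a shape attribute, and that a table of known 3D coordinates keyed by those attributes is available. The dictionary layout, the attribute names, and the function `correspond_3d` are hypothetical choices made for illustration only.

```python
from collections import Counter

def correspond_3d(detections, calibrated):
    """Pair detected features with known 3D coordinates by unique attributes.

    detections: list of dicts with keys "xy", "color", and "shape".
    calibrated: dict mapping (color, shape) to known 3D coordinates.
    A detection is corresponded only when its attributes are unique in the
    image, so the many red dots stay uncorresponded.
    """
    counts = Counter((d["color"], d["shape"]) for d in detections)
    pairs = []
    for d in detections:
        key = (d["color"], d["shape"])
        if counts[key] == 1 and key in calibrated:
            pairs.append((d["xy"], calibrated[key]))
    return pairs

# Assumed calibration data: one blue dot with known 3D coordinates.
calibrated = {("blue", "dot"): (1.0, 2.0, 0.0)}
detections = [
    {"xy": (40, 60), "color": "red", "shape": "dot"},
    {"xy": (120, 80), "color": "blue", "shape": "dot"},
    {"xy": (200, 95), "color": "red", "shape": "dot"},
]
pairs = correspond_3d(detections, calibrated)
```

Only the blue dot yields a 3D correspondence here, which makes it a calibrated feature in the sense defined above.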
Camera pose (position and orientation) can be computed when three or more calibrated features in an image are detected. Given the camera pose, computer generated objects can be mixed into the scene image observed by the user to create an augmented reality.
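Once a camera pose is available, placing a computer generated object in the image reduces to projecting its 3D points into pixel coordinates. The sketch below shows only that forward step, under an assumed ideal pinhole camera without lens distortion. It does not solve for the pose from calibrated features. The parameter names (`R`, `C`, `f`, `cx`, `cy`) are illustrative assumptions rather than terms from this description.

```python
def project(point, R, C, f, cx, cy):
    """Project a 3D world point into pixel coordinates with a pinhole model.

    R is a 3x3 world-to-camera rotation given as nested lists, C is the
    camera center in world coordinates, f is the focal length in pixels,
    and (cx, cy) is the principal point. Returns None when the point lies
    behind the camera.
    """
    d = [point[i] - C[i] for i in range(3)]
    x, y, z = (sum(R[r][i] * d[i] for i in range(3)) for r in range(3))
    if z <= 0:
        return None
    return (f * x / z + cx, f * y / z + cy)

# Assumed pose: camera at the origin looking along +Z with no rotation.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project((0.5, -0.25, 2.0), identity, (0.0, 0.0, 0.0), 800, 320, 240)
```

Re-running the projection with each new pose is what keeps a virtual object, such as the cup on the table-top, aligned with the real scene as the camera moves.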
An augmented reality can be presented to an observer through an optical mixing of real and computer generated images or a mixing of real scene video with computer generated video. The augmented reality images may be still or moving images. Augmented reality images can be produced in real-time while a user is physically viewing a scene, or they can be produced off-line from recorded images.
Prior art by the inventors of the present invention discloses the use of autocalibration for producing augmented reality images. The autocalibration method is used to increase the number of calibrated features in the scene from which to compute camera pose. Generally, with more calibrated features visible in an image, more reliable and accurate camera poses are computed. As the augmented reality system is used, autocalibration produces more and more calibrated features from which to compute pose.
Autocalibration is accomplished using natural features (NF) or intentional fiducials (IF) within the scenes being viewed by the camera. Natural features and intentional fiducials are detected as points with 2D image positions. Autocalibration can be accomplished as the camera moves as follows:
1) Initially, the scene must contain at least three calibrated natural features or intentional fiducials. Remaining natural features and intentional fiducials are uncalibrated.
2) A user begins an augmented reality session by pointing the camera at the calibrated natural features or intentional fiducials, enabling the system to compute the camera pose.
3) The user moves the camera around, always keeping the calibrated natural features or intentional fiducials in view. Autocalibration occurs during this and subsequent camera motions. As the camera moves, the system computes camera pose from the calibrated features and detects and tracks uncalibrated natural features and/or intentional fiducials from different camera positions. Each view of a tracked natural feature or intentional fiducial contributes to an improved estimate of the feature's 3D position.
4) Once the position estimate of a natural feature or intentional fiducial is known to within an acceptable tolerance, the feature becomes an autocalibrated feature (AF) and is usable as an additional calibrated feature for estimating camera poses.
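Steps 3 and 4 above can be sketched as follows. Each view of a tracked feature is reduced to a ray, defined by the camera center from the computed pose and the unit viewing direction through the feature's 2D image position. The sketch estimates the 3D position as the least-squares point closest to all rays and promotes the feature when the worst ray distance falls within a tolerance. This batch ray intersection is a stand-in chosen for brevity and is not the Extended Kalman Filter approach of the prior art. The function names and the tolerance value are hypothetical.

```python
from math import sqrt

def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def triangulate(rays):
    """Return (point, worst_distance) for the point closest to all rays.

    Each ray is (center, direction) with a unit-length direction.
    Returns None when the rays are nearly parallel.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for c, d in rays:
        for i in range(3):
            for j in range(3):
                p = (1.0 if i == j else 0.0) - d[i] * d[j]  # I - d d^T
                A[i][j] += p
                b[i] += p * c[j]
    det = _det3(A)
    if abs(det) < 1e-9:
        return None
    point = []
    for k in range(3):  # solve A x = b by Cramer's rule
        Ak = [[b[i] if j == k else A[i][j] for j in range(3)]
              for i in range(3)]
        point.append(_det3(Ak) / det)
    worst = 0.0
    for c, d in rays:
        v = [point[i] - c[i] for i in range(3)]
        t = sum(v[i] * d[i] for i in range(3))
        perp = sqrt(sum((v[i] - t * d[i]) ** 2 for i in range(3)))
        worst = max(worst, perp)
    return point, worst

def try_promote(rays, tolerance=0.01):
    """Return the 3D position if it meets the tolerance, else None."""
    if len(rays) < 2:
        return None
    result = triangulate(rays)
    if result is None or result[1] > tolerance:
        return None
    return result[0]

# Two views of one feature from assumed camera poses.
rays = [((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
        ((5.0, 0.0, 5.0), (-1.0, 0.0, 0.0))]
position = try_promote(rays)
```

A feature seen from a single position, or from positions that give nearly parallel rays, is not promoted, which reflects the need to view it from different camera positions before its 3D position is trusted.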
Autocalibration can be computed during on-line real-time augmented reality system use, or it can be computed during off-line video processing. The intent of autocalibration is to increase the number of calibrated features for tracking. This purpose does not rely upon any particular mathematical method of autocalibration. The prior art by the inventors of the present invention describes the use of several variants of Extended Kalman Filters for performing autocalibration. Other methods, such as shape from motion, are suitable as well. The end result of any method of autocalibration is an increased number of calibrated features in the scene to support pose tracking.
Autocalibration is computed by processing data from a start time forward for either on-line or off-line cases. Batch methods of autocalibration process data in time-forward and time-reverse order, but these methods are best suited to off-line video processing. Autocalibration can be done with any imaging system including nonplanar panoramic or panospheric projections.
In prior art augmented reality devices, autocalibration has been used to sense new features and integrate them into the device's calibration database for camera pose tracking. However, the prior art has not made full use of autocalibrated features, and many problems have remained in the incorporation of autocalibration into a practical augmented reality system or tool. For example, the prior art has not used autocalibrated features for aligning models to the calibration coordinate system.
Also, during autocalibration, losses of 2D tracking, 3D correspondences, and camera pose tracking are common due to rapid motions, camera occlusions, or poor imaging conditions. For example, given an augmented reality tool without autocalibration used in an office, as the user moves the camera around the office, pose tracking is lost if the calibrated features move out of view. The user must redirect the camera towards the calibrated features for the augmented reality tool to recover pose tracking and produce annotations. Additionally, every object in the scene that is to be annotated needs three or more calibrated features near it. In practice, this is burdensome on the user and an impractical constraint upon the scene. Autocalibration was developed to overcome this hurdle. Given an augmented reality tool with autocalibration, as the user moves the camera to capture the scene, new features are detected and autocalibrated. These autocalibrated features are used by the augmented reality tool for pose estimation as described in the prior art. However, in the prior art, if the camera motions or viewing conditions cause autocalibration to fail or produce erroneous poses, both pose tracking and the accumulated set of autocalibrated features are lost. Pose tracking can only be recovered by repeating the initialization process. With the re-initialization for pose tracking, autocalibration must also restart since all 2D and 3D correspondences are lost for the accumulated set of autocalibrated features. The accumulated set of autocalibrated features is of no use and is discarded; a new and likely different set of autocalibrated features must be rebuilt from scratch. The prior art has not used autocalibrated features for the subsequent recovery of 2D tracking, 3D correspondences, and camera pose.
Additionally, autocalibration by itself does not provide any mechanism for the recovery and reuse of 3D correspondences once features leave the screen or 2D correspondences are lost. In the prior art, pose tracking depends on both the detection of three or more features in the image and their 3D correspondences (if they exist). Both are prone to failure during typical use of an augmented reality tool. Even during optimal conditions, prior art descriptions of autocalibration use autocalibrated features only while they are on screen. For example, if a door corner enters the image as the augmented reality tool camera moves, it is autocalibrated and then used to compute pose as long as it remains in view. Once the camera moves beyond the door, all of that door corner information is lost. Even if the camera returns to the door area, the detection and autocalibration of that same corner feature must be repeated.
Also, even when autocalibrating structured sets of point features, the prior art autocalibration methods calibrate the points separately. This prior art method of autocalibrating structured sets of points is inefficient and fails to maintain the relative point relationships within the structured set.
A general object of the present invention is to extend the uses of autocalibration, beyond the camera pose tracking described in the prior art, to improve the reliability and efficiency of augmented reality systems. A more specific objective is to use autocalibrated features for aligning models to the calibration coordinate system. Another objective is to use autocalibrated features for recovery of 2D tracking, 3D correspondences, and camera pose. Another specific objective is to autocalibrate structured sets of point features together.