1. Field of the Invention
The invention relates generally to human/computer visual interfacing and more particularly to a three-dimensional, visual human/computer interface for interactive "augmented reality" applications.
2. Description of the Related Art
"Augmented reality" (AR) refers to a human/computer interaction in which synthetic, computer generated elements are mixed or juxtaposed with real world elements in such a way that the synthetic elements appear to be part of the real world. For example, computer generated graphic elements can be displayed on a partially transparent/partially reflective helmet or visor viewer so that the human sees real objects (through the visor) which appear to be mixed with computer generated graphics (projected by reflection from the inside of the visor). Alternatively, video imagery of real objects can be combined with computer generated graphics and the combination displayed on a conventional or stereoscopic video monitor. Such AR techniques offer an extremely useful human/computer interface in numerous applications. Invisible features of a real object can be displayed as wire-frame graphics to indicate the internal structure of the object. This technique is useful, for example, to guide a surgeon in performing an intricate procedure, or to guide a mechanic in repairing a complex device. Invisible topographical features can be displayed to guide a pilot or navigator through a complex three-dimensional terrain. Imaginary or potential features can be three-dimensionally and interactively displayed to an architectural or landscape designer. Many other educational, commercial, and entertainment applications are possible.
A central problem in AR is to align graphical information with an image of a real object. This is sometimes referred to as a "registration" problem. For example, on a video display, computer generated graphics should ideally be positioned in apparent registration relative to a video image of the corresponding real object. On a see-through display, the computer graphics should be positioned so as to appear registered with the external object being viewed, thereby achieving the illusion of reality. In either case, to achieve registration, the position and orientation of the viewer relative to the object must be found. This position and orientation information allows a computer to correctly render the graphical overlay as seen from the perspective of the camera or viewer. If the graphical interface is to be useful, the registration between the real world object and the computer generated graphics must be dynamically updated at a rate sufficient to maintain registration despite expected movements of the object or the observer. For example, in one augmented reality application a mechanic wearing a helmet-mounted camera and a see-through visor display system simultaneously views an engine and computer graphics emphasizing and identifying features of the same engine. The display is most effective if the computer graphics remain accurately registered with the real engine notwithstanding routine motions and changes of viewpoint of the mechanic. The moving mechanic will perceive the display as real only if the registration is dynamically accurate and responsive.
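By way of illustration only, the role of the position and orientation information in rendering an overlay can be sketched as follows. The sketch assumes an ideal pinhole camera with no lens distortion, a pose expressed as a 3x3 rotation matrix and a translation vector, and a focal length given in pixels; the function names `project_point` and `rotation_about_y` are hypothetical and do not come from any of the systems discussed here.

```python
import math


def project_point(point_obj, rotation, translation, focal_length):
    """Project a 3-D point given in object coordinates onto the image plane.

    rotation (3x3 nested list) and translation (3 values) are the assumed
    pose of the object relative to the camera; focal_length is in pixels.
    """
    # Rigid transform into camera coordinates: p_cam = R * p_obj + t
    x, y, z = (
        sum(rotation[i][j] * point_obj[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if z <= 0:
        raise ValueError("point lies behind the camera")
    # Perspective division for an ideal pinhole camera
    return (focal_length * x / z, focal_length * y / z)


def rotation_about_y(angle_rad):
    """Rotation matrix for a turn of angle_rad about the camera's y axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


# A graphic anchored at an object point, viewed from 2 units away.
u, v = project_point((0.1, 0.2, 0.0), rotation_about_y(0.0), (0.0, 0.0, 2.0), 500.0)
```

If the pose changes between frames, the same object point projects to a different image location, which is why the pose must be re-estimated at the display rate for the overlay to stay registered.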
Two general approaches to the registration problem for AR have been attempted: (1) object pose estimation methods, and (2) observer pose estimation methods. In the former, the approach is to determine the position and pose of the object using either passive or active methods. Once this information is available, computer graphics are rendered to concur with the known position and pose of the object. In the latter approach, instead of determining the position and orientation of the object, the position and orientation of the observer or camera is determined. The computer graphics are then transformed to appear registered with the object given the determined position and orientation of the observer.
Object Pose Estimation Methods
The wearable computing project at Massachusetts Institute of Technology is described on the world wide web at:
http://wearables.www.media.mit.edu/ojects/wearables/augmented-reality.html
In this project, three LEDs (light emitting diodes) are placed, with known distances between them, on an object. Using a camera of known focal length the position and orientation of a plane containing the LEDs is then determined. One limitation of this method is that the face of the plane with the LEDs must always be visible to the camera or viewer. Furthermore, errors in the estimation of position and orientation of the plane of the LEDs manifest as registration errors, requiring secondary means to correct.
A similar approach has been attempted by researchers at University of Southern California, based on a pose determination scheme developed by M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Graphics and Image Processing, 24 (6), pp. 381-395, 1981. Their method involves solving a quadratic polynomial. Ambiguities are resolved by choosing the solution closest to that in the previous frame. This approach has disadvantages similar to those of the MIT approach discussed above.
Another method, developed at Carnegie Mellon University (CMU) and denoted "magic eye," uses a robust template matching procedure to detect features. See Uenohara and Kanade, "Vision-Based Object Registration for Real-time Image Overlay," in Proceedings 1st International Conference on Computer Vision, Virtual Reality and Robotics in Medicine (1995). The position and surrounding surface orientation of selected features and object coordinates are assumed to be known. A geometric invariant is used to assure proper correspondence of feature points during tracking. The invariant is also used to encode the position of graphical overlays relative to the feature points. To apply the geometric invariant, this method requires that each graphic overlay be positioned so that four feature points surround it. This limits the graphic information that can be presented.
At University of Rochester, K. Kutulakos and J. Vallino have demonstrated a system based on determining an affine coordinate system in a live video stream using markers. See K. Kutulakos and J. Vallino, "Affine object representations for Calibration-free Augmented Reality," in Proc. IEEE Virtual Reality Annual Symposium (1996). The graphic objects are projected in the affine coordinate system before being overlaid on a video stream. By tracking markers, the affine coordinate system is adjusted to correspond to the orientation of the object with the markers. The affine coordinates indirectly maintain registration between the real object and the graphics. This system is functional but computationally demanding.
Observer Pose Estimation Methods
Grimson et al. have developed methods to view previously imaged and reconstructed MRI and CT data superimposed on live video signals of a patient in an operating room. See Grimson, W. E. L., Ettinger, G. J., White, S. J., Lozano-Perez, T., Wells III, W. M., and Kikinis, R., "An automatic registration method for frameless stereotaxy, image guided surgery, and enhanced reality visualization," IEEE Transactions on Medical Imaging, Vol. 15, no. 2, pp. 129-140 (1996). The registration is based on least squares minimization of distance between the image data and 3-D model, with the 3-D model data obtained by scanning with a laser range finder. The pose of the camera is determined from this minimization procedure. This method is computationally very demanding and also requires extensive hardware (laser range finder and marker projectors) for the data acquisition.
Another approach has been to track the position and orientation of the observer's head using active tracking devices, for example with a magnetic field based tracking device and/or an ultrasound based device. See, e.g., Webster, Anthony; Feiner, Steven; MacIntyre, Blair; Massie, William; and Krueger, Theodore, "Augmented Reality in architectural construction, inspection, and renovation," in Computing in Civil Engineering, pp. 913-919 (1996). The visual display is then continuously modified using the active tracking information to give the impression that the two-dimensional visual display is overlaid on the three-dimensional environment. The use of magnetic and/or ultrasonic tracking devices constrains the user to a limited area of mobility and is subject to distortions.
In a similar approach at the University of North Carolina at Chapel Hill, AR researchers developed a system for displaying ultrasound images directly on the image of the patient. The registration technique is based on simultaneous tracking of the user's head using magnetic sensors and the earth's magnetic field in combination with stereo cameras. Concentric colored circles are used as features for the visual tracking. Three feature points are required to determine the head pose, by stereo triangulation of the three feature points. In the absence of at least three visual features, however, the magnetic tracking contributes more to the pose estimation. When sufficient visual features are available, accuracy increases.
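For illustration, the stereo triangulation of a single feature point can be sketched as below. This is a simplified sketch and not the method of the system described above: it assumes an ideal, rectified pair of parallel pinhole cameras separated by a known baseline along the x axis, with a shared focal length in pixels and image coordinates measured from each principal point. The function name `triangulate_rectified` is hypothetical.

```python
def triangulate_rectified(x_left, y_left, x_right, focal_length, baseline):
    """Recover a 3-D point (in the left camera frame) from a rectified stereo pair.

    x_left, y_left: image coordinates of the feature in the left image (pixels)
    x_right:        x coordinate of the same feature in the right image (pixels)
    focal_length:   shared focal length in pixels (assumed known)
    baseline:       distance between the two camera centers (assumed known)
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("non-positive disparity: point at infinity or mismatched")
    # Depth is inversely proportional to disparity: Z = f * B / d
    z = focal_length * baseline / disparity
    # Back-project the left image coordinates to that depth
    x = x_left * z / focal_length
    y = y_left * z / focal_length
    return (x, y, z)


# Feature seen at x = 50 px in the left image and x = 25 px in the right image.
point = triangulate_rectified(50.0, 25.0, 25.0, focal_length=500.0, baseline=0.1)
```

Repeating this for three non-collinear feature points yields three 3-D positions, which is the minimum needed to fix a rigid pose and is consistent with the three-feature requirement noted above.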
Hoff et al. at the Colorado School of Mines have developed another observer pose determination method based on concentric circle markers. See Hoff, W. A.; Lyon, T.; and Nguyen, K., "Computer Vision-Based Registration Techniques for Augmented Reality," Proc. of Intelligent Robots and Computer Vision XV, Vol. 2904, in Intelligent Systems and Advanced Manufacturing, SPIE, Boston, Mass., pp. 538-548 (1996). They isolate the markers by processing a video image of the object bearing the markers. They then use an estimation algorithm to estimate the pose of the camera.
Koller et al. at California Institute of Technology in Pasadena have also demonstrated an approach based on camera-motion estimation. Using a linear acceleration model for the camera motion, they use Kalman filtering techniques to perform predictive tracking of rectangular markers and determine the motion of the camera. This method is somewhat computationally demanding, which limits the speed of operation. See Koller, D.; Klinker, G.; Rose, E.; Breen, D.; Whitaker, R.; and Tuceryan, M., "Real-time Vision Based Camera Tracking for Augmented Reality Applications," Proceedings of the ACM Symposium on Virtual Reality Software and Technology, pp. 87-94 (1997).
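The general idea of predictive tracking with a Kalman filter under an acceleration model can be sketched as follows. This is a minimal one-dimensional illustration and not the filter of Koller et al.: it assumes a state of position, velocity, and acceleration for a single image coordinate of one marker, a constant-acceleration transition between frames, and a direct measurement of position only. All function names and noise values are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def predict(state, cov, dt, process_noise):
    """Advance [position, velocity, acceleration] by dt under constant acceleration."""
    f = [[1.0, dt, 0.5 * dt * dt],
         [0.0, 1.0, dt],
         [0.0, 0.0, 1.0]]
    p, v, a = state
    new_state = [p + v * dt + 0.5 * a * dt * dt, v + a * dt, a]
    # Covariance propagation: P' = F P F^T + Q (Q assumed diagonal here)
    new_cov = mat_mul(mat_mul(f, cov), transpose(f))
    for i in range(3):
        new_cov[i][i] += process_noise
    return new_state, new_cov


def update(state, cov, measured_position, measurement_noise):
    """Correct the prediction with a position measurement (H = [1, 0, 0])."""
    innovation = measured_position - state[0]
    s = cov[0][0] + measurement_noise          # scalar innovation covariance
    gain = [cov[i][0] / s for i in range(3)]   # Kalman gain K = P H^T / s
    new_state = [state[i] + gain[i] * innovation for i in range(3)]
    # P' = (I - K H) P
    new_cov = [[cov[i][j] - gain[i] * cov[0][j] for j in range(3)]
               for i in range(3)]
    return new_state, new_cov


# One predict/update cycle for a marker coordinate, one frame apart (dt = 1).
state, cov = [0.0, 1.0, 2.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
predicted, predicted_cov = predict(state, cov, dt=1.0, process_noise=0.01)
corrected, corrected_cov = update(predicted, predicted_cov, 2.5, measurement_noise=0.1)
```

The predicted position indicates where to search for the marker in the next frame; the update step then blends that prediction with the position actually measured.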
These and other methods have all attempted to solve the registration problem in AR. However, to date all of the previous methods have been limited, to varying degrees, by the computational speed available or by the need for cumbersome position and/or orientation sensors.