The field of computer vision includes the computer analysis of scenes projected into an electronic camera. The camera generates images of the scenes, and the computer analyzes these images and draws useful conclusions.
In particular, an active branch of computer vision is devoted to computing the position and orientation in space of an object, also called object pose, by detecting several features of the object, in a single image using a single camera, or in two images using two cameras.
Implementations using two cameras apply well-known stereometric techniques, in which the 3D position of each feature is obtained by triangulation from the positions of the projections of this feature in the two images. For more details on stereometric techniques, see the book titled "Robot Vision", by Berthold K. P. Horn, MIT Press. This type of technique has several drawbacks. First, it requires two cameras, which increases system cost. Second, calibrating the relative positions of the two cameras is difficult, and the system output is very sensitive to calibration errors. Third, generating the rotation matrix for an object requires lengthy trigonometric computations, and combining data from more than 3 object points requires matrix inversion computations. This results in increased hardware cost in situations where real-time system response is needed.
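As an illustration of the triangulation step, the following minimal sketch assumes the simplest stereo geometry: two identical, rectified cameras with parallel optical axes, separated by a known baseline along the x axis. The function name `triangulate_rectified` and its parameters are hypothetical and are not taken from the cited book.

```python
def triangulate_rectified(x_left, x_right, y, focal, baseline):
    """Recover (X, Y, Z) in the left-camera frame from one matched feature.

    Assumes rectified cameras: same focal length, parallel optical axes,
    right camera displaced by `baseline` along the x axis.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("feature must lie in front of both cameras")
    z = focal * baseline / disparity      # depth from similar triangles
    return (x_left * z / focal, y * z / focal, z)


# A feature seen at x = 32 in the left image and x = -128 in the right image.
position = triangulate_rectified(32.0, -128.0, 16.0, focal=800.0, baseline=100.0)
```

In this simplified geometry the recovered depth is proportional to the assumed baseline, so a relative error in the calibrated baseline produces the same relative error in depth, which gives a sense of the calibration sensitivity mentioned above.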
In stereometric techniques the spatial position of each object feature is found individually, without making use of additional information such as the relative positions of the object features in space. If this information about relative positions is available, other techniques are preferable, because they can recover the pose of the object from a single image. For example, if 3 points of an object are detected in a single image and the distances between these features in the object are known, it is possible to recover the pose of the object. However, a polynomial equation must be solved, and 2 or 4 solutions for the object pose are found. See for example "New Exact and Approximate Solutions of the Three-Point Perspective Problem", by Daniel DeMenthon and Larry Davis, Pattern Analysis and Machine Intelligence, vol. 14, no. 11, November 1992, pp. 1100-1104. If more than 3 points are used, the solution is generally unique, but the formulas become more complicated, and would be practical only with costly hardware in real-time use. See for example "An Analytical Solution for the Perspective-4-Point Problem", by Radu Horaud, Bernard Conio and Olivier Leboulleux, Computer Vision, Graphics, and Image Processing, vol. 47, pp. 33-44, 1989. One would like to choose 5 points or more to increase the reliability of the object pose results, but is then faced with highly difficult mathematical computations.
An alternative approach that uses much simpler computations relies on well-known approximations to perspective projection, called orthographic projection and scaled orthographic projection. Scaled orthographic projection is an improved version of orthographic projection in which changes of scale due to the distance between the object and the camera are accounted for. For example, in U.S. Pat. No. 5,227,985, which is hereby incorporated by reference, contributed by the present inventor, a scaled orthographic projection approximation is applied. Consequently, only an approximate pose of an object is obtained from the positions of images of points of the object.
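The difference between the two projection models can be made concrete with a minimal sketch. It assumes a pinhole camera with its optical axis along z, and assumes that scaled orthographic projection divides by one common reference depth for the whole object instead of by each point's own depth. The names `perspective`, `scaled_orthographic`, `ref_depth` and `eps` are illustrative only.

```python
def perspective(point, focal):
    """True perspective projection: each point is divided by its own depth."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


def scaled_orthographic(point, focal, ref_depth):
    """Scaled orthographic projection: one common depth for the whole object."""
    x, y, _ = point
    return (focal * x / ref_depth, focal * y / ref_depth)


focal, ref_depth = 760.0, 100.0
point = (10.0, 5.0, 105.0)        # 5 units behind the reference depth

exact = perspective(point, focal)
approx = scaled_orthographic(point, focal, ref_depth)

# Relative depth offset of this point with respect to the reference depth.
eps = (point[2] - ref_depth) / ref_depth
# The two models differ exactly by the factor (1 + eps):
# approx[0] == exact[0] * (1 + eps), and likewise for the y coordinate.
```

Under these assumptions the approximation is exact for points at the reference depth, and its error grows with the ratio of the object's depth extent to its distance from the camera.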
In contrast, according to this invention, the pose of the object can be obtained in a very accurate way while maintaining simplicity. This result can be obtained because the inventor has found a computationally inexpensive way to solve the exact equations characterizing a true perspective projection, thanks to an iterative approach. This approach involves performing the following simple operations:
(1) Compute correction factors accounting for the relative distances of feature points along the optical axis of the camera;
(2) Create two image vectors depending on these correction factors and on the x and y coordinates of the projections of the point features in the image;
(3) Multiply a precomputed object matrix (depending only on the relative positions of the points of the object) by the two image vectors;
(4) Normalize the two resulting vectors to obtain the first two rows of a four dimensional pose matrix;
(5) Complete the last two rows of the pose matrix using a cross-product;
(6) Go back to operation (1), unless the correction factors have not changed from one iteration loop to the next.

Objects of the invention:
(a) To provide a system for accurately computing the pose of an object using images of light sources mounted on the object obtained by an electronic camera;
(b) To provide a system providing the pose of an object in a few iteration steps involving at each step the multiplication of a precomputed object matrix by two vectors and the normalization of the results;
(c) To provide a system in which large motions of an operator are accurately monitored by a single camera to let the operator interactively modify views of a virtual scene or interact with virtual objects displayed on this scene;
(d) To provide a system in which large motions of an operator are accurately monitored by a single camera to let the operator remotely control a teleoperated device.
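Operations (1) to (6) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: it assumes that the object matrix is the pseudoinverse of the matrix whose rows are the homogeneous object coordinates (X, Y, Z, 1), that the correction factor of a point is its offset along the optical axis divided by the depth of the object origin, that image coordinates are measured from the image center, and that the focal length is known. All function and variable names are hypothetical.

```python
import math


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [float(r == c) for c in range(n)]
           for r, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def object_matrix(points):
    """Precompute the 4 x N pseudoinverse of the N x 4 matrix of rows (X, Y, Z, 1)."""
    a = [[p[0], p[1], p[2], 1.0] for p in points]
    ata = [[sum(row[i] * row[j] for row in a) for j in range(4)] for i in range(4)]
    inv = invert(ata)
    return [[sum(inv[i][k] * row[k] for k in range(4)) for row in a] for i in range(4)]


def estimate_pose(points, image, focal, eps=None, tol=1e-10, max_iter=100):
    """Return (4x4 pose matrix, iteration count) from at least 4 non-coplanar points."""
    obj = object_matrix(points)
    n = len(points)
    eps = list(eps) if eps is not None else [0.0] * n   # (1) initial correction factors
    for iteration in range(1, max_iter + 1):
        # (2) image vectors built from the image coordinates and correction factors
        xv = [image[m][0] * (1.0 + eps[m]) for m in range(n)]
        yv = [image[m][1] * (1.0 + eps[m]) for m in range(n)]
        # (3) multiply the precomputed object matrix by the two image vectors
        vec_i = [sum(obj[r][m] * xv[m] for m in range(n)) for r in range(4)]
        vec_j = [sum(obj[r][m] * yv[m] for m in range(n)) for r in range(4)]
        # (4) normalize to obtain the first two rows of the pose matrix
        s1 = math.sqrt(sum(v * v for v in vec_i[:3]))
        s2 = math.sqrt(sum(v * v for v in vec_j[:3]))
        row1 = [v / s1 for v in vec_i]
        row2 = [v / s2 for v in vec_j]
        # (5) third row from a cross-product and the scale; fourth row is fixed
        i, j = row1[:3], row2[:3]
        k = [i[1] * j[2] - i[2] * j[1],
             i[2] * j[0] - i[0] * j[2],
             i[0] * j[1] - i[1] * j[0]]
        tz = focal / math.sqrt(s1 * s2)
        # (1) updated correction factors, then (6) stop once they no longer change
        new_eps = [sum(k[c] * points[m][c] for c in range(3)) / tz for m in range(n)]
        change = max(abs(new - old) for new, old in zip(new_eps, eps))
        eps = new_eps
        if change < tol:
            break
    return [row1, row2, k + [tz], [0.0, 0.0, 0.0, 1.0]], iteration


def project(points, pose, focal):
    """True perspective projection, used here only to make synthetic test images."""
    out = []
    for p in points:
        h = [p[0], p[1], p[2], 1.0]
        xc, yc, zc = (sum(pose[r][c] * h[c] for c in range(4)) for r in range(3))
        out.append((focal * xc / zc, focal * yc / zc))
    return out


# Synthetic example: the 8 corners of a cube seen from about 10 cube sides away.
cube = [(x, y, z) for x in (0.0, 10.0) for y in (0.0, 10.0) for z in (0.0, 10.0)]
c, s = math.cos(0.4), math.sin(0.4)
true_pose = [[c, 0.0, s, 5.0], [0.0, 1.0, 0.0, -3.0], [-s, 0.0, c, 100.0],
             [0.0, 0.0, 0.0, 1.0]]
pose, iterations = estimate_pose(cube, project(cube, true_pose, 760.0), 760.0)
```

Because the object matrix depends only on the object geometry, it is computed once, and each loop then costs two matrix-vector products, two normalizations and one cross-product. More points can be added to `cube` without changing any other line.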
At the first iteration loop, the correction factors accounting for the relative distances of feature points along the optical axis of the camera may be unknown, but in most applications these correction factors are fairly small. In this case, the correction factors are taken to be initially zero. However, the number of iteration loops required to converge to an accurate pose is reduced if good initial estimates are made for the correction factors. In applications involving the tracking of a moving object, a pose of the object may have been computed at a very recent prior time, and these correction factors may be roughly estimated using this prior pose estimate. Then two or three iterations are sufficient for convergence to a very accurate object pose. Many points can be used for the object for improved reliability without any changes in the steps above.
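The initialization from a prior pose can be sketched as follows. This assumes, as an illustration only, that the pose is stored as a 4x4 matrix whose third row holds the optical-axis direction in object coordinates followed by the depth of the object origin, and that each correction factor is the point's offset along the optical axis divided by that depth. The function name `correction_factors` is hypothetical.

```python
def correction_factors(points, prior_pose):
    """Estimate correction factors for the next image from a previous pose.

    `prior_pose[2]` is assumed to hold (kx, ky, kz, tz): the optical-axis
    direction expressed in object coordinates, then the origin depth.
    """
    kx, ky, kz, tz = prior_pose[2]
    return [(kx * x + ky * y + kz * z) / tz for x, y, z in points]


# Object facing the camera at depth 200; points offset along the optical axis.
prior = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 200.0],
         [0.0, 0.0, 0.0, 1.0]]
start = correction_factors([(0.0, 0.0, 0.0), (5.0, 0.0, 10.0), (0.0, 3.0, -20.0)], prior)
```

When no prior pose is available, a list of zeros plays the same role, at the cost of a few more iteration loops.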
In common with U.S. Pat. No. 5,227,985, one embodiment of this invention is a system for measuring the motions of the head of an operator. A virtual scene of virtual objects presented to the eyes of the operator in head mounted displays is modified according to these measurements. The operator may want to observe a part of the virtual scene out of his present field of view, and the system detects the rotation of his head and generates on the head mounted displays the part of the virtual scene corresponding to the new field of view. In this specific application, accurate measurements of the head motions may be required in order to provide the operator's eyes with images that precisely match what he would expect to see from his motions; the present invention will yield more accurate results than the approximate method described in U.S. Pat. No. 5,227,985.
In another embodiment of this invention, also common with U.S. Pat. No. 5,227,985, the operator may hold a specially designed "mouse" in his hand. The system computes the motions of this object by the iterative computation disclosed in this specification and displays a corresponding virtual object in the virtual scene. This virtual object may be used as a pointing cursor and more generally as a tool to interact with the other virtual objects of the scenery. The prior art for this type of application is now examined.
In U.S. Pat. No. 4,891,630 to Friedman, 1990, entitled "Computer Vision System with Improved Object Orientation Technique", a system is described using a single camera for monitoring the head motion of an operator for eyetracking purposes. A camera takes images of a patch which is attached to the cheek of the operator. The patch has 4 small flat reflective elements at its corners and a large hemispheric reflective element at its center. Reflections of a light source on these elements are detected in images taken by the camera. Reflections from the small flat elements are point-like reflections from locations which are fixed with respect to the patch, whereas reflections from the surface of the large hemispheric element may come from various locations on this surface, depending on the orientation of the patch. Therefore, when the operator moves his head, these reflections move differently in the image depending on whether they come from the flat elements or from the hemispheric element, and formulas for head angle changes using these reflection differences are provided. However, these formulations can provide only qualitative angle changes, and are valid only for very small angle changes. They may be sufficient for the specific application described in that patent, but would provide incorrect results if they were applied to tracking the large displacements of an object held in the hand of an operator, or to tracking the large rotations of the head of an operator exploring a virtual scene. In contrast, the apparatus in the present disclosure gives correct results for large displacements of an object.
An example of display cursor control by optical techniques is presented in U.S. Pat. No. 4,565,999 to King et al., 1986, entitled "Light Pencil". A device fixed to the head of the operator comprises 4 light emitting diodes (LEDs). A photodetector placed above the computer display senses the variations of intensity of the LEDs and a processor relates these variations to changes in orientation of the LEDs with respect to the photodetector. However, this system is intended for the control of horizontal displacement of a cursor on the display by the operator's vertical and horizontal rotations. It does not provide a way to detect other motions such as translations or roll, and therefore cannot be applied to the general pose monitoring of objects.