There is presently considerable interest in systems able to track the position and orientation of an object in space. In particular, these systems are being applied to virtual reality technology, in which the user interacts with a computer by moving a hand or the head to control a computer-generated world. The paper "A Survey of Position Trackers", by K. Meyer, H. Applewhite and F. Biocca, in Presence, vol. 1, number 2, Spring 1992, pp. 173-200, MIT Press, provides a survey of such technologies. Position tracking has been implemented using four different approaches: electro-optical, mechanical, magnetic, and acoustic. Electro-optical position trackers have received more attention than the other systems. They typically use video cameras to detect bright points of the tracked object, and compute, from the locations of these points in video frames, the six degrees of freedom of the object. For example, the survey mentions an electro-optical head position sensor described in U.S. Pat. No. 4,956,794 to Zeevi et al., entitled "Single Camera Three Dimensional Head Position Sensing System". At least three cues able to reflect light from a separate source are mounted along an imaginary circle around the head of the operator, and are detected in the video stream from a single camera. The locations of the images of these cues are claimed to contain enough information to provide an indication of head rotation and translation at video frame rate. A close examination reveals, however, that the electronic circuitry disclosed by Zeevi et al. is designed to detect only the rising edges created by the images of these cues in the video signal. Such an approach is useful only if the cues are relatively far from the camera and are seen as very small dots. When the cues are close to the camera, their images cover many pixels, and it would be more desirable to compute the centroids of the images of the cues. 
This computation would require detecting both the rising edges and the falling edges created by images of bright spots in the video signal, and integrating the information about such edges from all the rows of the video image occupied by the same bright spots. By detecting only rising edges in the video signal, a system as taught by Zeevi et al. cannot accurately detect the positions of the images of the cues when they are relatively close to the camera.
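The centroid computation described above can be sketched as follows. This is only an illustration, not circuitry or code taken from any of the cited patents: the `Run` record and the `spot_centroid` function are hypothetical names, and the sketch assumes that the rising and falling edges belonging to one bright spot have already been collected for every row the spot occupies.

```python
from collections import namedtuple

# Hypothetical record for one string of bright pixels in one image row.
# `rise` is the column of the rising edge (first bright pixel) and `fall`
# is the column of the falling edge (first dark pixel after the string).
Run = namedtuple("Run", ["row", "rise", "fall"])


def spot_centroid(runs):
    """Return the (x, y) centroid of one bright spot from its per-row runs."""
    area = 0
    sum_x = 0.0
    sum_y = 0.0
    for run in runs:
        width = run.fall - run.rise  # number of bright pixels in this row
        area += width
        # The mean column of pixels rise .. fall-1 is (rise + fall - 1) / 2.
        sum_x += width * (run.rise + run.fall - 1) / 2.0
        sum_y += width * run.row
    if area == 0:
        raise ValueError("spot has no bright pixels")
    return sum_x / area, sum_y / area


# A roughly circular spot covering three rows.
spot = [Run(10, 4, 8), Run(11, 3, 9), Run(12, 4, 8)]
centroid = spot_centroid(spot)
```

With rising edges alone, only the left boundary of the spot in each row is known, so neither the width of each run nor the center of the spot can be recovered, which is the limitation noted above for cues close to the camera.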
In U.S. Pat. No. 4,672,562, issued to Egli et al., entitled "Method and Apparatus for Determining Location and Orientation of Objects", a method and apparatus are taught in which target points are mounted along orthogonal lines on the object, and the coordinates of the image spots created by these target points on the image plane of the camera are detected. Computations using these coordinates provide spatial information about the object. However, Egli et al. do not teach any of the hardware requirements for detecting these image spots.
In U.S. patent application Ser. No. 07/998470 and U.S. Pat. No. 5,227,985 disclosed by one of the present inventors, systems are described which use a single camera with at least four light sources in any noncoplanar arrangement mounted on the object. The systems are able to compute the position and orientation of the object from the bright spots created by the light sources in the image of the camera with very simple computing steps, even when the light sources are numerous and in a complex arrangement. This task requires digitizing analog video data from an NTSC video signal, and grouping contiguous bright pixels in order to find the centers of the bright spots which are the projections of the light sources in the image. In order to accurately represent these bright spots, around 256 digital pixels must be obtained for each image row. Since each image row is transmitted out of the camera in around 50 μs of NTSC signal, a new pixel has to be digitized approximately every 200 nanoseconds. An instruction on a typical inexpensive microprocessor running at 33 MHz takes from 300 to 600 nanoseconds; therefore, such a microprocessor does not have enough time to find the strings of bright pixels in an image row while the pixels are being digitized. Finding the strings of bright pixels while the pixels are being digitized would require relatively expensive hardware. The present invention teaches how to implement a system which provides the desired output with inexpensive components, by delaying the search for such strings of bright pixels to the end of each image row, or to the end of each image field, that is, during the 1200 μs of vertical retrace between two image fields.
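The timing argument and the deferred search can be illustrated with a short sketch. It is not the implementation of the present invention: it assumes that a digitized row has already been stored in memory as a list of intensity values, and the function name `find_bright_runs` and the brightness threshold are hypothetical choices made for the example.

```python
# Figures taken from the text: about 50 microseconds per image row,
# about 256 digitized pixels per row.
ROW_TIME_NS = 50_000
PIXELS_PER_ROW = 256

# Time available per pixel, in nanoseconds: roughly 200 ns, which is less
# than the 300 to 600 ns quoted for a single microprocessor instruction.
pixel_period_ns = ROW_TIME_NS / PIXELS_PER_ROW


def find_bright_runs(row_pixels, threshold):
    """Scan one stored row and return (rise, fall) pairs of bright strings.

    `rise` is the column of the first bright pixel of a string and `fall`
    is the column of the first dark pixel after it. The scan runs after
    the row has been digitized, not while pixels are arriving.
    """
    runs = []
    start = None
    for col, value in enumerate(row_pixels):
        bright = value >= threshold
        if bright and start is None:
            start = col                      # rising edge
        elif not bright and start is not None:
            runs.append((start, col))        # falling edge
            start = None
    if start is not None:                    # string reaches the row end
        runs.append((start, len(row_pixels)))
    return runs


# Example row with two bright strings; 128 is an assumed threshold.
row = [0, 0, 200, 220, 0, 0, 255, 255, 255]
runs = find_bright_runs(row, threshold=128)
```

Because the scan operates on stored data, its speed is no longer tied to the pixel rate; it only has to finish within whatever interval the search is deferred to, such as the end of the row or the vertical retrace.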