The goal of most computer vision systems is to allow a computer to analyze and understand the scene in an image. Typically, an image is analyzed to find features that the computer vision system can recognize. Computer vision systems have typically been used for passive tasks such as mail sorting, tumor detection, parts identification, map making, and fingerprint matching. However, as the processing power of computer systems has improved, interactive tasks using computer vision have been developed.
For example, computer vision can be used to determine the location and orientation of a user in the environment of an augmented reality system. In conventional augmented reality systems, a 3D virtual environment is displayed to the user using, for example, a head-mounted display or a monitor. The 3D virtual environment changes based on the actions of the user. Conventional augmented reality systems typically use a user interface device, such as a joystick or a mouse. However, if a camera is mounted on the user's head so that a computer vision system receives an image of what the user sees, the augmented reality system can use the user's natural head movement to determine what part of the virtual environment should be shown to the user.
Specifically, one or more markers, called fiducials, are placed in the actual physical environment of the user. Depending on the application, the fiducials could be placed in predetermined locations, on movable objects, or on fixed objects. The computer vision system must detect the fiducials and then analyze them to determine the location of each fiducial relative to the camera. Using the relative locations of the fiducials, the augmented reality system can determine the location and orientation of the camera. Because the camera is mounted to the user's head, the location and orientation of the camera are the same as the location and orientation of the user's head. The location and orientation of an object (e.g., the camera or the user's head) are generally called the “pose” of the object.
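As an illustration only, the final step described above (going from the relative location of a fiducial to the pose of the camera) can be sketched with rigid-body transforms. The sketch below is not taken from any particular system. It assumes that a pose is represented as a 4x4 homogeneous matrix, that the pose of the fiducial in the environment is known in advance, and that the pose of the fiducial relative to the camera has already been measured from the image. All function and variable names are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def make_transform(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    rows = [list(rotation[i]) + [translation[i]] for i in range(3)]
    rows.append([0.0, 0.0, 0.0, 1.0])
    return rows


def invert_transform(t):
    """Invert a rigid transform: rotation becomes R^T, translation becomes -R^T p."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    new_p = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return make_transform(r_t, new_p)


def camera_pose_in_world(world_from_fiducial, camera_from_fiducial):
    """Compose the known fiducial pose with the inverse of the measured one."""
    return mat_mul(world_from_fiducial, invert_transform(camera_from_fiducial))


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Assumed example values: an unrotated fiducial placed at (2, 0, 1) in the environment.
world_from_fiducial = make_transform(identity, [2.0, 0.0, 1.0])
# Assumed measurement: the fiducial appears 3 units along the camera's z-axis, unrotated.
camera_from_fiducial = make_transform(identity, [0.0, 0.0, 3.0])

world_from_camera = camera_pose_in_world(world_from_fiducial, camera_from_fiducial)
camera_position = [row[3] for row in world_from_camera[:3]]           # location
camera_rotation = [row[:3] for row in world_from_camera[:3]]          # orientation
```

With these assumed values, the camera is located 3 units behind the fiducial along the z-axis, at (2, 0, -2), with the same orientation as the environment's axes. Measuring the pose of the fiducial relative to the camera from the image is the harder part of the problem and is not shown here.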
The process of identifying and analyzing the fiducials to determine the pose of the camera mounted on the user's head is very difficult. Furthermore, to respond adequately to the user's head movement, the pose of the user's head must be computed in real time. Generally, each frame of the video captured by the camera must be processed to identify and analyze the fiducials and thereby determine the pose of the camera on the user's head. Hence, there is a need for systems and methods to efficiently process images having one or more fiducials to compute the pose of the camera capturing the image.