The position of a point in 3-D space can be described by three coordinates x, y, and z. The orientation of a 3-D object is described by three additional coordinates, roll, pitch, and yaw. Roll, pitch, and yaw are measured relative to some set of axes, frequently North, East, and Down along gravity, but any fixed axes can be used. A rigid body in 3-D space thus requires six coordinates for full description of its pose (position and orientation). Tracking the complete pose of a rigid object is referred to as 6-degree-of-freedom (6-DOF) tracking.
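As an illustration of how six coordinates fix a pose, the following minimal sketch converts roll, pitch, and yaw into a rotation matrix and uses it, together with x, y, and z, to map a point from the object's frame into the reference frame. The Z-Y-X (yaw, then pitch, then roll) rotation order and the function names `rpy_to_matrix` and `transform_point` are assumptions made for this example; other angle conventions are equally valid.

```python
import math

def rpy_to_matrix(roll, pitch, yaw):
    """Body-to-reference rotation matrix for Z-Y-X Euler angles (radians).

    The Z-Y-X order is an assumed convention, not one fixed by the text.
    """
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp,     cp * sr,                cp * cr],
    ]

def transform_point(pose, p_body):
    """Map a point from the object frame to the reference frame.

    pose is the 6-tuple (x, y, z, roll, pitch, yaw).
    """
    x, y, z, roll, pitch, yaw = pose
    R = rpy_to_matrix(roll, pitch, yaw)
    return tuple(
        t + sum(R[i][j] * p_body[j] for j in range(3))
        for i, t in enumerate((x, y, z))
    )
```

For instance, a pose with position (1, 2, 3) and a yaw of 90 degrees carries the body-frame point (1, 0, 0) to approximately (1, 3, 3) in the reference frame.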
It is well known in the field of optical tracking systems that the position of a point-like marker (such as a light-emitting diode (LED) or a reflective ball or dot) can be obtained by triangulating it with two or more cameras. The majority of optical tracking systems on the market require two or more cameras installed in the workspace with overlapping fields of view in order to track the position of a set of markers. To provide full 6-DOF tracking of an object, it is necessary to install at least three position markers on the object in a triangular shape, from which it is possible to work out the object's orientation by solving the so-called “exterior orientation problem.” This triangle needs to have sufficient extent in both width and length to achieve the desired orientation accuracy, and so it can become cumbersome to mount on a slender object such as a pen, a surgical tool, or any other object with no convenient large flat surfaces.
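The two-camera triangulation mentioned above can be sketched as follows. This is one simple approach (the midpoint of the shortest segment joining the two viewing rays), chosen for brevity rather than taken from any particular tracking system. It assumes each camera's position and the direction of its ray toward the marker are already known in a common frame; the name `triangulate_midpoint` is hypothetical.

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Estimate a marker position from two viewing rays.

    c1, c2: camera centers; d1, d2: ray directions toward the marker
    (need not be unit length). Returns the midpoint of the shortest
    segment connecting the two rays.
    """
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    w = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        # Parallel rays carry no depth information.
        raise ValueError("rays are parallel; cannot triangulate")
    s = (b * e - c * d) / denom   # parameter along ray 1
    t = (a * e - b * d) / denom   # parameter along ray 2
    p1 = tuple(o + s * k for o, k in zip(c1, d1))
    p2 = tuple(o + t * k for o, k in zip(c2, d2))
    return tuple((u + v) / 2 for u, v in zip(p1, p2))
```

With noise-free rays the two closest points coincide and the midpoint is the marker itself; with measurement noise the rays are generally skew, and the midpoint serves as a compromise estimate.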
In tracking systems, a camera is an example of a 2-D bearing-angle sensor. When it measures the centroid of a target or marker, it returns two values, usually called u and v, which relate to the horizontal and vertical displacement, respectively, of the target in the image. These measurements are related to the azimuth and elevation angles, also known as bearing angles, from the sensor to the target. The relationship is non-linear: for a simple pinhole camera model, u and v are proportional to tan(azimuth) and tan(elevation), with each angle measured from the optical axis, while for cameras with lenses the distortion makes the relationship more complex. However, in either case the camera outputs are isomorphic to bearing angles, and the camera belongs to the class of 2-D bearing-angle sensors.
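To make the relationship between image coordinates and bearing angles concrete, the sketch below assumes an ideal, distortion-free pinhole camera with focal length `f`, the optical axis along +z, azimuth measured in the x-z plane, and elevation measured in the y-z plane. Under those assumptions u = f·tan(azimuth) and v = f·tan(elevation). The function names are hypothetical, and a real camera would also need a lens-distortion correction.

```python
import math

def project_pinhole(point, f):
    """Project a 3-D point (camera frame, z > 0) to image coordinates (u, v)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return f * x / z, f * y / z

def bearing_angles(u, v, f):
    """Recover (azimuth, elevation) in radians from image coordinates."""
    return math.atan2(u, f), math.atan2(v, f)
```

Because the mapping between (u, v) and the two angles is invertible, either pair carries the same information, which is the sense in which the camera output is equivalent to a pair of bearing angles.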
There are many other bearing-angle sensors that have been or could be used in optical motion tracking. Some examples include a quadcell, a lateral-effect photodiode, a position-sensitive device (PSD), a projection sensor (e.g., Hamamatsu S9132), or a laser-scanner which sweeps a fan of light through a space and measures the bearing angle to a photosensor target based on the timing of the detected pulse during the sweep. Also, two single-axis bearing sensors, for example, implemented with 1-D CCD or CMOS array sensors or single-axis laser scanners, may be combined in one housing to form a 2D bearing-angle sensor. Methods other than optical imaging can also be used to form a 2D bearing sensor device. Radio frequency (RF) and acoustic techniques, including swept radar or sonar beams, time-difference-of-arrival (TDOA), and phased arrays of antennas or microphones have all been used to measure bearing angles. For the remainder of this description we will use the term “camera” to mean any device capable of measuring two bearing angles.
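The timing-based measurement used by a laser scanner can be illustrated with a minimal sketch. It assumes the fan of light sweeps at a constant angular rate and that a synchronization time marks the moment the fan passes a known starting angle; the function and parameter names are hypothetical and not taken from any specific device.

```python
import math

def sweep_bearing(t_pulse, t_sync, sweep_rate, start_angle=0.0):
    """Bearing angle (radians) to a photosensor hit by a sweeping fan.

    t_pulse:    time the photosensor detected the light pulse (s)
    t_sync:     time the fan passed start_angle (s)
    sweep_rate: assumed constant angular rate of the sweep (rad/s)
    """
    return start_angle + sweep_rate * (t_pulse - t_sync)
```

For example, with a fan rotating 60 times per second, a pulse detected 1/240 of a second after the synchronization time corresponds to a bearing of 90 degrees from the starting angle.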
A tracking system which uses cameras is referred to as an “optical tracking system,” while a system using both optical and inertial sensors is referred to as a “hybrid optical inertial tracking system” or just “hybrid tracking system.” Most optical and hybrid tracking systems require some environmental installation: either markers are attached to the tracked objects, or, if cameras are attached to the tracked objects, markers are installed in the environment.
A variety of items can be used as markers. Examples include printed fiducial patterns, retroreflective 2-D and 3-D targets, active LEDs in the visible, IR or UV spectrum, colored marks, and natural features on an object such as corners, lines or textures which are recognizable by computer vision or pattern matching techniques. Depending on their type, markers may differ in physical characteristics such as size, shape, and color. We will use the terms marker, target, fiducial, point or LED interchangeably to mean any type of local feature on an object which can be detected by a camera, and for which the camera can measure the bearings to a specific point in the feature which we shall call the centroid (even though, for some features such as corners or textures, the measured point is not actually a centroid).
Previous optical trackers which are capable of measuring the 6-DOF motion of an object require at a minimum either:
a) two cameras viewing three target points on a rigid triangle (stereo triangulate each point in 3-D, then solve for pose), or
b) one camera viewing four or more target points on a rigid structure (solve a perspective n-point pose recovery algorithm from analytic photogrammetry). It is also possible to solve a 3-point pose recovery algorithm, but it produces up to four ambiguous solutions.
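For approach (a), once the three target points have been triangulated in 3-D, an orientation can be derived by building an orthonormal frame on the triangle. The sketch below shows one simple construction and is not necessarily the method used by any given tracker; the axis assignments and the name `frame_from_markers` are assumptions made for this example.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    n = math.sqrt(sum(c * c for c in a))
    if n < 1e-12:
        raise ValueError("degenerate marker geometry")
    return tuple(c / n for c in a)

def frame_from_markers(p0, p1, p2):
    """Build an object pose from three non-collinear 3-D marker positions.

    Returns (origin, (x_axis, y_axis, z_axis)): the origin is p0, the
    x axis points from p0 to p1, and the z axis is normal to the triangle.
    """
    x_axis = _unit(_sub(p1, p0))
    z_axis = _unit(_cross(x_axis, _sub(p2, p0)))  # raises if collinear
    y_axis = _cross(z_axis, x_axis)
    return p0, (x_axis, y_axis, z_axis)
```

The construction also shows why the triangle needs sufficient extent: when the markers are close together or nearly collinear, small position errors produce large changes in the computed axes.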