An object's pose in a real three-dimensional environment can be expressed with respect to stationary references such as ground planes, reference surfaces, lines, solids, fixed points and other invariant features disposed in the real three-dimensional environment. It is convenient to parameterize the environment by a set of world coordinates with a chosen reference point. The reference point may be the origin of the world coordinates, the center of a particularly prominent invariant feature or the center of a distribution of two or more of these features. Once the locations and orientations of the invariant features distributed in the environment are known, then knowledge of the spatial relationship between the object and these invariant features enables one to compute the object's pose.
An object's pose information combines the three linear displacement coordinates (x,y,z) of any reference point on the object, as well as the three inclination angles, also called the Euler angles (φ,θ,ψ), that describe the pitch, yaw and roll of the object. Conveniently, all parameters (x,y,z,φ,θ,ψ) are expressed in world coordinates to yield an absolute pose. In some cases, alternative expressions for the inclination angles, such as rotations defined by the four Cayley-Klein parameters or by quaternions, are more appropriate.
Determination of a sequence of an object's absolute poses at different times allows one to compute and track the motion of the object in the real three-dimensional environment. Over time, many useful coordinate systems and methods have been developed to track the pose of objects and to parametrize their equations of motion. For a theoretical background the reader is referred to textbooks on classical mechanics such as Goldstein et al., Classical Mechanics, 3rd Edition, Addison Wesley 2002.
Optical navigation is a particularly simple and precise way to track moving objects. The approach is also intuitive since our own human vision system computes locations and motion trajectories of objects in real three-dimensional environments. The precision of optical navigation is due to the very short wavelength of electromagnetic radiation in comparison with typical object dimensions, negligible latency in short distance measurements due to the extremely large speed of light as well as relative immunity to interference. Thus, it is well known that the problem of determining an absolute pose or a motion trajectory of an object in almost any real three-dimensional environment may be effectively addressed by the application of optical apparatus and methods.
A particularly acute need for efficient, accurate and low-cost determination of the absolute pose of an object in a real three-dimensional environment is found in the field of hand-held objects used for interfacing with the digital world. This field encompasses myriads of manipulated objects such as pointers, wands, remote controls, gaming objects, jotting implements, surgical implements, three-dimensional digitizers and various types of human utensils whose motion in real space is to be processed to derive a digital input for an application. In some realms, such application involves interactions that would greatly benefit from a rapid, low-cost method and apparatus for one-to-one motion mapping between real space and cyberspace.
Specific examples of cyberspace games played in three dimensions (3-D) and requiring high-precision control object tracking involve scenarios where the manipulated control object is transported into or even mimicked in cyberspace. Exemplary gaming objects of this variety include a golfing club, a racket, a guitar, a gun, a ball, a steering wheel, a flying control or any other accoutrement that the player wishes to transport into and utilize in a cyberspace application. A very thorough summary of such 3-D interfacing needs for graphics is found in U.S. Pat. No. 6,811,489 to Shimizu, et al.
A major problem encountered by state-of-the-art manipulated objects such as control wands and gaming implements is that they do not possess a sufficiently robust and rapid absolute pose determination system. In fact, many do not even provide for absolute pose determination. Rather, they function much like quasi three-dimensional mice. These solutions use motion detection components that rely on optical flow sensors, inertial sensing devices or other relative motion capture systems to derive the signals for interfacing with cyberspace. In particular, many such interface devices try to solve just a subset of the motion changes, e.g., inclination. An example of an inclination calculation apparatus is found in U.S. Pat. No. 7,379,841 to Ohta, while a broader attempt at determining relative motion is taught in U.S. Pat. No. 7,424,388 to Sato and U.S. Application 2007/0049374 to Ikeda, et al.
Unfortunately, one-to-one motion mapping between real space and cyberspace is not possible without the ability to digitize the absolute pose of the manipulated object with respect to a well-defined reference location in real space. All prior art devices that do not solve the full motion problem, i.e., do not capture successive poses of the manipulated object with a method that accounts for all six degrees of freedom (namely, the very parameters (x,y,z,φ,θ,ψ) inherent in three-dimensional space), encounter limitations. Among many others, these include information loss, appearance of an offset, position aliasing, gradual drift and accumulating position error.
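The drift and accumulating position error noted above can be illustrated with a simple one-dimensional sketch. It is a toy model under stated assumptions and does not represent any of the cited devices: a relative sensor is assumed to report each displacement with a small constant bias, an absolute sensor is assumed to report each position with a fixed offset of the same size, and the function names `integrate_relative` and `measure_absolute` as well as the numerical values are hypothetical.

```python
def integrate_relative(true_positions, bias):
    """Dead-reckon position by summing biased displacement readings."""
    estimate = true_positions[0]  # assumed known starting position
    estimates = [estimate]
    for prev, curr in zip(true_positions, true_positions[1:]):
        measured_delta = (curr - prev) + bias  # each reading carries the bias
        estimate += measured_delta             # error adds up at every step
        estimates.append(estimate)
    return estimates


def measure_absolute(true_positions, offset):
    """Measure each position directly; the error does not accumulate."""
    return [p + offset for p in true_positions]


# Hypothetical trajectory: 1000 steps of 0.01 units each.
true_path = [0.01 * k for k in range(1001)]

relative_estimates = integrate_relative(true_path, bias=0.001)
absolute_estimates = measure_absolute(true_path, offset=0.001)

relative_error = [abs(e - t) for e, t in zip(relative_estimates, true_path)]
absolute_error = [abs(e - t) for e, t in zip(absolute_estimates, true_path)]

print("final relative-mode error:", relative_error[-1])
print("largest absolute-mode error:", max(absolute_error))
```

In this model the error of the integrated estimate grows in proportion to the number of steps, whereas the error of the absolute measurement stays bounded by the single-reading offset.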
In general, the prior art has recognized the need for tracking all six degrees of freedom of objects moving in three dimensions. Thus, optical navigation typically employs several cameras to determine the position or trajectory of an object in an environment by studying images of the object in the environment. Such optical capturing or tracking systems are commonly referred to as optical motion capture (MC) systems. In general, motion capture tends to be computationally expensive because of significant image pre- and post-processing requirements, as well as additional computation associated with segmentation and implementation of algorithms. One particular system taught by McSheery et al. in U.S. Pat. No. 6,324,296 discloses a distributed-processing motion capture system that employs a number of light point devices as markers, e.g., infrared LEDs, attached to the object whose motion is to be determined. The markers use unique sequences of light pulses to represent their unique identities and thus enable filtering out of information not belonging to the markers (i.e., background noise) by the imaging cameras located in the environment. Since McSheery's system permits a great deal of irrelevant information from the imaging sensors (e.g., CCDs) to be discarded before image processing, the system is less computationally expensive than more traditional motion capture systems.
Another three-dimensional position and orientation sensing system that employs markers on the object is taught by Kosaka et al. in U.S. Pat. No. 6,724,930. In this case the markers are uniquely identified based on color or a geometric characteristic of the markers in the extracted regions. The system uses an image acquisition unit or camera positioned in the environment and relies on image processing functions to remove texture and noise. Segmentation algorithms are used to extract markers from images and to determine the three-dimensional position and orientation of the object with respect to the image acquisition apparatus.
Still another way of employing markers in position and orientation detection is taught in U.S. Pat. No. 6,587,809 by Majoe. The object is tracked by providing it with markers that are activated one at a time and sensed by a number of individual sensors positioned in the environment. The position of the energized or active marker is determined by a control unit based on energy levels received by the individual sensors from that marker.
The above approaches using markers on objects and cameras in the environment to recover object position, orientation or trajectory are still too resource-intensive for low-cost and low-bandwidth applications. This is due to the large bandwidth needed to transmit image data captured by cameras, the computational cost to the host computer associated with processing image data, and the data network complexity due to the spatially complicated distribution of equipment (i.e., placement and coordination of several cameras in the environment with the central processing unit and overall system synchronization).
Despite the above-mentioned limitations of general motion tracking systems, some aspects of these systems have been adapted in the field of manipulated objects used for interfacing with computers. Such objects are moved by users in three dimensions to produce input for computer applications. Hence, they need to be tracked in all six degrees of freedom. Therefore, recent three-dimensional wands and controls do teach solving for all six degrees of freedom.
For example, U.S. Patent Application 2008/0167818 to Kimber et al. has a passive wand with no on-board devices or LEDs. The wand is viewed from multiple cameras, and finding the full six degrees of freedom to provide for more precise estimation of wand pose is expressly taught. Similarly, U.S. Pat. No. 6,982,697 to Wilson et al. teaches the use of external calibrated cameras to decode the orientation of the pointer used for control actions. U.S. Patent Application 2006/0109245 to Wilson, et al. further teaches how intelligent computing environments can take advantage of a device that provides orientation data in relative motion mode and absolute mode. Further teachings on systems that use external or not-on-board cameras to determine the pose and motion of a wand or control and use it as input into various types of applications can be found in U.S. Patent Applications 2008/0192007, 2008/0192070, 2008/0204411 and 2009/0164952, all by Wilson.
Still other notable teachings show as few as a single off-board camera for detecting three-dimensional motion of a controller employed for game control purposes. Such cameras may be depth sensing. Examples of corresponding teachings are found in U.S. Patent Application 2008/0096654 by Mondesir, et al., as well as U.S. Patent Applications 2008/0100825 and 2009/0122146, both by Zalewski, et al.
Unfortunately, approaches in which multiple cameras are set up at different locations in the three-dimensional environment to enable stereo vision defy low-cost implementation. These solutions also require extensive calibration and synchronization of the cameras. Meanwhile, the use of expensive single cameras with depth sensing does not provide for robust systems. The resolution of such systems tends to be lower than desired, especially when the user is executing rapid and intricate movements with the manipulated object in a confined or close-range environment.
Another approach involves determining the position or attitude of a three-dimensional object in the absolute sense and using it for a graphical user interface. One example of this approach is taught in U.S. Pat. No. 6,727,885 to Ishino, et al. Here the sensor is on-board the manipulated object. A projected image viewed by the sensor and generated by a separate mechanism, i.e., a projection apparatus that imbues the projected image with characteristic image points, is employed to perform the computation. Additional information about such apparatus and its application for games is found in U.S. Pat. No. 6,852,032 to Ishino and U.S. Pat. No. 6,993,206 to Ishino, et al.
The solution proposed by Ishino et al. is more versatile than the prior art solutions relying on hard-to-calibrate and synchronize multi-camera systems or expensive cameras with depth sensing capabilities. Unfortunately, the complexity of additional hardware for projecting images with characteristic image points is nontrivial. The same is true of consequent calibration and interaction problems, including knowledge of the exact location of the image in three-dimensional space. This solution is not applicable to close-range and/or confined environments, and especially environments with typical obstructions that interfere with line-of-sight conditions.
There are still other teachings attempting to improve on both the apparatus and method aspects of generating computer input with manipulated objects such as wands, pointers and remote controls (e.g., TV controls). A very illuminating overall review of state-of-the-art technologies that can be used for interacting with virtual environments, and of their limitations, is provided by Richard Holloway in “Virtual Environments: A Survey of the Technology”, University of North Carolina at Chapel Hill, September 1993 (TR93-033). Still more recent teachings focusing on how absolute pose data can be used in specific contexts and for remote control applications are found in the following U.S. Patent Applications: 2007/0189737; 2008/0106517; 2008/0121782; 2008/0272272; 2008/0309511; 2009/0066647; 2009/0066648; 2009/0153389; 2009/0153475; 2009/0153478; 2009/0158203 and 2009/0158222.
In sum, despite a considerable amount of work in the field, a clear and pressing need for low-cost, robust and accurate apparatus for absolute motion capture remains. Specifically, what is needed is an apparatus that permits one to obtain absolute pose data from a manipulated object for purposes of interacting with the digital world. Such apparatus should not only be low-cost, robust and accurate, but it should also be convenient and easy to use at high frame rates in close-range and confined three-dimensional environments.