When an item moves without constraints in a three-dimensional environment with respect to stationary objects, knowledge of the item's distance from and inclination to these objects can be used to derive a variety of the item's parameters of motion as well as its pose. Particularly useful stationary objects for pose recovery purposes include a ground plane, fixed points, lines, reference surfaces and other known features.
Over time, many useful coordinate systems and methods have been developed to parameterize stable reference frames defined by stationary objects. The pose of the item, as recovered and expressed in such stable frames with parameters obtained from the corresponding coordinate description of the frame, is frequently referred to as the item's absolute pose. Based on the most up-to-date science, we know that no absolute or stationary frame is available for defining truly absolute parameters. A stable frame is thus not to be construed to imply a stationary frame. More precisely stated, the stable frame in which the absolute pose is parameterized is typically not a stationary or even an inertial frame (for example, a reference frame defined on the Earth's surface is certainly stable, but it is neither stationary nor inertial, owing to gravity and the Earth's rotation). Nevertheless, we shall refer to poses defined in stable frames as “absolute” in adherence to convention.
Many conventions have also been devised to track temporal changes in absolute pose of the item as it undergoes motion in the three-dimensional environment. Certain types of motion in three dimensions can be fully described by corresponding equations of motion (e.g., orbital motion, simple harmonic motion, parabolic motion, curvilinear motion, etc.). These equations of motion are typically expressed in the stable frame defined by the stationary objects.
The parameterization of stable frames is usually dictated by the symmetry of the situation and overall type of motion. For example, motion exhibiting spherical symmetry is usually described in spherical coordinates, motion exhibiting cylindrical symmetry in cylindrical coordinates and generally linear motion in Cartesian coordinates. More advanced situations may even be expressed in coordinates using other types of parameterizations, e.g., sets of linearly independent axes.
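By way of illustration only, the relationship between the Cartesian, spherical and cylindrical parameterizations mentioned above can be sketched as follows. The function names are hypothetical, and the angle conventions (polar angle measured from the +Z axis, azimuth measured from the +X axis) are assumptions chosen for this sketch rather than conventions prescribed by the text.

```python
import math


def cartesian_to_spherical(x, y, z):
    """Return (r, theta, phi): radius, polar angle from +Z, azimuth from +X."""
    r = math.sqrt(x * x + y * y + z * z)
    # The polar angle is undefined at the origin; 0.0 is returned there by choice.
    theta = math.acos(z / r) if r > 0.0 else 0.0
    phi = math.atan2(y, x)
    return r, theta, phi


def spherical_to_cartesian(r, theta, phi):
    """Inverse of cartesian_to_spherical under the same angle conventions."""
    return (
        r * math.sin(theta) * math.cos(phi),
        r * math.sin(theta) * math.sin(phi),
        r * math.cos(theta),
    )


def cartesian_to_cylindrical(x, y, z):
    """Return (rho, phi, z): distance from the Z axis, azimuth from +X, height."""
    return math.hypot(x, y), math.atan2(y, x), z
```

The same point is described in all three systems; only the parameters change, which is why the choice of parameterization can follow the symmetry of the motion.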
Unconstrained motion of items in many three-dimensional environments, however, may not lend itself to a simple description in terms of equations of motion. Instead, the best approach is to recover a time sequence of the item's absolute poses and reconstruct the motion from them. For a theoretical background, the reader is referred to textbooks on classical mechanics and, more specifically, to chapters addressing various types of rigid body motion. An excellent overall review is found in H. Goldstein et al., Classical Mechanics, 3rd Edition, Addison Wesley Publishing, 2002.
Items associated with human users, e.g., items that are manipulated or worn by such users, generally do not move in ways that can be described by simple equations of motion. That is because human users exercise their own will in moving such items in whatever real three-dimensional environment they find themselves. It is, however, precisely the three-dimensional motion of such items that is very useful to capture and describe. That is because such motion may communicate the desires and intentions of the human user. These desires and intentions, as expressed by corresponding movements of the item (e.g., gestures performed with the item), can form the basis for user input and interactions with the digital domain (e.g., data input or control input).
In one specific field, it is important to know the absolute pose of an item associated with a human user to derive the position of its tip while it contacts a plane surface. Such position represents a subset of the absolute pose information. Various types of items, such as elongate objects, can benefit from knowledge of their pose, which includes the position of their tip. More precisely, such items would benefit from knowing the absolute position (in world coordinates parameterizing the stable frame) of their tip while it is in contact with a plane surface embedded in the three-dimensional environment. These items include walking canes when in touch with the ground, pointers when in touch with a display or projection surface, writing devices when in touch with a writing surface, and styluses when in touch with an input screen.
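As a minimal sketch of how the tip position forms a subset of the absolute pose information, the following assumes that the plane surface coincides with the world plane z = 0, that the item is a straight elongate object, and that its pose is summarized by a reference point on its axis, an inclination angle to the surface normal and an azimuth angle. All names, parameters and the contact tolerance are hypothetical and introduced only for illustration.

```python
import math


def tip_position(ref, length, inclination, azimuth):
    """World coordinates of the tip of a straight elongate item.

    ref         -- (x, y, z) of a reference point on the item's axis (world frame)
    length      -- distance from the reference point to the tip along the axis
    inclination -- angle between the item's axis and the surface normal (+Z), radians
    azimuth     -- angle of the axis projection onto the XY plane, from +X, radians
    """
    x, y, z = ref
    # Unit vector pointing from the reference point toward the tip (downward).
    dx = math.sin(inclination) * math.cos(azimuth)
    dy = math.sin(inclination) * math.sin(azimuth)
    dz = -math.cos(inclination)
    return (x + length * dx, y + length * dy, z + length * dz)


def tip_in_contact(tip, tolerance=1e-3):
    """True when the tip lies on the assumed plane z = 0, within a tolerance."""
    return abs(tip[2]) <= tolerance
```

For example, `tip_position((0.0, 0.0, 0.05), 0.1, math.radians(60), 0.0)` describes an item inclined at 60 degrees whose tip rests on the assumed surface; the (x, y) components of the result are the tip's absolute position within that surface.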
The need to determine the absolute position of the tip or nib is deeply felt in the field of input devices such as pens and styluses. Here, the absolute position of the tip has to be known in order to analyze the information written or traced by the user on the writing surface. Numerous teachings of pens and related input devices providing relative tip position and absolute tip position are discussed in the prior art. Some of these teachings rely on inertial navigation devices including gyroscopes and accelerometers as described in U.S. Pat. Nos. 6,492,981; 6,212,296; 6,181,329; 5,981,884; 5,902,968. Others combine inertial navigation with force sensing as described in U.S. Pat. Nos. 6,081,261; 5,434,371. Still other techniques rely on triangulation using signal receivers and auxiliary devices on or adjacent to the writing surface as found in U.S. Pat. Nos. 6,177,927; 6,124,847; 6,104,387; 6,100,877; 5,977,958 and 5,484,966. Furthermore, various forms of radiation including short radio-frequency (RF) pulses, infra-red (IR) pulses, and even sound waves in the form of ultrasound pulses have been taught for triangulation and related techniques. A few examples of yet another set of solutions employing digitizers or tablets are discussed in U.S. Pat. Nos. 6,050,490; 5,750,939; 4,471,162.
The prior art also addresses the use of optical systems to provide relative, and in some cases, absolute position of the tip of a pen or stylus on a surface. For example, U.S. Pat. No. 6,153,836 teaches emitting two light beams from the stylus to two receivers that determine angles with respect to a two-dimensional coordinate system defined within the surface. The tip position of the stylus is found with the aid of these angles and knowledge of the location of the receivers. U.S. Pat. No. 6,044,165 teaches integration of force sensing at the tip of the pen with an optical imaging system having a camera positioned in the world coordinates and looking at the pen and paper. Still other teachings use optical systems observing the tip of the pen and its vicinity. These teachings include, among others, U.S. Pat. Nos. 6,031,936; 5,960,124; 5,850,058. According to another approach, the disclosure in U.S. Pat. No. 5,103,486 proposes using an optical ballpoint in the pen. More recently, optical systems using a light source directing light at paper have been taught, e.g., as described in U.S. Pat. Nos. 6,650,320; 6,592,039 as well as WO 00217222 and U.S. Pat. Appl. Nos. 2003-0106985; 2002-0048404.
In some prior art approaches the writing surface is provided with special markings that the optical system can recognize. Some early examples of pens using special markings on the writing surface include U.S. Pat. Nos. 5,661,506; 5,652,412. More recently, such an approach has been taught in U.S. Pat. Appl. 2003-0107558 and related literature. For still further references, the reader is referred to U.S. patent application Ser. Nos. 10/640,942 and 10/745,371 and the references cited therein.
Most of the prior art approaches listed above are limited in that they yield relative position of the tip on the writing surface. Tablets and digitizers obtain absolute position but they are bulky and inconvenient. Of the approaches that provide absolute position of the tip without tablets by using optical systems, most rely on observing the relationship of markings provided on the writing surface to the tip of the pen. This approach is limiting in that it requires a specially-marked writing surface, which acts as a quasi-tablet.
In addition to being cumbersome, state-of-the-art pens and styluses employing optical systems usually generate a limited data set. In fact, most only recover and provide data corresponding to the trace traversed on the writing surface. Meanwhile, there are many applications that could benefit from a rich stream of data from the pen or stylus afforded by the full absolute pose parameterized in coordinates describing the stable frame. Furthermore, the absolute pose of such items when not in touch with a surface, as described in the prior application Ser. No. 10/769,484, also provides useful information. Indeed, there exists a much larger set of items, including pointers, absolute 3D mice, wands, remote controls, gaming objects and many others that would greatly expand their input capabilities if their full absolute pose parameters were made available.
The rich stream of information expressing an item's absolute pose combines its three linear or translational degrees of freedom with its three rotational degrees of freedom. Typically, translations are measured along linearly independent axes such as the X, Y, and Z-axes. The translation or displacement along these axes is usually measured by the position (x, y, z) of a reference point on the item (e.g., the center of mass of the item). The three-dimensional orientation of the item is typically expressed by rotations taken around three linearly independent axes. The latter are typically expressed with three rotation angles, such as the Euler angles (φ, θ, ψ).
Conveniently, absolute pose can be expressed with all six absolute pose parameters (x, y, z, φ, θ, ψ) in the world coordinates laid down in the stable frame. In some cases, alternative expressions for the rotation angles, such as the three Tait-Bryan angles, the pitch, yaw and roll angles, the four Cayley-Klein parameters or quaternions, are more appropriate. One can also use direction cosines or other alternatives for expressing the three rotational degrees of freedom of the item.
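The six absolute pose parameters can be illustrated with a short sketch. It assumes, purely for the sake of example, a Z-X-Z Euler angle convention in which the rotation R = Rz(φ)·Rx(θ)·Rz(ψ) maps vectors from the item's body frame into world coordinates; other conventions are equally valid and would change the formulas. All function names are hypothetical. The sketch also builds the equivalent unit quaternion to show that the two rotational descriptions agree.

```python
import math


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_zxz_to_matrix(phi, theta, psi):
    """Body-to-world rotation matrix under the assumed Z-X-Z convention."""
    return matmul(rot_z(phi), matmul(rot_x(theta), rot_z(psi)))


def quat_mul(p, q):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw)


def euler_zxz_to_quaternion(phi, theta, psi):
    """Unit quaternion equivalent to euler_zxz_to_matrix."""
    q_phi = (math.cos(phi / 2), 0.0, 0.0, math.sin(phi / 2))
    q_theta = (math.cos(theta / 2), math.sin(theta / 2), 0.0, 0.0)
    q_psi = (math.cos(psi / 2), 0.0, 0.0, math.sin(psi / 2))
    return quat_mul(quat_mul(q_phi, q_theta), q_psi)


def quat_rotate(q, v):
    """Rotate vector v by unit quaternion q (computes q v q*)."""
    w, x, y, z = q
    rotated = quat_mul(quat_mul(q, (0.0, v[0], v[1], v[2])), (w, -x, -y, -z))
    return rotated[1:]


def body_to_world(pose, point_body):
    """Map a point fixed on the item into world coordinates.

    pose is the six-parameter tuple (x, y, z, phi, theta, psi).
    """
    x, y, z, phi, theta, psi = pose
    r = euler_zxz_to_matrix(phi, theta, psi)
    rx, ry, rz = (sum(r[i][k] * point_body[k] for k in range(3)) for i in range(3))
    return (x + rx, y + ry, z + rz)
```

A point fixed on the item, such as its tip, is thus located in world coordinates by one rotation and one translation, regardless of whether the rotation is stored as angles, a matrix or a quaternion.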
Optical methods for recovering the absolute pose of items endowed with on-board camera units are particularly simple and precise. These approaches are used in computer vision and robotics. They rely on algorithms that recover the camera's pose (optical pose estimation and recovery) in the three-dimensional environment from various optical inputs. Since the camera is affixed to the item, recovery of camera pose is tantamount to the recovery of the item's pose.
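The statement that camera pose recovery is tantamount to item pose recovery can be sketched as a composition of rigid transforms. The sketch assumes that the mounting of the camera on the item is fixed and known (for example, from a one-time calibration) and that each transform is stored as a rotation matrix and a translation vector. The names `world_from_camera` and `item_from_camera`, and the helper functions, are hypothetical.

```python
def compose(a, b):
    """Compose rigid transforms (R, t): the result applies b first, then a."""
    ra, ta = a
    rb, tb = b
    r = [[sum(ra[i][k] * rb[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    t = [sum(ra[i][k] * tb[k] for k in range(3)) + ta[i] for i in range(3)]
    return r, t


def invert(a):
    """Inverse of a rigid transform (R, t), namely (R^T, -R^T t)."""
    r, t = a
    rt = [[r[j][i] for j in range(3)] for i in range(3)]
    ti = [-sum(rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return rt, ti


def item_pose_from_camera_pose(world_from_camera, item_from_camera):
    """Item pose in world coordinates from the recovered camera pose.

    world_from_camera -- camera pose recovered by an optical method (assumed given)
    item_from_camera  -- fixed, known mounting of the camera in the item's frame
    """
    # world_from_item = world_from_camera composed with camera_from_item
    return compose(world_from_camera, invert(item_from_camera))
```

Because the mounting transform is constant, every recovered camera pose yields the item's pose through the same fixed composition.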
The optical approach to pose recovery is also intuitive, since our own human vision system computes locations and motion trajectories of items in real three-dimensional environments in that manner. This includes recovery of our own pose and movement in a three-dimensional environment based on images provided by our eyes. In other words, our own senses implement pose recovery algorithms from images. These abilities develop as part of our natural proprioception in early childhood.
The high accuracy and precision of optical navigation is due in large part to the very short wavelength of electromagnetic radiation in comparison with typical dimensions of objects and items of interest. Furthermore, radiation incurs negligible latency in short-distance measurements owing to the extremely high speed of light, and it is relatively immune to interference. Thus, it is well known that the problem of determining an absolute pose or a motion trajectory of an item in almost any real three-dimensional environment may be effectively addressed by the application of optical apparatus and methods.
A particularly acute need for efficient, accurate and low-cost determination of the absolute pose of an item in a real three-dimensional environment is found in the field of items associated with a human user. Such items may be held and manipulated by the user. Alternatively, they may be worn by the user. In either case, the items are intended to help the user interact with the digital world. Such items encompass myriads of manipulated objects such as pointers, wands, remote controls, gaming objects, jotting implements, surgical implements, three-dimensional digitizers and various types of human utensils whose motion in real space is to be processed to derive a digital input for an application. In some realms, such application involves interactions that would greatly benefit from a rapid, low-cost method and apparatus for motion mapping between real space and a cyberspace.
Specific examples of cyberspace games played in three dimensions (3D) and requiring high-accuracy tracking of control items involve scenarios where the item is transported into or even mimicked in cyberspace. Exemplary gaming objects of this variety include a gun, a golf club, a racket, a guitar, a ball, a steering wheel, a flying control or any other accoutrement that the player wishes to transport into and utilize in a cyberspace application. A very thorough summary of such 3D interface needs for graphics is found in U.S. Pat. No. 6,811,489 to Shimizu, et al.
A major problem encountered by state-of-the-art manipulated items such as wands and gaming implements is that they do not possess a sufficiently robust and rapid absolute pose recovery system. In fact, many do not even provide for absolute pose determination. Rather, they function much like quasi-3D mice. These solutions use motion detection components that rely on optical flow sensors, inertial sensing devices or other relative motion capture systems to derive the signals for interfacing with cyberspace. In particular, many such interface devices try to solve just a subset of the motion changes, e.g., inclination. An example of an inclination calculation apparatus is found in U.S. Pat. No. 7,379,841 to Ohta, while a broader attempt at determining relative motion is taught in U.S. Pat. No. 7,424,388 to Sato and U.S. Application 2007/0049374 to Ikeda, et al.
Unfortunately, motion mapping between space and cyberspace is not possible without the ability to digitize the absolute pose of the item in a well-defined and stable reference frame. All prior art approaches that do not solve the full motion problem, i.e., all devices and methods that do not capture successive absolute poses of the item with a method that accounts for all six degrees of freedom (namely, the three translational and the three rotational degrees of freedom inherently available to rigid bodies in three-dimensional space), encounter limitations. Among many others, these limitations include information loss, appearance of an offset, position aliasing, gradual drift and accumulating position and orientation error.
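The gradual drift and accumulating error associated with relative motion capture can be illustrated with a deliberately simple one-dimensional sketch. The step size, bias and absolute measurement error below are hypothetical values chosen only to make the effect visible; they do not describe any particular device.

```python
def integrate_relative(increments, start=0.0):
    """Reconstruct positions by summing relative displacement measurements."""
    position = start
    track = []
    for delta in increments:
        position += delta
        track.append(position)
    return track


# Hypothetical scenario: the item moves 1 cm per sample along one axis.
true_step = 0.01      # metres per sample (assumed)
bias = 0.0005         # constant error in each relative measurement (assumed)
abs_error = 0.002     # bounded error of each absolute measurement (assumed)
n = 1000

true_track = [true_step * (k + 1) for k in range(n)]

# Relative sensing: each increment carries the bias, and the errors add up.
relative_track = integrate_relative([true_step + bias] * n)
relative_drift = [r - t for r, t in zip(relative_track, true_track)]

# Absolute sensing: each sample is measured independently in the stable frame,
# so its error alternates within the bound instead of accumulating.
absolute_track = [t + (abs_error if k % 2 == 0 else -abs_error)
                  for k, t in enumerate(true_track)]
absolute_error = [a - t for a, t in zip(absolute_track, true_track)]
```

In this sketch the error of the integrated relative track grows in proportion to the number of samples, whereas the error of the absolute track never exceeds the per-sample bound.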
In general, the prior art has recognized the need for tracking all six degrees of freedom of items moving in three dimensions. Thus, optical navigation solutions typically employ several stationary cameras to determine the position or trajectory of an object in an environment by studying images of the object in that environment. Such optical capturing or tracking systems are commonly referred to as optical motion capture (MC) systems.
This approach to motion capture tends to be computationally expensive because of significant image pre- and post-processing requirements, as well as additional computation associated with segmentation and implementation of algorithms. One particular system taught by McSheery et al. in U.S. Pat. No. 6,324,296 discloses a distributed-processing motion capture system that employs a number of light point devices as markers, e.g., infrared LEDs, attached to the item or object whose motion is to be determined. The markers use unique sequences of light pulses to represent their unique identities and thus enable filtering out of information not belonging to the markers (i.e., background noise) by the imaging cameras located in the environment. Since McSheery's system permits a great deal of irrelevant information from the imaging sensors (e.g., CCDs) to be discarded before image processing, the system is less computationally expensive than more traditional motion capture systems.
Another three-dimensional position and orientation sensing system that employs markers on the item is taught by Kosaka et al. in U.S. Pat. No. 6,724,930. In this case the markers are uniquely identified based on color or a geometric characteristic of the markers in the extracted regions. The system uses an image acquisition unit or camera positioned in the environment and relies on image processing functions to remove texture and noise. Segmentation algorithms are used to extract markers from images and to determine the three-dimensional position and orientation of the item with respect to the image acquisition apparatus.
Still another way of employing markers in position and orientation detection is taught in U.S. Pat. No. 6,587,809 by Majoe. The item or object is tracked by providing it with markers that are activated one at a time and sensed by a number of individual sensors positioned in the environment. The position of the energized or active marker is determined by a control unit based on energy levels received by the individual sensors from that marker.
The above approaches using markers on objects and cameras in the environment to recover object position, orientation or trajectory are still too resource-intensive for low-cost and low-bandwidth interfaces and applications. This is due to the large bandwidth needed to transmit image data captured by cameras, the computational cost to the host computer associated with processing image data, and the data network complexity due to the spatially complicated distribution of equipment (i.e., placement and coordination of several cameras in the environment with the central processing unit and overall system synchronization).
Despite the above-mentioned limitations of general motion tracking systems, some aspects of these systems have been adapted in the field of manipulated items used for interfacing with computers. Such objects are moved by users in three-dimensional environments to produce input for computer applications. Hence, they need to be tracked in all six degrees of freedom. Therefore, recent three-dimensional wands and controls do teach solving for all six degrees of freedom.
For example, U.S. Patent Application 2008/0167818 to Kimber et al. describes a passive wand with no on-board devices or LEDs. The wand is viewed from multiple cameras. Finding the full six degrees of freedom to provide for more precise estimation of wand pose is expressly taught in this reference. Similarly, U.S. Pat. No. 6,982,697 to Wilson et al. teaches the use of external calibrated cameras to decode the orientation of the pointer used for control actions. U.S. Patent Application 2006/0109245 to Wilson, et al. further teaches how intelligent computing environments can take advantage of a device that provides orientation data in relative motion mode and absolute mode. Further teachings on systems that use external or not-on-board cameras to determine the pose and motion of a wand or control and use it as input into various types of applications can be found in U.S. Patent Applications: 2008/0192007, 2008/0192070, 2008/0204411, 2009/0164952 all by Wilson.
Still other notable teachings show as few as a single off-board camera for detecting three-dimensional motion of a controller employed for game control purposes. Such cameras may be depth sensing. Examples of corresponding teachings are found in U.S. Patent Application 2008/0096654 by Mondesir, et al., as well as U.S. Patent Applications 2008/0100825, 2009/0122146 both by Zalewski, et al.
Unfortunately, approaches in which multiple cameras are set up at different locations in the three-dimensional environment to enable stereo vision defy low-cost implementation. These solutions also require extensive calibration and synchronization of the cameras. Meanwhile, the use of expensive single cameras with depth sensing does not provide for robust systems. The resolution of such systems tends to be lower than desired, especially when the user is executing rapid and intricate movements with the item in a confined or close-range environment.
Another approach involves determining the position or attitude of a three-dimensional item in the absolute sense and using this position or attitude data for driving a graphical user interface. One example of this approach is taught in U.S. Pat. No. 6,727,885 to Ishino, et al. Here the sensor is on-board the manipulated object. A projected image viewed by the sensor and generated by a separate mechanism, i.e., a projection apparatus that imbues the projected image with characteristic image points, is employed to perform the computation. Additional information about such apparatus and its application for games is found in U.S. Pat. Nos. 6,852,032; 6,993,206 both to Ishino, et al.
The solution proposed by Ishino et al. is more versatile than the prior art solutions relying on multi-camera off-board systems that are hard to calibrate and synchronize or on expensive cameras with depth sensing capabilities. Unfortunately, the complexity of additional hardware for projecting images with characteristic image points is nontrivial. The same is true of consequent calibration and interaction problems, including knowledge of the exact location of the image in three-dimensional space. This problem translates directly to the difficulty of establishing stable frames in the three-dimensional environment and parameterizing them. Furthermore, the solution is not applicable to close-range and/or confined environments, and especially environments with typical obstructions that interfere with line-of-sight conditions.
There are still other teachings attempting to improve on both the apparatus and method aspects of generating computer input with manipulated items or objects such as wands, pointers and remote controls (e.g., TV controls). A very illuminating overall review of state-of-the-art technologies that can be used for interacting with virtual environments, and of their limitations, is provided by Richard Holloway in “Virtual Environments: A Survey of the Technology”, University of North Carolina at Chapel Hill, September 1993 (TR93-033). Still more recent teachings focusing on how absolute pose data can be used in specific contexts and for remote control applications are discussed in the following U.S. Patent Applications: 2007/0189737; 2008/0106517; 2008/0121782; 2008/0272272; 2008/0309511; 2009/0066647; 2009/0066648; 2009/0153389; 2009/0153475; 2009/0153478; 2009/0158203 and 2009/0158222.
The challenges for 3D user interfaces with the digital world do not end with their ability to recover absolute pose in an efficient and accurate manner. Many additional issues need to be addressed and resolved, over and above those discussed so far. In fact, it may be in large part because some of the more basic challenges are still being investigated that the questions about how to use the recovered poses remain unanswered.
In particular, the prior art does not address the mapping between absolute poses recovered in a stable reference frame and the digital world to obtain a meaningful interface and user experience. Even the parent U.S. patent application Ser. No. 10/769,484 (Published Appl. 2005-0168437, now allowed), although it teaches the use of various subsets of absolute pose data as well as processing data in those subsets, does not teach or suggest to a person skilled in the art how to map absolute pose data from the real three-dimensional environment into the digital world of a software application.