The invention relates generally to methods and systems for tracking attitude of a device and more particularly to tracking the attitude of a device in order to control a device or process, such as a cursor of a video display.
There are applications in which video systems require that a person interact with information presented on a display screen. At times, the interaction is to occur while the person is situated at a distance from the display screen. As will be described more fully below, the interaction may be accomplished by remotely controlling a screen cursor in one of a variety of manners. The interactions may include selecting from a variety of choices presented as a screen menu, or "typing" text using an on-screen keyboard. Examples of remote interactive video systems (RIVS) include interactive television (ITV), TV-style Internet browsers, and conference-room video projectors.
One key component of a RIVS is the "pointing" device for controlling the on-screen cursor. The pointing device fulfills a function analogous to that which mice, trackballs, and graphic tablets perform for computers. However, the environment for RIVS presents difficulties that are typically not encountered in operation of a computer. For example, an operator of a RIVS is typically farther away from the controlled device than is the operator of a computer. As another example, the operator of a RIVS is more likely to be in an unstructured immediate environment, e.g., an ITV operator seated across a living room from a television set. In many situations, the environment precludes use of conventional computer pointing devices, such as mice. Moreover, a RIVS is rarely equipped with a keyboard, so that the pointing device may have to accommodate the extra burden of providing text entry.
There are a number of known pointing devices for a RIVS. Most of the known pointing devices implement some variation of a four-key cursor pad on a hand-held controller. The four-key cursor pad is manipulated to step the screen cursor up, down, left or right among various menu choices. Such interfaces emulate the computer keyboard cursor keys used with old-style textual interfaces. However, these interfaces are typically much slower and less intuitive to use than computer mice and other pointing devices developed for modern graphical software interfaces.
In an effort to improve upon cursor control within the RIVS environment, more advanced computer pointing devices, such as mice and trackballs, have been adapted. In one adaptation, a miniature trackball is mounted atop a controller, with the trackball being operated by the person's thumb. The trackball controller is faster than the use of cursor keys and facilitates diagonal moves. Unfortunately, the trackball may require repeated strokes to accomplish large cursor movements and, in general, thumb control taxes the user's thumb dexterity. For example, it is difficult to trace the cursor in a circle on the display screen.
The use of a mouse for ITV cursor control has been demonstrated. The advantage of the mouse is that it provides excellent and intuitive cursor control. The concern is that there may not be a suitable planar operating surface that is convenient to the operator.
A further refinement in the RIVS pointing art is the use of devices that enable control of a cursor by merely gesturing with a controller. These devices may measure the attitude, i.e. pitch, yaw, and possibly roll, of the controller. A first category of such an approach employs light beams to measure attitude. PCT International Publication Number WO 95/19031 describes a system for determining the pointing orientation of a remote unit relative to a fixed base unit. The fixed base unit includes one or more light sources for emitting a light beam. The emitted light is polarized in at least one predetermined orientation. The movable remote unit includes a photodetector for detecting the polarized emitted light. The attitude of the movable remote unit may be determined by measuring the intensity of received light from various directions.
Another implementation of the emitted-light category of measuring attitude is one in which an infrared (IR) signal is beamed from the area of the video display. The IR signal is defocused and is imaged onto a quad photodiode array in the controller. The relative signal amplitudes from the four photodiodes may be used to determine the relative orientation of the controller to a line drawn from the display. One concern is that the system may undesirably flood the room with intense IR, rendering other nearby IR-coupled appliances (e.g., a VCR controller) inoperative. A second concern is that the limited range of transmission of defocused IR signals may render this system of measuring attitude unreliable when the controller is more than a relatively short distance from the video display.
A second category of devices that measure attitude of the controller is one in which inertial navigation principles are employed. Gyroscopes or encoded gimballed masses establish inertial frames in the controllers, against which attitude changes can be measured. The attitude information may then be transmitted to the video display via a radio-frequency link to a small dipole antenna affixed atop the video display.
The third category is related to the first category. A hand-held object that provides cursor control has a number of light sources mounted on one surface. A single electronic camera is directed to capture images of the light sources mounted on the hand-held object. Locations of the images of the light sources are detected in each camera image, and a computer is used to determine the attitude of the light-emitting hand-held object. Such a device is described in U.S. Pat. No. 5,338,059 to DeMenphon.
A closely related need exists in the field of virtual reality. In games, simulations, and other visualization situations, it is often necessary to encode the attitude of a user's head, or other body part. In many cases, systems for encoding head pitch and yaw may be applied to RIVS controllers, and vice versa. One known virtual reality system encodes pitch and yaw by means of instrumented compasses and gravimeters.
While the known cursor control devices and attitude-determining systems operate adequately for their intended purposes, each is associated with a concern or a problem. Operation may be slow or tedious, or may require use of a specific operating surface. Devices and systems that include IR radiation may adversely affect operation of other devices. Attitude-sensing devices that are based on gravity may have difficulty in distinguishing tilting from transverse acceleration, thereby rendering control erratic. This last problem conceivably could be solved by gyro stabilization, but the cost and power consumption make this solution unattractive. Known systems that utilize light detection require adding a second contrivance at the display, which again adds cost.
What is needed is a method and a system for reliably tracking attitude of a device. What is further needed is such a method and system that is cost efficient when used in controlling a screen cursor or when used in other remote interactive video applications.
Correlation of successive images acquired by means of a two-dimensional array of photosensors is used as a basis for tracking attitude of a device to which the array is affixed. In the preferred embodiment, the device is a hand-holdable member, such as a controller for maneuvering a cursor on a display screen of a video set. Based upon the step of correlating images to detect differences in location of imaged features that are common to a succession of images, the system generates an attitudinal signal indicative of any changes in angular orientation during the time period of acquiring the images. That is, the attitudinal signal is determined by the pitch and yaw, and optionally the roll, of the device that bears the array of photosensors. Since the acquired images need not be related to that which is being controlled, e.g. a screen cursor, the device can face in any direction during the control process. Moreover, it is not necessary to provide a dimensional one-to-one correspondence of angular displacement of the device and travel of that which is being controlled. Within cursor control, for example, the controller may be directed arbitrarily and relationships of degrees of pitch and yaw to lengths of cursor movement may be user-adjustable.
The two-dimensional array of photosensors is used to acquire a reference frame for tracking the attitude of the device. The reference frame is stored and a second image of features within a field of view of the array is acquired. The second image may be considered to be a sample image, and the fields of view of the two images should be largely overlapping, so that the reference and sample frames include a number of common features. While not critical, the device includes optics which provide a focus nominally at infinity, intentionally presenting an off-sharp image to the array of photosensors. In the application of the device for controlling a screen cursor, the representative imaged features will typically include windows, lamps, furniture and the display screen itself. In any application of the invention, one or more stationary sources of light may be specifically added within the environment to be imaged, so that successive images of the fixed light are used for the purpose of correlation. In one implementation of such an embodiment, the source of light is an IR emitter and the imaging array on the device is provided with IR filtering to permit tracking of the attitude of the device.
Conceptually, the step of correlating the reference frame with a sample frame is one in which one of the frames is fixed in position and the other frame is repeatedly shifted to determine which shifted position best approximates an alignment of the imaged features that are common to the two frames, thereby allowing the determination of the pitch and yaw of the imaging array during the interval between acquiring the two frames. In practice, the shifts are performed computationally and are shifts of pixel values in which each pixel value is indicative of light energy received at a particular photosensor at a specific time. The correlations may be limited to computational shifts of only one pixel for nearest-neighbor correlations, or may be multi-pixel computational shifts. The nearest-neighbor correlation process is often preferred, since it is less computationally complex, with only the original position and eight computational shifts being necessary. Interpolations are then performed to determine angular displacements that are less than a full pixel. Angular displacement of the device about a horizontal axis, i.e. pitch, will result in the arrangement of pixel values of the reference frame being moved upwardly or downwardly. Angular displacement of the device about a vertical axis, i.e. yaw, will result in the pixel value arrangement being moved to the left or to the right. The system detects pitch, yaw and combinations of pitch and yaw. The attitudinal signal that is generated by the system is responsive to the detection of such angular displacements. Optionally, roll may also be considered.
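By way of illustration only, the nearest-neighbor correlation described above may be sketched as follows. The sketch assumes that the frames are available as two-dimensional NumPy arrays of pixel values, that the match is scored by a sum of squared differences, and that sub-pixel displacement is estimated by a parabolic fit through three scores along one axis. The function names, the scoring measure and the interpolation formula are assumptions made for the example and are not taken from the description.

```python
import numpy as np

def nearest_neighbor_shift(reference, sample):
    """Compare `reference` with `sample` at the original position and the
    eight one-pixel shifts. Returns (dy, dx, scores), where (dy, dx) is the
    shift of image content from the reference frame to the sample frame and
    scores[dy + 1, dx + 1] is the mismatch for that shift (lower is better).
    """
    h, w = reference.shape
    # Use only the interior so that every shifted window stays inside the frame.
    ref_core = reference[1:h - 1, 1:w - 1].astype(float)
    scores = np.empty((3, 3))
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            window = sample[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx].astype(float)
            # Assumed score: sum of squared pixel differences.
            scores[dy + 1, dx + 1] = np.sum((ref_core - window) ** 2)
    iy, ix = np.unravel_index(np.argmin(scores), scores.shape)
    return int(iy) - 1, int(ix) - 1, scores

def subpixel_offset(s_minus, s_zero, s_plus):
    """Fractional position of the minimum of a parabola fitted through the
    scores at shifts -1, 0 and +1 along one axis (assumed interpolation)."""
    denom = s_minus - 2.0 * s_zero + s_plus
    if denom <= 0.0:
        return 0.0  # flat or inverted curve: no reliable sub-pixel estimate
    return 0.5 * (s_minus - s_plus) / denom

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.random((16, 16))
    # Simulate a pure pitch change: image content moves down by one pixel row.
    smp = np.roll(ref, (1, 0), axis=(0, 1))
    dy, dx, scores = nearest_neighbor_shift(ref, smp)
    print(dy, dx)
```

In this sketch a nonzero dy corresponds to pitch and a nonzero dx to yaw. Applying subpixel_offset to the middle column of scores (for dy) or the middle row (for dx) gives a fractional refinement when the displacement is less than a full pixel.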
In the application in which the attitudinal signal is generated in order to control a screen cursor, the device preferably includes a transmitter for wireless transmission of a cursor-control signal. For example, the signal may be transmitted via an infrared beam. Changes in the pitch of the hand-holdable device are then translated into vertical movements of the screen cursor, while changes in device yaw will move the screen cursor laterally. In this embodiment, translational movement of the device may also be detected and utilized, so that vertical or horizontal movement of the device translates to a corresponding vertical or horizontal movement of the screen cursor.
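The translation of attitude changes into cursor movement may be pictured with the following minimal sketch. All numerical values are assumptions chosen for the example: a 32-pixel-wide array, a uniform angle per pixel across a 64 degree field of view, a 640 by 480 screen, and gains of 20 screen pixels per degree. The sign convention (upward pitch moves the cursor up) and the names CursorMapper and pixels_to_degrees are likewise hypothetical.

```python
from dataclasses import dataclass

def pixels_to_degrees(shift_px, fov_deg=64.0, array_width=32):
    """Convert an image shift in array pixels to an angle in degrees,
    assuming (for illustration) equal angular coverage per pixel."""
    return shift_px * fov_deg / array_width

@dataclass
class CursorMapper:
    gain_x: float = 20.0   # screen pixels per degree of yaw (user-adjustable)
    gain_y: float = 20.0   # screen pixels per degree of pitch (user-adjustable)
    screen_w: int = 640
    screen_h: int = 480
    x: float = 320.0       # current cursor position
    y: float = 240.0

    def update(self, d_pitch_deg, d_yaw_deg):
        """Move the cursor for a change in pitch and yaw and return the new
        position, clamped to the screen. Yaw moves the cursor laterally;
        pitch moves it vertically (screen y grows downward)."""
        self.x = min(max(self.x + self.gain_x * d_yaw_deg, 0), self.screen_w - 1)
        self.y = min(max(self.y - self.gain_y * d_pitch_deg, 0), self.screen_h - 1)
        return round(self.x), round(self.y)

if __name__ == "__main__":
    mapper = CursorMapper()
    # Half a pixel of image shift is 1 degree under the assumed geometry.
    print(mapper.update(d_pitch_deg=pixels_to_degrees(0.5), d_yaw_deg=2.0))
```

Because the gains are ordinary parameters, the relationship between degrees of pitch or yaw and length of cursor travel can be changed per user without touching the attitude-tracking step.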
One concern in the implementation of the method and system is the effect of the phenomenon known in lens design as curvilinear distortion. Curvilinear distortions are also referred to as pin-cushion, barrel, and perspective distortions. Rectilinear detail is compressed at the outer edges of the field by such distortion. Curvilinear distortion is particularly pronounced in simple lenses with wide fields of view, such as the lens contemplated for use with the present invention. In the invention, the field of view is preferably approximately 64°, so that curvilinear distortions will inevitably occur.
In the preferred embodiment, the photosensors of the array vary dimensionally in order to define an array that is curvilinear, i.e., includes an arcuate outer edge. The curvilinear array is dimensioned to compensate for the curvilinear distortion introduced by the lens system. The imaging by the optics is evaluated to characterize the curvilinear distortion, with the array then being patterned to offset the distortion. In this manner, the arrangement of the photosensor array and the optics greatly reduces adverse effects of curvilinear distortion.
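The idea of patterning the array to offset the distortion may be illustrated with a simple numerical sketch. It assumes a one-parameter radial distortion model, r_d = r_u(1 + k r_u^2), with a negative coefficient k so that detail is compressed toward the edge of the field. The model, the coefficient value and the function name are assumptions for the example; in practice the distortion would be characterized by evaluating the actual optics.

```python
import numpy as np

def distorted_boundaries(n_cells, k, half_width=1.0):
    """Along one axis through the optical center, return the positions at
    which the edges of n_cells equal-width ideal cells land after the
    assumed radial distortion r_d = r_u * (1 + k * r_u**2).
    Placing photosensor edges at these positions lets each photosensor
    cover an equal share of the undistorted scene."""
    ideal = np.linspace(-half_width, half_width, n_cells + 1)
    return ideal * (1.0 + k * ideal ** 2)

if __name__ == "__main__":
    edges = distorted_boundaries(n_cells=8, k=-0.2)
    widths = np.diff(edges)
    # Cell widths shrink toward the edge of the field when k is negative.
    print(np.round(widths, 4))
```

Under these assumptions the photosensors near the center of the array are widest and those near the edge are narrowest, so a two-dimensional array patterned this way takes on a curved outline rather than a square one.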
An advantage of the invention is that device attitude may be tracked in a reliable and cost-efficient manner. For those applications in which the array-bearing device is a hand-holdable device, control of a screen cursor or the like is economically achieved without a premium on dexterity. Moreover, the device does not require operation on a suitable surface.