1. Field of the Invention
The invention relates to systems and methods for inputting information about the position of a target and, more particularly, to such systems that employ a single camera image.
2. Background
Many computer-based systems require information about the location of a target. For example, the familiar mouse is used to select a control or a coordinate on a screen. Another area in which target location data are input to a computer is the field of automated video-conferencing systems. For example, a user may aim a camera at an object of interest by simply indicating the object, or by controlling the camera with a joystick. Work is proceeding on many fronts on systems that allow users to indicate targets not with a mouse or a joystick, but with the familiar gestures normally used to indicate targets to other people.
Such gesture-based systems are more intuitive and easier to control than conventional systems that require explicit commands, such as voice command ("command-control," basically a speech-based symbol processor in which each verbal command corresponds to an instruction, for example "PAN-LEFT," "UP," "DOWN," etc.) or joystick control.
There are methods of determining the direction in which a user is pointing using multiple camera views. For example, a camera-based system is described in detail in the article "'Finger-Pointer': Pointing interface by Image Processing" by Masaaki Fukumoto, Yasuhito Suenaga and Kenji Mase. Such systems are often complex because the multiple angle views may need to be combined to generate a three-dimensional model of the actual scene in order to determine the three-dimensional vector that coincides with the user's indication. Also, the cameras need to be positioned and aimed, and their positions and orientations precisely defined. The three-dimensional model is then used to determine the target to which the user is pointing. One technique for overcoming this complexity, in the limited context where the target is located in a known surface, is to use two uncalibrated cameras and planar projection transforms as described in another patent application for APPARATUS AND METHOD FOR INDICATING A TARGET BY IMAGE PROCESSING WITHOUT THREE-DIMENSIONAL MODELING, U.S. Ser. No. 09/572,991, filed May 17, 2000, the entirety of which is hereby incorporated by reference as if fully set forth herein. Even though calibration is not required, the method of that application requires multiple cameras, which must be positioned at a substantial separation distance.
The mouse indicates the position of a desired two-dimensional coordinate on a screen by indicating relative positions. When a mouse is initially controlled, the starting position of the location indicated by it is arbitrary. Only by using feedback and relative movements can a user ultimately indicate a target position. A simple single-camera gesture-based technique, which works much like a mouse, is described in U.S. Pat. No. 5,594,469. In this method, the user's gestures are acquired by a single camera and a position is indicated by feedback. The user then modifies the gesture until the feedback signal indicates the desired result. For example, the user moves his/her hand, and the direction and magnitude of displacement are mapped to relative direction and magnitude displacements of a cursor on a screen. This system, however, suffers from the same drawback as a mouse or joystick in that the starting position is arbitrary and (usually visual) feedback is required.
Briefly, the position of a target lying on a plane is indicated by inputting the projection of a pointing direction onto the plane. If the target is known to lie on a contour in the plane, the position is specified unambiguously by the direction projection, the intersection of the contour and the projection being the desired target. Alternatively, the two-dimensional position of a target can be specified by inputting its axial coordinates in successive steps. In another alternative approach, the image containing the target is translated and/or rotated and the target indicated again. The intersection of the two direction projections is then used to determine the position of the target in 2-space. The direction indications may be input by a camera or other method, such as one or more radio transmitters, casting of a shadow, etc.
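The contour case can be illustrated with a minimal sketch. It assumes, purely for illustration, that the projection of the pointing direction onto the plane is already available as a ray (an origin and a direction in plane coordinates) and that the contour is approximated by a polyline. The names `ray_segment_hit` and `target_on_contour` are hypothetical and do not come from the specification.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def ray_segment_hit(origin: Point, direction: Point,
                    a: Point, b: Point) -> Optional[Tuple[float, Point]]:
    """Intersect the ray origin + t*direction (t >= 0) with segment a-b.

    Returns (t, point) on a hit, or None if there is no intersection.
    """
    dx, dy = direction
    ex, ey = b[0] - a[0], b[1] - a[1]
    denom = dx * ey - dy * ex          # 2-D cross product of the directions
    if abs(denom) < 1e-12:
        return None                    # ray and segment are parallel
    wx, wy = a[0] - origin[0], a[1] - origin[1]
    t = (wx * ey - wy * ex) / denom    # distance parameter along the ray
    s = (wx * dy - wy * dx) / denom    # position parameter along the segment
    if t < 0.0 or s < 0.0 or s > 1.0:
        return None
    return t, (origin[0] + t * dx, origin[1] + t * dy)


def target_on_contour(origin: Point, direction: Point,
                      contour: List[Point]) -> Optional[Point]:
    """Return the first point where the projected direction meets the contour.

    Assumption of this sketch: if the ray crosses the contour more than
    once, the crossing nearest the ray origin is taken as the target.
    """
    best: Optional[Tuple[float, Point]] = None
    for a, b in zip(contour, contour[1:]):
        hit = ray_segment_hit(origin, direction, a, b)
        if hit is not None and (best is None or hit[0] < best[0]):
            best = hit
    return None if best is None else best[1]


# A straight "road" along y = 5, indicated from (2, 0) along direction (1, 1).
road = [(0.0, 5.0), (10.0, 5.0)]
print(target_on_contour((2.0, 0.0), (1.0, 1.0), road))  # (7.0, 5.0)
```

Choosing the nearest crossing is only one possible way to resolve a contour that the projection meets more than once; the text above does not prescribe a rule for that case.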
In the present system, the strategy is to use planar projection transforms in the manner disclosed in U.S. patent application Ser. No. 09/572,991, referenced above, but instead of using two cameras to provide independent planar projections of a single direction-indicating vector onto a common plane, coordinates of a single camera's image are mapped to a known plane, providing only one dimension of coordinate information rather than two. This single dimension, however, can be used in multiple ways. For example, a single-axis control such as a slider control could be controlled with pointing gestures. A point on a road shown on a road map may also be indicated. Also, by using successive gesture inputs, say one for the row and one for the column of a table, a desired cell can be indicated. Alternatively, an image of a scene can be projected onto a screen and a target indicated on the scene. Then, after the first indication, the scene may be translated and/or rotated and the target pointed out again. From the two planar projections of these two pointing indications, the target's location may be deduced by simply finding the intersection of the two projections.
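The two-indication approach can be sketched as follows. The sketch rests on several assumptions that are not stated above: the planar projection transform is represented as a 3x3 homography `H`; each pointing indication is given by two image points (for example an eye point and a fingertip point); and the scene motion between the two indications is a known rotation `angle` followed by a translation `shift`. All function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]
Line = Tuple[Point, Point]  # a line given by two points lying on it


def apply_homography(H: Sequence[Sequence[float]], p: Point) -> Point:
    """Map an image point to plane coordinates with a 3x3 homography."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def project_pointing_line(H, img_a: Point, img_b: Point) -> Line:
    """Project the image line through img_a and img_b onto the plane."""
    return (apply_homography(H, img_a), apply_homography(H, img_b))


def screen_to_scene(p: Point, angle: float, shift: Point) -> Point:
    """Undo the scene motion, assuming screen = R(angle) * scene + shift."""
    x, y = p[0] - shift[0], p[1] - shift[1]
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * y, -s * x + c * y)  # multiply by R transposed


def intersect_lines(l1: Line, l2: Line) -> Optional[Point]:
    """Intersect two infinite lines; None if they are (nearly) parallel."""
    (x1, y1), (x2, y2) = l1
    (x3, y3), (x4, y4) = l2
    d1 = (x2 - x1, y2 - y1)
    d2 = (x4 - x3, y4 - y3)
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-12:
        return None
    t = ((x3 - x1) * d2[1] - (y3 - y1) * d2[0]) / denom
    return (x1 + t * d1[0], y1 + t * d1[1])


def locate_target(H, first: Line, second: Line,
                  angle: float, shift: Point) -> Optional[Point]:
    """Combine two pointing indications into one scene position.

    `first` and `second` hold image points; the second indication is made
    after the scene has been rotated by `angle` and translated by `shift`.
    """
    line1 = project_pointing_line(H, *first)
    on_screen = project_pointing_line(H, *second)
    line2 = (screen_to_scene(on_screen[0], angle, shift),
             screen_to_scene(on_screen[1], angle, shift))
    return intersect_lines(line1, line2)


# Identity homography, scene shifted by (2, 0) between the two indications.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(locate_target(I3, ((0, 4), (1, 4)), ((5, 0), (5, 1)), 0.0, (2.0, 0.0)))
# (3.0, 4.0)
```

If the two projected lines are nearly parallel in scene coordinates, the intersection is ill-conditioned and the sketch returns `None`; in that situation a larger translation or rotation of the scene between the two indications would be needed.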
The invention will be described in connection with certain preferred embodiments, with reference to the following illustrative figures, so that it may be more fully understood. With reference to the figures, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only, and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for a fundamental understanding of the invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.