1. Technical Field
The system and method according to the present invention employ an arbitrary quadrangle-shaped panel and a pointer tip, such as a fingertip, as an intuitive input device to a remote display.
2. Background Art
The exploration of vision-based interfaces is motivated by the unnaturalness of some of the conventional input devices such as mice and joysticks. The bottleneck of such devices comes from the lack of flexibility due to the constraints from the environment, and the lack of the feeling of immersiveness in human computer interaction. Magnetic sensors and transmitters could be a remedy for such conventional interaction devices. However, they are prone to magnetic interferences and many people are reluctant to use them due to the debate that they could be a health hazard. On the other hand, combined with human body motion, a vision-based interface is of great potential because it provides a non-invasive way to achieve natural and immersive interaction between human and the computer.
There are many applications, such as in smart rooms and teleconferencing, where conventional mice and keyboards turn out to be not suitable because only one person can use them if there are several people in the room. This motivates the development of a vision-based gesture interface. One of the most intuitive and convenient ways to control an intelligent environment is to employ human body motion, especially hand/finger motion. The use of hand gestures has become an important part of Human Computer Interfaces (HCI) in recent years. In order to use the human hand as a natural interface device, some alternatives, such as glove-based devices, have been used to capture human hand motion by attaching some sensors to measure the joint angles and spatial positions of hands directly. Unfortunately, such devices are expensive and cumbersome. Vision-based techniques provide a promising alternative to this goal, since they can be very cost-efficient and non-invasive.
There have been many implemented vision-based application systems in domains such as virtual environments, human-computer interfaces, teleconferencing, and sign language translation. However, few such vision-based interfaces are able to achieve accurate display control and text input. One of the reasons is that such systems lack robust, accurate and fast hand/finger tracking from live video input. Two-dimensional (2D) tracking has been used, based on several different cues such as color, motion, and image features. Although color tracking provides a cost-efficient way of tracking, it is sensitive to lighting changes and is not suitable for accurate tracking. Tracking using image features such as geometric shapes may provide accurate tracking results but may require extensive processing resources.
The system and method according to the present invention seek to solve the aforementioned problems by using an arbitrary quadrangle-shaped plane object, such as a sheet of paper, and a pointer tip, such as a fingertip or a pen, to serve as a natural and convenient input device for accurately controlling one or more remote displays, based on computer vision techniques.
Functionally, the system consists of panel tracking, pointer tip tracking, homography calculation/updating and action detection/recognition. In the most general terms, video sequences taken by a camera which captures the movement of both the quadrangular panel and pointer are analyzed by a panel tracker and a pointer tip tracker. The panel can be anything as long as it is quadrangle-shaped and relatively rigid.
The setting of the camera can be quite flexible. The camera can be located anywhere as long as the panel is not completely occluded. For instance, it is possible to mount the camera on the ceiling. The user can rotate, translate and tilt the panel to reach a comfortable pose for use. Under some circumstances when the user wants to walk around, a camera can be mounted on top of his head by wearing a hat, or on his shoulders, such that the user can be anywhere to interact with the computer.
Since an arbitrary quadrangle-shaped panel is used to control the cursor position on the remote computer display, one must know the mapping between a point on the panel and a point on the display. Furthermore, what is available is an image sequence of the panel, which may undergo arbitrary motion (as long as the image of the panel does not degenerate into a line or a point), so the mapping between a point in the image plane and a point on the panel must also be known. It is assumed that the camera performs a perspective projection (pinhole model). As the display, the panel, and the image plane are all planes, both of the above relationships can be described by a plane perspectivity.
The mapping between the image of the panel and the remote display can be described by a homography matrix once the four corners of the panel are located in the image. As the dimensions of the display are known, the homography can be computed by mapping each corner of the panel to a corner of the remote display.
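The corner-to-corner mapping described above can be sketched as a direct linear solve. This is a minimal illustration, not the method claimed here: the function names, the corner ordering (top-left, top-right, bottom-right, bottom-left) and the sample coordinates are assumptions made for the example.

```python
import numpy as np

def homography_from_corners(src, dst):
    """Estimate the 3x3 homography H such that dst ~ H @ src for four point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on the 9 entries of H.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.asarray(rows, dtype=float)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # assumes H[2, 2] is nonzero (non-degenerate panel image)

def map_point(H, point):
    """Map an image point to display coordinates through H."""
    x, y = point
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w

# Hypothetical panel corners in a 320x240 image and the corners of a 1024x768 display.
panel_corners = [(80.0, 60.0), (250.0, 70.0), (240.0, 190.0), (70.0, 180.0)]
display_corners = [(0.0, 0.0), (1023.0, 0.0), (1023.0, 767.0), (0.0, 767.0)]
H = homography_from_corners(panel_corners, display_corners)
```

Calling `map_point(H, tip)` with a tracked tip position then gives the cursor position on the display.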
A panel tracker tracks an arbitrary quadrangle-shaped plane object by outputting the positions of its four corners. An edge-based dynamic programming technique is employed in the panel tracker to locate these four corners. The edge-based dynamic programming technique uses the gradient of color intensity to locate the edges, since the difference in color intensity between the surroundings and the panel should typically be significant. This technique is quite robust and reliable, even if some of the corners of the panel or parts of the edges are occluded. At the same time, since the positions of the corners are calculated by intersecting the four lines of the quadrangle, the positions can be calculated with sub-pixel precision, which allows for a more accurate calculation of the homography that describes the mapping between the panel and the remote display. Through this homography, any point on the panel is mapped to the corresponding position on the remote display.
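The sub-pixel corner computation can be illustrated with homogeneous coordinates, where a line through two points and the intersection of two lines are both cross products. The sketch below covers only the intersection step; it assumes each edge line has already been estimated (here, hypothetically, from two points on the edge), and the function names are illustrative.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def intersect(l1, l2):
    """Intersection of two homogeneous lines, returned as a sub-pixel (x, y)."""
    x, y, w = np.cross(l1, l2)
    if abs(w) < 1e-12:
        raise ValueError("lines are parallel")
    return x / w, y / w

# Two hypothetical panel edges, each given by two points lying on it.
top_edge = line_through((0.0, 0.0), (4.0, 1.0))
left_edge = line_through((1.0, -1.0), (2.0, 3.0))
corner = intersect(top_edge, left_edge)  # a non-integer pixel position
```

Because the corner comes from two fitted lines rather than from a single detected pixel, it stays defined even when the corner itself is occluded.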
The system and method according to the present invention determine the location of a pointer tip by using a conic curve fitting technique. Since users can use their fingertip to control a cursor on a remote display, the tracking of the pointer tip should be as accurate and stable as possible. This is because a small error in the tip position will be magnified in the remote display. For instance, assume that the resolution of the input video is 320×240 pixels and the remote display has a resolution of 1024×768 pixels. Since the panel in the image is generally roughly half the size of the image, a tracking error of 1 pixel will incur about 6 pixels of error in the remote display, which will make the mapped cursor position very shaky. This problem of the magnified tip position error is solved by fitting an ellipse to the edge pixels representing the outline of the pointer as observed in the image. The use of an ellipse to find the pointer tip allows the tip position to be calculated with sub-pixel precision. This minimizes any error in the tip position once projected onto the remote display.
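The magnification figure can be checked with a short calculation. The sketch assumes "half the size" means the panel spans half the image width, and the function name is hypothetical.

```python
def display_error(image_width, display_width, panel_fraction, tracking_error_px=1.0):
    """Display-space error caused by a tracking error measured in image pixels."""
    panel_width_px = image_width * panel_fraction  # panel width as seen in the image
    return tracking_error_px * display_width / panel_width_px

# 320-pixel-wide video, 1024-pixel-wide display, panel covering half the image width.
error = display_error(320, 1024, 0.5)  # 1024 / 160 = 6.4 display pixels
```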
The system allows arbitrary pointer tips, such as fingertips and pens, to be used as long as their color is distinguishable from the panel's color. Basically, the edge points of the pointer tip are taken to be those locations where the color of the image of the panel changes significantly. Once the edge points are found, an elliptical curve is fit to the edge points via a conventional curve-fitting technique. The tip of the pointer is then found as the point where the major axis of the ellipse intersects the ellipse itself.
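One possible way to carry out the fit is sketched below. It uses a plain algebraic least-squares conic fit as a stand-in for the "conventional curve-fitting technique", which the text does not specify. The major axis meets the ellipse at two points; choosing the one nearer the centroid of the edge points is an assumption of this sketch, as are all function names and the synthetic data.

```python
import numpy as np

def fit_conic(points):
    """Algebraic least-squares fit of a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0."""
    x, y = np.asarray(points, dtype=float).T
    design = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    _, _, vt = np.linalg.svd(design)
    return vt[-1]  # singular vector of the smallest singular value

def major_axis_endpoints(conic):
    """Return the two points where the major axis crosses the fitted ellipse."""
    a, b, c, d, e, f = conic
    M = np.array([[a, b / 2.0], [b / 2.0, c]])
    if np.linalg.det(M) <= 0:
        raise ValueError("fitted conic is not an ellipse")
    g = np.array([d, e])
    center = np.linalg.solve(2.0 * M, -g)      # gradient of the conic vanishes here
    k = center @ M @ center + g @ center + f   # conic value at the center
    evals, evecs = np.linalg.eigh(M / -k)      # ellipse: u^T (M / -k) u = 1
    if evals[0] <= 0:
        raise ValueError("fitted conic is not a real ellipse")
    half_axis = evecs[:, 0] / np.sqrt(evals[0])  # smallest eigenvalue -> major axis
    return center + half_axis, center - half_axis

def pointer_tip(edge_points):
    """Pick the major-axis endpoint closest to the centroid of the edge points."""
    pts = np.asarray(edge_points, dtype=float)
    endpoints = major_axis_endpoints(fit_conic(pts))
    centroid = pts.mean(axis=0)
    return min(endpoints, key=lambda p: np.linalg.norm(p - centroid))

# Synthetic edge points: an arc around one end of an ellipse rotated by 30 degrees.
t = np.linspace(-1.2, 1.2, 25)
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
arc = (R @ np.vstack([20.0 * np.cos(t), 8.0 * np.sin(t)])).T + np.array([50.0, 40.0])
tip = pointer_tip(arc)
```

An unconstrained conic fit can return a hyperbola on noisy data, which is why the sketch raises an error in that case; a production tracker would more likely use an ellipse-specific fit.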
To reduce the processing necessary to locate the pointer tip, two methods can be used. The first is a Kalman filtering technique and the second is a background subtraction technique. The Kalman filtering technique can be employed to predict the tip position $\bar{p}(t+1)$ at time $t+1$, assuming the position of the tip at time $t$ is $p(t)$. In a small window, for example 30×30 pixels, as many edge points as possible are identified that probably belong to the edge of the tip. This is accomplished by thresholding the gradient and taking advantage of the color of the previous edge of the tracked tip. After that, an ellipse can be fit to these edge points as discussed above to find the exact location of the tip $\bar{p}(t+1)$ for time $t+1$. Alternatively, finding the location of the pointer tip can be expedited by a re-initialization technique that employs background subtraction. The background consists of the panel and the rest of the image, and the foreground consists of the pointer. To achieve this result, a previous image is subtracted from a current image to localize the pointer. This technique localizes the moving part of the pointer to allow the system to predict where the pointer tip is. The previously described homography is used to predict the tip location and to search for pointer tip edge points in that neighborhood (i.e., 30×30 pixels around the predicted tip point location in a tested embodiment).
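The prediction step can be sketched with a small Kalman filter. The text does not state the motion model, so a constant-velocity model is assumed here, and the class name, noise parameters and window helper are all hypothetical.

```python
import numpy as np

class TipKalman:
    """Constant-velocity Kalman filter over the state [x, y, vx, vy] (assumed model)."""

    def __init__(self, x, y, process_noise=1e-2, measurement_noise=1.0):
        self.s = np.array([x, y, 0.0, 0.0])
        self.P = np.eye(4) * 10.0
        self.F = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
        self.M = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)  # measurement matrix
        self.Q = np.eye(4) * process_noise
        self.R = np.eye(2) * measurement_noise

    def predict(self):
        """Advance one frame and return the predicted tip position."""
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2].copy()

    def update(self, measured_tip):
        """Correct the state with the tip position found by the ellipse fit."""
        residual = np.asarray(measured_tip, dtype=float) - self.M @ self.s
        S = self.M @ self.P @ self.M.T + self.R
        K = self.P @ self.M.T @ np.linalg.inv(S)
        self.s = self.s + K @ residual
        self.P = (np.eye(4) - K @ self.M) @ self.P

def search_window(center, size=30):
    """Bounds (x0, y0, x1, y1) of a size x size window around the prediction."""
    half = size // 2
    cx, cy = int(round(center[0])), int(round(center[1]))
    return cx - half, cy - half, cx + half, cy + half

# Feed a tip moving 2 px right and 1 px down per frame, then predict the next frame.
kf = TipKalman(100.0, 100.0)
for frame in range(30):
    kf.predict()
    kf.update((100.0 + 2.0 * frame, 100.0 + 1.0 * frame))
predicted = kf.predict()
window = search_window(predicted)
```

Edge points would then be searched only inside `window`, which is what keeps the per-frame cost low.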
The current system simulates the clicking/pressing gestures typical of using a mouse by holding the pointer tip at a position on the panel for a prescribed period of time. A message generator in the system gets inputs from an action detector, and issues various mouse and keyboard events according to the different user input methods. Building on these techniques, the system is capable of performing two types of input: virtual mouse and virtual keyboard. The position of the pointer tip can be mapped to the remote display such that a cursor can be simulated. A paper with a keyboard pattern printed on it can also be used as a virtual keyboard, on which users can point to the keys to input text.
The present invention supports two "mouse button" pressing modes (clicking mode and dragging mode) and two "mouse motion" types (absolute type and relative type).
As for the two mouse button pressing modes: mode I (clicking mode) simulates the left button down then up automatically and mode II (dragging mode) simulates the left button down until released. In one embodiment, clicking/pressing is simulated by holding the pointer tip in position for a period of time, say 1 second. A state variable S maintains two states: UP and DN (down), to simulate the two natural states of a button.
The variable S is initialized to be UP. In the clicking mode (mode I), when the system detects that the pointer tip has been at a fixed place for, say, 1 second (or other pre-specified duration), the state variable S is set to DN. After 0.1 second, the state variable S will be automatically set to UP to simulate button release. Appropriate mouse events are generated, and a clicking action is performed.
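The clicking mode can be sketched as a small dwell-based state machine. The pixel radius that counts as "a fixed place", the event names, and the rule that the tip must move away before another click can fire are assumptions of this sketch rather than details given in the text.

```python
import math

class ClickDetector:
    """Mode I: dwell for `dwell` seconds -> S = DN, then S = UP after `release_delay`."""

    def __init__(self, dwell=1.0, release_delay=0.1, radius=3.0):
        self.dwell = dwell
        self.release_delay = release_delay
        self.radius = radius      # hypothetical tolerance for "fixed place", in pixels
        self.state = "UP"         # the state variable S
        self.anchor = None        # position where the current dwell started
        self.anchor_time = None
        self.down_time = None
        self.fired = False        # True once this dwell has produced a click

    def update(self, pos, t):
        """Process the tip position at time t (seconds); return generated events."""
        events = []
        if self.state == "DN" and t - self.down_time >= self.release_delay:
            self.state = "UP"     # automatic release
            events.append("LEFT_UP")
        if self.anchor is None or math.dist(pos, self.anchor) > self.radius:
            # The tip moved: start timing a new dwell.
            self.anchor, self.anchor_time, self.fired = pos, t, False
        elif self.state == "UP" and not self.fired and t - self.anchor_time >= self.dwell:
            self.state = "DN"
            self.down_time = t
            self.fired = True
            events.append("LEFT_DOWN")
        return events

detector = ClickDetector()
log = [detector.update((100.0, 100.0), t) for t in (0.0, 0.5, 1.05, 1.1, 1.2, 2.5)]
```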
The clicking mode (mode I) has very limited dragging ability, since the release is automatic. To simulate dragging, mode II uses another state variable, D, to memorize the flip of clicking. When the system detects that the pointer tip has been at a fixed place for, say, 1 second (or other pre-specified duration), variable D changes its state (from D UP to D DN or from D DN to D UP). When the D-state change from D UP to D DN is detected, a pressing action is detected; when the D-state change from D DN to D UP is detected, a releasing action is detected. Thus, an object can be selected and dragged to a different place.
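The dragging mode differs from the clicking mode only in that each dwell flips D instead of producing an automatic release. In the sketch below the pixel radius, the event names, and the requirement that the tip move before the next flip are assumptions made for illustration.

```python
import math

class DragDetector:
    """Mode II: every completed dwell toggles the state variable D."""

    def __init__(self, dwell=1.0, radius=3.0):
        self.dwell = dwell
        self.radius = radius    # hypothetical tolerance for "fixed place", in pixels
        self.d_state = "UP"     # the state variable D
        self.anchor = None
        self.anchor_time = None
        self.armed = False      # True while the current dwell may still toggle D

    def update(self, pos, t):
        """Return "PRESS", "RELEASE", or None for the tip position at time t."""
        if self.anchor is None or math.dist(pos, self.anchor) > self.radius:
            # The tip moved: start timing a new dwell.
            self.anchor, self.anchor_time, self.armed = pos, t, True
            return None
        if self.armed and t - self.anchor_time >= self.dwell:
            self.armed = False
            self.d_state = "DN" if self.d_state == "UP" else "UP"
            return "PRESS" if self.d_state == "DN" else "RELEASE"
        return None

drag = DragDetector()
samples = [((10.0, 10.0), 0.0), ((10.0, 10.0), 1.1), ((10.0, 10.0), 2.5),
           ((50.0, 50.0), 3.0), ((51.0, 50.0), 4.1)]
actions = [drag.update(pos, t) for pos, t in samples]
```

Between the press and the release, cursor motion events would be issued as usual, so the selected object follows the pointer tip.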
The system can also simulate two mouse motion types: absolute and relative. In the absolute type, the panel will be mapped to the whole remote display, such that each point in the panel will be mapped to the corresponding point in the display. As previously discussed, this type needs very accurate tracking, since a small tracking error of the panel and pointer tip will be magnified. However, the absolute type is more intuitive.
An alternative type based on relative motion is also provided, which is much less sensitive to the tracking accuracy, since the cursor is controlled by the relative motion of the pointer tip. Assume the motion direction of the pointer tip is $d_p(t)$ at time $t$. The moving direction of the cursor will be
$$d_d(t) = H(t)\, d_p(t).$$
The speed of cursor motion is determined by the velocity of the pointer tip, i.e.,
$$\Delta d = \alpha \|v_p\|,$$
where $\alpha$ controls the scale of the cursor speed on the display. With a small $\alpha$, the relative type yields much smoother cursor movement, because the tracking error is not magnified. There could be many other alternatives for relative motion. For instance, the panel can simply be mapped to a window area centered at the previous cursor position on the remote display. In this method, the center of the panel corresponds to the previous cursor position. When the pointer tip moves from the center to the left, the cursor will move left. The window area could be smaller than the panel in the image, so that the tracking error is reduced even further.
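One reading of the relative type is sketched below. It assumes that H(t) is the panel-to-display homography, that the cursor direction is obtained by mapping the previous and current tip positions through it and normalizing their difference, and that the tip velocity is a frame-to-frame difference divided by the frame interval. None of these details is stated in the text, and the names are hypothetical.

```python
import numpy as np

def apply_homography(H, p):
    """Map a 2D point through a 3x3 homography."""
    u, v, w = H @ np.array([p[0], p[1], 1.0])
    return np.array([u / w, v / w])

def relative_cursor_step(H, tip_prev, tip_now, dt, alpha):
    """Cursor displacement: direction from H, magnitude alpha * |tip velocity|."""
    tip_prev = np.asarray(tip_prev, dtype=float)
    tip_now = np.asarray(tip_now, dtype=float)
    speed = np.linalg.norm(tip_now - tip_prev) / dt           # |v_p| in image pixels/s
    moved = apply_homography(H, tip_now) - apply_homography(H, tip_prev)
    length = np.linalg.norm(moved)
    if length == 0.0:
        return np.zeros(2)                                    # tip did not move
    return alpha * speed * (moved / length)                   # step along d_d(t)

# With a pure 4x scaling as H, the cursor direction equals the tip direction.
H = np.diag([4.0, 4.0, 1.0])
step = relative_cursor_step(H, (100.0, 100.0), (103.0, 104.0), dt=1.0, alpha=0.5)
```

Because the step length depends only on `alpha` and the tip speed, a one-pixel tracking error moves the cursor by about `alpha` pixels instead of being scaled up by the homography.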