BACKGROUND OF THE INVENTION
1. Technical Field
This invention is directed towards a system and method for transforming a computer monitor screen into a touch screen using an ordinary camera.
2. Background Art
Input devices for use in computer environments are known in the art. They are used to input data into a computer-based system. Such data may be used to navigate a cursor on a display, to control the functions of a certain device, or simply to input information to a system.
An input device may comprise a touch screen. A "touch" on a typical touch screen means that the touch screen senses the presence of an object such as a tip of a finger or another object, for example a stylus, at and/or at a small distance from an active surface area of the touch screen. An output signal which, in general, is either an electrical or an optical signal is generated from the touch screen. The output signal may include information which is directly dependent on the position of the "touch" on the touch screen. In this case the output signal may include information of the x and y coordinates of the "touch" on the touch screen. Alternatively, the active surface area may be arranged into predetermined regions and, when a particular region is "touched", the output signal may then depend on a unique identification code which refers to that particular region. Touch screens are more convenient than conventional computer screens because the user can directly point to an item of interest on the screen instead of having to use a mouse or other pointer. Use of a mouse or pointer requires learning hand to eye coordination to effectively move the cursor on the screen. Touch screens are particularly useful for children's software programs because it takes children a long time to master the use of a mouse. Conventional touch screens are, however, expensive and difficult to manufacture, making them impractical for many applications.
The present invention overcomes the aforementioned limitations in prior touch screens by a system and method that turns a regular computer monitor screen into a touch screen using an ordinary camera. This system and method includes an image-screen mapping procedure to correct for the non-flatness of the computer screen. It also includes a segmentation method to distinguish the foreground, for example an indicator such as a finger, from the background of a computer screen. Furthermore, it also includes a robust method of finding the tip point location of the indicator (such as the finger tip).
The system setup is very simple as it essentially involves only positioning a camera so as to view the screen of a computer monitor. Ideally, the camera views the screen from a point along a line normal to the center of the screen. However, as this will likely interfere with the user who typically sits in front of the computer monitor, the camera can be shifted away from the normal line to get it out of the way of the user. The camera cannot be moved too far away from the normal line, however, or errors will be introduced in the process which is to be described shortly.
There are four major functional parts to the system and method according to the present invention. These are calibration, extraction of a background model, extraction of a foreground model and a main processing block. The main functional block is the kernel of the system. Its function is to locate the tip point of the indicator in an image of the screen and map its image coordinates to the screen coordinates. To do this the indicator is first segmented from the background. Then the tip point of the indicator is found. The segmentation process requires that color models for both the background and the indicator be calculated. During calibration the mapping between the image coordinates and the screen coordinates is established. This mapping is then used in the main functional block to find the corresponding screen coordinates for the tip point once its image coordinates are estimated. The screen coordinates of the tip point are then used to control the position of the system indicator, sometimes referred to as a cursor.
The purpose of the calibration procedure is to establish a projective mapping between the image coordinates and the screen coordinates. If the screen is flat, the plane perspectivity between the screen plane and its two-dimensional (2D) projection on the image plane is described by a homography, i.e., a 3×3 matrix defined up to a scale factor. This homography can be used to map the image coordinates to the screen coordinates and can easily be determined from four pairs of image-screen correspondences. These correspondences are not difficult to obtain because the screen coordinates can be chosen as the four corners of the screen and their corresponding image points can either be detected automatically or can be specified by the user.
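By way of illustration only, the determination of a homography from four image-screen correspondences can be sketched as follows. This is a minimal sketch and not necessarily the procedure used in any actual embodiment. It assumes the lower-right matrix entry can be fixed at 1 (which removes the scale ambiguity but fails in the rare case where that entry is zero), and the names `solve_linear`, `homography_from_four` and `apply_homography` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate correspondences (e.g. collinear points)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_four(image_pts: Sequence[Point],
                         screen_pts: Sequence[Point]) -> List[List[float]]:
    """Return a 3x3 matrix H (with H[2][2] assumed to be 1) mapping image to screen."""
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(image_pts, screen_pts):
        # u = (h1 x + h2 y + h3) / (h7 x + h8 y + 1), and similarly for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h: List[List[float]], pt: Point) -> Point:
    """Map an image point to screen coordinates, including the projective division."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

In such a sketch, `image_pts` would hold the four screen corners as detected in (or marked by the user on) the camera image, and `screen_pts` the corresponding corner coordinates of the screen itself.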
Most computer monitor screens are not flat, however. To correct for the curvature of the screen, a homography is computed as before. Since the screen is not actually flat, the computed homography is just an approximation. Then a series of dots forming a grid (referred to as calibration points hereafter) whose center coordinates are known in the screen plane are displayed on the screen. Preferably, this is done one at a time in sequence (e.g., from left to right starting with the top row of the grid). A dot on the screen is usually projected in the image plane as an ellipse and the centroid of an ellipse can easily be computed. The centroid of the ellipse can be considered to be the projection of the center of the corresponding dot. As each calibration point appears on the screen, an image of the screen is captured. The ellipse representing the dot is found in the image and the coordinates of its centroid are calculated. It is noted that this can be accomplished using standard techniques for segmenting foreground pixels, including the color segmentation procedure that will be discussed later. The search of the image can be limited to a region of the image surrounding the point where the center of the displayed dot is likely to be seen based on the previously derived homography. The centroid of the ellipse representing the displayed dot in the camera image is then mapped back to the screen coordinates, also using the previously computed homography. These mapped points are called estimated calibration points. Each estimated calibration point is compared to the screen coordinates of the original calibration point. The difference between the original and the estimated calibration points defines a residual vector. Once each dot is displayed and analyzed, the result is a grid of residual vectors. Bilinear interpolation is then used to compute the residual vectors of all screen points (e.g., pixels) not on the grid.
The resulting residual vector field is used to compensate for mapping errors caused by the curvature of the screen for all points on the screen. Finally, it is noted that while the foregoing procedure need not be implemented if a flat or nearly flat screen is involved, it may still prove advantageous to do so. Since the homography is computed using just four point correspondences, any inaccuracies in the point coordinates will result in an inaccurate homography. The foregoing compensation procedure corrects for any inaccuracies because many more points are compared.
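The interpolation of the residual vector field and its use in correcting a mapped point can be sketched as follows. This is an illustrative sketch only. It assumes a regularly spaced calibration grid of at least 2 by 2 points, assumes each residual is stored as the original calibration point minus the estimated one (so that the correction is an addition), and clamps points outside the grid to its edge. The names `interpolate_residual` and `correct` are hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def interpolate_residual(grid: List[List[Vec]], x0: float, y0: float,
                         dx: float, dy: float, x: float, y: float) -> Vec:
    """Bilinearly interpolate the residual vector at screen point (x, y).

    grid[r][c] is the residual measured at screen point (x0 + c*dx, y0 + r*dy).
    """
    rows, cols = len(grid), len(grid[0])
    # Fractional grid position, clamped to the grid extent (an assumption).
    gx = min(max((x - x0) / dx, 0.0), cols - 1.0)
    gy = min(max((y - y0) / dy, 0.0), rows - 1.0)
    # Upper-left corner of the enclosing grid cell.
    c = min(int(gx), cols - 2)
    r = min(int(gy), rows - 2)
    tx, ty = gx - c, gy - r

    def lerp(a: float, b: float, t: float) -> float:
        return a + (b - a) * t

    top = [lerp(grid[r][c][i], grid[r][c + 1][i], tx) for i in range(2)]
    bottom = [lerp(grid[r + 1][c][i], grid[r + 1][c + 1][i], tx) for i in range(2)]
    return (lerp(top[0], bottom[0], ty), lerp(top[1], bottom[1], ty))


def correct(rough: Vec, residual: Vec) -> Vec:
    """Refine a homography-mapped screen point with its interpolated residual."""
    return (rough[0] + residual[0], rough[1] + residual[1])
```

Here `rough` stands for the screen coordinates obtained from the homography alone, and the interpolated residual is looked up at those same coordinates.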
The aforementioned procedures for extracting a background and foreground model preferably employ a color segmentation technique. Sometimes it is difficult to separate the indicator from the background screen. However, it has been observed during experimentation that images of screen pixels have some degree of invariance in the color space: they are dominated by blue colors. This observation forms the basis of the segmentation procedure described as follows.
The color segmentation procedure first computes a color model for the screen without the indicator (e.g., finger, pen, etc.). This is done by capturing an image of the screen while it displays the colors typical of the screen images used in the program for which the present invention is being used to simulate a touch screen. The captured image is used to compute a background model for the screen. To compute this background model, all of the pixels in the image are histogrammed; namely, for each pixel its color intensity is placed in the proper bin among, preferably, 256 possible intensity levels. This is preferably done for each of the red, green and blue (RGB) channels, thus generating three separate histograms. Alternatively, one histogram could be generated using some joint space representation of the channels. Once the histogramming has taken place, a Gaussian distribution for each histogram is calculated to provide the mean pixel intensity of the background and the variance therefrom. This information is useful for determining which pixels are background pixels.
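The per-channel histogramming and Gaussian fit described above can be sketched as follows. This is a minimal sketch under the assumption that pixels are supplied as (R, G, B) triples of integers in the range 0 to 255. The function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def channel_histogram(values: Iterable[int], bins: int = 256) -> List[int]:
    """Count how many pixels fall into each intensity level of one channel."""
    hist = [0] * bins
    for v in values:
        hist[v] += 1
    return hist


def gaussian_from_histogram(hist: Sequence[int]) -> Tuple[float, float]:
    """Return the (mean, variance) of the intensity distribution in a histogram."""
    n = sum(hist)
    if n == 0:
        raise ValueError("histogram is empty")
    mean = sum(level * count for level, count in enumerate(hist)) / n
    var = sum(count * (level - mean) ** 2 for level, count in enumerate(hist)) / n
    return mean, var


def color_model(pixels: Iterable[Tuple[int, int, int]]) -> List[Tuple[float, float]]:
    """Build one (mean, variance) pair for each of the R, G and B channels."""
    px = list(pixels)
    return [gaussian_from_histogram(channel_histogram(p[ch] for p in px))
            for ch in range(3)]
```

The same routine could serve for the indicator model by passing only the pixels inside the user-selected area.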
Once the modeling of the background of the screen has been completed, the model for the indicator or pointer is computed in order to separate the indicator from the background. This is done by asking the user to select a polygonal bounding area displayed on the screen for the indicator of choice. Only the pixels inside this polygonal area are used to compute the color model for the indicator. The computation is done in the same way the background model was produced. Usually the color model for the indicator will be dominated by a different color in color space than the background. Once a color model for the indicator has been determined, this model will not have to be recalculated unless a pointer with a significantly different color is employed.
Once both the screen background and indicator models are determined, the tip of the indicator can be located and its image coordinates can be mapped to screen coordinates. As indicated earlier, this first involves segmenting the indicator from the screen background in an image of the screen on which the user is pointing. To this end, a standard Bayes classifier (or the like) is used to segment the indicator from the screen background. A Bayes classifier generally operates by calculating, given a pixel color intensity, whether the pixel is more probably a foreground (indicator) or a background (screen) pixel. This classifier operates on the presumption that the screen background pixels are likely to have a mean pixel intensity that differs significantly from the mean pixel intensity of the indicator (such as the finger). If the extracted models of the foreground and background are split into separate RGB channels, the Bayes classifier determines the probability a given pixel color is a background pixel for each channel and these probabilities are multiplied together. The classifier also determines the probability a given pixel is a foreground pixel for each channel and multiplies the probabilities together. Next, the background pixel probability product is divided by the foreground pixel probability product. If this quotient is greater than one then the pixel is determined to be a background pixel, otherwise it is determined to be a foreground or indicator pixel.
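The per-pixel decision described above can be sketched as follows. This is an illustrative sketch only. It assumes equal prior probabilities for the two classes, models each channel with the (mean, variance) pair from the color models, and compares sums of log-likelihoods, which is equivalent to testing whether the quotient of the probability products exceeds one while avoiding numerical underflow. The variance floor `MIN_VAR` and the function names are assumptions introduced for the example.

```python
import math
from typing import Sequence, Tuple

Model = Sequence[Tuple[float, float]]  # per-channel (mean, variance)

MIN_VAR = 1.0  # assumed floor so that a zero-variance channel stays usable


def log_likelihood(pixel: Sequence[float], model: Model) -> float:
    """Sum of per-channel Gaussian log-densities (log of the probability product)."""
    total = 0.0
    for value, (mean, var) in zip(pixel, model):
        var = max(var, MIN_VAR)
        total += -0.5 * math.log(2.0 * math.pi * var) - (value - mean) ** 2 / (2.0 * var)
    return total


def is_background(pixel: Sequence[float], background: Model, foreground: Model) -> bool:
    """True if the background/foreground likelihood quotient is greater than one."""
    return log_likelihood(pixel, background) > log_likelihood(pixel, foreground)
```

Applying `is_background` to every pixel of a captured image yields a binary mask in which the remaining pixels are taken to belong to the indicator.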
The indicator tip location should be consistently determined. In the system and method according to the present invention, the tip point is defined as the intersection of the indicator's centerline and its boundary along the direction that the indicator is pointing. This definition has been simplified by allowing the indicator to point only in an upwards direction. The system and method according to the present invention robustly finds the centerline of the indicator and its intersection with the upper boundary of the indicator. To elaborate, a cumulative total of the number of pixels that belong to the foreground is calculated on a scan line by scan line basis starting at the top of the image containing the indicator. The number of foreground pixels in each scan line is next analyzed to determine the scan line where the foreground pixels first appear and increase in cumulative total thereafter (i.e., representing a step). The identified scan line roughly corresponds to where the indicator tip location may be found. Next, a number of lines above and below the identified line (e.g., ±15 lines) are selected and each is scanned to find the start and end of the foreground pixels (if any) in the horizontal direction. In addition, the center point of each series of foreground pixels along each of the scan lines is determined and a line is fit through these points. The pixel corresponding to the indicator tip location is then determined by scanning all pixels within the previously identified indicator window (e.g., ±15 lines) to find the boundary pixels. The pixel corresponding with the tip of the indicator is the boundary pixel where the previously determined centerline intersects the boundary of the indicator. Finally, a Kalman filter may be used to filter out noise in the determined finger tip location.
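A simplified sketch of the tip-finding steps follows. It is not a definitive implementation. It assumes the segmentation result is available as a binary mask (1 for indicator pixels, 0 for screen pixels) with rows ordered from the top of the image, stands in for the step detection by requiring foreground on a few consecutive scan lines (the `persist` parameter is an assumption), fits the centerline by least squares as x = a*y + b, and omits the optional Kalman filtering. The name `find_tip` is hypothetical.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # 1 = foreground (indicator), 0 = background (screen)


def find_tip(mask: Mask, window: int = 15, persist: int = 3) -> Optional[Tuple[int, int]]:
    """Return (x, y) of the indicator tip in image coordinates, or None."""
    height = len(mask)
    counts = [sum(row) for row in mask]  # foreground pixels per scan line
    # First scan line where foreground appears and keeps appearing below it.
    start = next((y for y in range(height - persist)
                  if all(counts[y + k] > 0 for k in range(persist + 1))), None)
    if start is None:
        return None
    lo, hi = max(0, start - window), min(height - 1, start + window)
    # Midpoint between the first and last foreground pixel of each line in the window.
    centers = []
    for y in range(lo, hi + 1):
        xs = [x for x, v in enumerate(mask[y]) if v]
        if xs:
            centers.append((y, (xs[0] + xs[-1]) / 2.0))
    # Least-squares fit of the centerline x = a*y + b through the midpoints.
    n = len(centers)
    mean_y = sum(y for y, _ in centers) / n
    mean_x = sum(x for _, x in centers) / n
    syy = sum((y - mean_y) ** 2 for y, _ in centers)
    a = sum((y - mean_y) * (x - mean_x) for y, x in centers) / syy if syy else 0.0
    b = mean_x - a * mean_y
    # Walk down the centerline; the first foreground pixel met lies on the upper boundary.
    for y in range(lo, hi + 1):
        x = int(round(a * y + b))
        if 0 <= x < len(mask[y]) and mask[y][x]:
            return (x, y)
    return None
```

The returned image coordinates would then be mapped to screen coordinates using the calibration results.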
Once the pixel of the image corresponding to the pointer tip (and so its image coordinates) has been determined, this location is mapped to the corresponding screen coordinates. This is done using the previously determined homography to identify the rough screen coordinates associated with the pointer tip image coordinates. The rough coordinates are then refined using the residual vector applicable to the identified screen coordinates. The resulting location is deemed the place where the user is pointing to the screen. The screen coordinates of the tip point are then used to control the position of the system indicator, which is sometimes referred to as a cursor.
The system and method according to the present invention has the advantages of being fast, accurate and reliable. Additionally, it allows a touch screen to be created relatively inexpensively, especially when compared to present day touch screen implementations.