1. Field of the Invention
The present invention relates to handwriting and gesture recognition techniques, and, more particularly, to handwriting and gesture recognition techniques for in-vehicle applications.
2. Description of the Related Art
Information entry in a motor vehicle by the vehicle operator, such as address entry to navigation devices, has always been a challenging and often dangerous task. Traditionally, vehicle operators issue commands or enter information by operating various controls in the car, such as physical buttons for radios, or displayed buttons on the touch screen of a navigation system. Such operations typically require that the operator divert his eyes from the road in order to locate the desired buttons, and sometimes the operator overstretches his arms in trying to reach the desired buttons. This is especially distracting, time-consuming, and dangerous when the buttons are small, such as on a touch screen, and the task is complicated, such as when entering an address into a navigation device. Moreover, a touch screen has to be installed within close proximity to (i.e., within the reach of) the driver, thus limiting the design options for in-vehicle dashboard layout.
Instead of using touch screens, several auto manufacturers use a combination of remote control and graphical display (for example, the BMW iDrive system and the Lexus Remote Touch Interface system), so that the graphical display can be placed farther away from the operator. Still, these systems require that the operator operate the remote controls and look at the visual feedback on the graphical display for information and command entry. Address entry on the iDrive system, for example, requires that the operator operate the remote control to select from a list of letters, states, and/or city names on the graphical display. This is still a lengthy and dangerous process, as the operator needs to take his eyes off the road for a significant period of time.
Gesture recognition and handwriting recognition have advantages over other input methods, such as a keyboard, mouse, speech recognition or a touch screen. A keyboard is a very open-ended input device and requires that the user have some basic typing proficiency. A keyboard and a mouse both contain moving parts, and thus prolonged use leads to decreased performance as the device wears out. The keyboard, mouse, and touch screen all require direct physical contact between the user and the input device, which may result in degradation of the system performance as these contact interfaces are exposed to the environment. They also require hand/finger and eye coordination, which is rather prohibitive during driving. Furthermore, any tactile interface which is exposed to the public may be abused or damaged by vandalism.
Tactile interfaces may also have hygiene problems: as the system becomes unsanitary or unattractive to users, the performance of the interfaces may decline. These effects may greatly diminish the usefulness of systems that accommodate a large number of users, such as advertising kiosks open to the general public. This cleanliness issue may be particularly important for the touch screen, where the input device and the display screen are part of the same device. Thus, as the input device becomes dirty, the effectiveness of both the input and the display is reduced. The performance of speech recognition may be very poor in potentially noisy environments, such as passenger compartments of vehicles, whether the windows are rolled down or not. Also, speech recognition may not be appropriate where silence is needed, such as in a military mission or in a library.
Gesture and handwriting recognition systems avoid the problems listed above. There are no moving parts in gesture recognition systems, so devices do not wear out. Cameras, including infrared cameras, that are used to detect features for gesture recognition can be easily built to withstand harsh environments, and can be made very small so that they can be used in a wider variety of locations. In a gesture recognition system there is no physical contact between the user and the device, and so there is no hygiene problem. The gesture recognition system does not require any sound to be made or detected, so background noise does not cause a problem. A gesture recognition system is capable of controlling a number of devices in response to interpreting a set of intuitive gestures and handwriting. The gestures recognized by the system may be selected as being those that seem natural to users, which decreases the required learning time period. The gesture recognition system can also provide users with symbol pictures of useful gestures that are similar to those normally used in American Sign Language books. Simple tests can then be used to decide what gestures are most intuitive for any given application.
For certain types of devices, the use of gesture inputs is more practical and intuitive. For example, when controlling a compact disc player within a vehicle, basic commands such as “next track”, “previous track”, “increase volume”, “decrease volume”, etc., may be most efficiently communicated in the form of gestures. Certain other environments also derive practical benefits from using gestures. For example, keyboards may be awkward to carry on some military missions, and may create noise on missions where silence is essential to success. In such scenarios, gestures may be the most effective and safest form of input.
Most gesture recognition systems are developed for indoor environments. Some impressive applications include controlling an avatar in a virtual environment application and serving as one of the modalities in home device control systems. In addition, some gesture recognition systems may have similar algorithms, but may require some special or intrusive devices. However, such special or intrusive devices may be particularly undesirable in a typical in-car environment wherein space is at a premium, and any visual clutter needs to be avoided.
The first step in a hand gesture recognition method is hand detection, which is performed by a hand detection module. The function of this module is to detect and locate the hand region in every frame of the video sequence. In order to detect the hand region, several different techniques can be applied. Such techniques can be classified into two main categories: motion-based and background subtraction-based. The motion-based technique assumes that the background moves slower than the hand region. This motion-based technique is quite efficient when the background is almost steady, but may need more computing resources when the background undergoes a greater rate of change. In order to cope with changing background scenarios, typical techniques such as running average, Gaussian mixture model, kernel density estimation, mean shift algorithm, etc., can be used with both hand detection techniques.
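As an illustration only, the running average technique named above can be sketched as follows. The sketch assumes grayscale frames stored as nested lists of intensities; the function name `update_running_average` and the blending weight `alpha` are hypothetical and do not come from the disclosure.

```python
from typing import List

Image = List[List[float]]  # grayscale frame: rows of pixel intensities


def update_running_average(background: Image, frame: Image, alpha: float = 0.05) -> Image:
    """Blend the newest frame into the background estimate.

    Each background pixel moves a fraction `alpha` of the way toward the
    corresponding pixel of the new frame. `alpha` is an assumed tuning
    parameter: small values adapt slowly, large values adapt quickly.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [(1.0 - alpha) * b + alpha * p for b, p in zip(bg_row, row)]
        for bg_row, row in zip(background, frame)
    ]


# One update step on a 1x2 image with alpha = 0.25.
background = [[100.0, 40.0]]
frame = [[200.0, 40.0]]
updated = update_running_average(background, frame, alpha=0.25)
```

With a small `alpha`, a briefly passing hand barely alters the background estimate, while gradual scene changes are absorbed over many frames.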
The background subtraction-based method is intuitive, and several background/foreground modeling techniques can also be applied to strengthen the reliability of the hand detection module. The basic technique for the background subtraction-based method is to examine the difference between the current frame and the background frame on a pixel-by-pixel basis. If the difference between a pixel's value in the current frame and the corresponding pixel value in the background frame is larger than a predefined threshold in any predefined color space such as grayscale, HSV, normalized RGB, YCbCr, etc., then the corresponding pixel is determined to be a possible candidate for the hand region. Instead of using the threshold, the background-subtracted frames can also be passed through any skin color model, such as a Gaussian mixture model or histogram-based techniques, in order to locate the hand region.
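The pixel-by-pixel comparison described above can be sketched as follows, under stated assumptions: frames are grayscale images held as nested lists of 0-255 intensities, and the function name `subtract_background` and the threshold value of 30 are hypothetical choices made for illustration, not values taken from the disclosure.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame: rows of 0-255 intensities


def subtract_background(frame: Frame, background: Frame, threshold: int = 30) -> List[List[bool]]:
    """Return a mask of candidate hand pixels.

    A pixel is marked True when its absolute difference from the
    corresponding background pixel exceeds `threshold` (an assumed value).
    """
    same_shape = len(frame) == len(background) and all(
        len(row) == len(bg_row) for row, bg_row in zip(frame, background)
    )
    if not same_shape:
        raise ValueError("frame and background must have the same shape")
    return [
        [abs(p - b) > threshold for p, b in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]


# A 2x3 example: the bright pixels differ strongly from the dark background.
background = [[10, 10, 10], [10, 10, 10]]
frame = [[12, 200, 10], [10, 180, 45]]
mask = subtract_background(frame, background)
```

In this example, the small difference at the first pixel (2) stays below the threshold and is treated as background, while the differences of 190, 170, and 35 are flagged as candidates for the hand region.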
The known background subtraction techniques typically deal with either a static background or a fast-moving background. However, the background in in-vehicle settings typically falls in between these two extreme cases. For example, the background of the video stream in an in-car environment usually moves slowly because of small vibrations within the car. Furthermore, the illumination level does not usually change drastically in a typical in-car environment.
What is neither disclosed nor suggested in the art is a driver input system that overcomes the problems and limitations described above. More particularly, what is neither disclosed nor suggested is a driver input system that provides improved performance in reading a user's spatial hand gestures. Such spatial hand gestures may include the “drawing” of alphanumeric characters with a user's finger on or near a surface within the vehicle.