The invention relates generally to inputting commands and/or data (collectively referred to herein as "data") to electronic systems including computer systems. More specifically, the invention relates to methods and apparatuses for inputting data when the form factor of the computing device precludes using normally sized input devices such as a keyboard, or when the distance between the computing device and the input device makes it inconvenient to use a conventional input device coupled by cable to the computing device.
Computer systems that receive and process input data are well known in the art. Typically such systems include a central processing unit (CPU), persistent read only memory (ROM), random access memory (RAM), at least one bus interconnecting the CPU and the memory, at least one input port to which a device is coupled to input data and commands, and typically an output port to which a monitor is coupled to display results. Traditional techniques for inputting data have included use of a keyboard, mouse, joystick, remote control device, electronic pen, touch panel or pad or display screen, switches and knobs, and more recently handwriting recognition and voice recognition.
Computer systems and computer-type systems have recently found their way into a new generation of electronic devices including interactive TV, set-top boxes, electronic cash registers, synthetic music generators, handheld portable devices including so-called personal digital assistants (PDA), and wireless telephones. Conventional input methods and devices are not always appropriate or convenient when used with such systems.
For example, some portable computer systems have shrunk to the point where the entire system can fit in a user's hand or pocket. To combat the difficulty in viewing a tiny display, it is possible to use a commercially available virtual display accessory that clips onto an eyeglass frame worn by the user of the system. The user looks into the accessory, which may be a 1″ VGA display, and sees what appears to be a large display measuring perhaps 15″ diagonally.
Studies have shown that use of a keyboard and/or mouse-like input device is perhaps the most efficient technique for entering or editing data in a companion computer or computer-like system. Unfortunately it has been more difficult to combat the problems associated with a smaller sized input device, as smaller sized input devices can substantially slow the rate at which data can be entered. For example, some PDA systems have a keyboard that measures about 3″×7″. Although data and commands may be entered into the PDA via the keyboard, the entry speed is reduced and the discomfort level is increased, relative to having used a full sized keyboard measuring perhaps 6″×12″. Other PDA systems simply eliminate the keyboard and provide a touch sensitive screen upon which the user writes alphanumeric characters with a stylus; handwriting recognition software within the PDA then attempts to interpret and recognize the characters so drawn. Some PDAs can display an image of a keyboard on a touch sensitive screen and permit users to enter data by touching the images of various keys with a stylus. In still other systems, the distance between the user and the computer system may preclude convenient use of wire-coupled input devices; for example, the distance between a user and a set-top box in a living room environment precludes use of a wire-coupled mouse to navigate.
Another method of data and command input to electronic devices is recognizing visual images of user actions and gestures that are then interpreted and converted to commands for an accompanying computer system. One such approach was described in U.S. Pat. No. 5,767,842 to Korth (1998), entitled "Method and Device for Optical Input of Commands or Data". Korth proposed having a computer system user type on an imaginary or virtual keyboard, for example a keyboard-sized piece of paper bearing a template or a printed outline of keyboard keys. The template is used to guide the user's fingers in typing on the virtual keyboard keys. A conventional two-dimensional TV video camera focused upon the virtual keyboard was stated to somehow permit recognition of what virtual key (e.g., printed outline of a key) was being touched by the user's fingers at what time as the user "typed" upon the virtual keyboard.
But Korth""s method is subject to inherent ambiguities arising from his reliance upon relative luminescence data, and indeed upon an adequate source of ambient lighting. While the video signal output by a conventional two-dimensional video camera is in a format that is appropriate for image recognition by a human eye, the signal output is not appropriate for computer recognition of viewed images. For example, in a Korth-type application, to track position of a user""s fingers, computer-executable software must determine contour of each finger using changes in luminosity of pixels in the video camera output signal. Such tracking and contour determination is a difficult task to accomplish when the background color or lighting cannot be accurately controlled, and indeed may resemble the user""s fingers. Further, each frame of video acquired by Korth, typically at least 100 pixelsxc3x97100 pixels, only has a grey scale or color scale code (typically referred to as RGB). Limited as he is to such RGB value data, a microprocessor or signal processor in a Korth system at best might detect the contour of the fingers against the background image, if ambient lighting conditions are optimal.
The attendant problems are substantial, as are the potential ambiguities in tracking the user's fingers. Ambiguities are inescapable with Korth's technique because traditional video cameras output two-dimensional image data and do not provide unambiguous information about the actual shape and distance of objects in a video scene. Indeed, from the vantage point of Korth's video camera, it would be very difficult to detect typing motions along the axis of the camera lens. Therefore, multiple cameras having different vantage points would be needed to adequately capture the complex keying motions. Also, as suggested by Korth's FIG. 1, it can be difficult merely to acquire an unobstructed view of each finger on a user's hands, e.g., acquiring an image of the right forefinger is precluded by the image-blocking presence of the right middle finger, and so forth. In short, even with good ambient lighting and a good vantage point for his camera, Korth's method still has many shortcomings, including ambiguity as to what row on a virtual keyboard a user's fingers are touching.
In an attempt to gain depth information, the Korth approach might be replicated using multiple two-dimensional video cameras, each aimed toward the subject of interest from a different viewing angle. Simple as this proposal sounds, it is not practical. The setup of the various cameras is cumbersome and potentially expensive as duplicate cameras are deployed. Each camera must be calibrated accurately relative to the object viewed, and relative to each other. To achieve adequate accuracy the stereo cameras would likely have to be placed at the top left and right positions relative to the keyboard. Yet even with this configuration, the cameras would be plagued by fingers obstructing fingers within the view of at least one of the cameras. Further, the computation required to create three-dimensional information from the two-dimensional video image information output by the various cameras adds to the processing overhead of the computer system used to process the image data. Understandably, using multiple cameras would substantially complicate Korth's signal processing requirements. Finally, it can be rather difficult to achieve the camera-to-object distance resolution required to detect and recognize fine object movements such as those of a user's fingers engaged in typing motion.
In short, it may not be realistic to use a Korth approach to examine two-dimensional luminosity-based video images of a user's hands engaged in typing and accurately determine from the images what finger touched what key (virtual or otherwise) at what time. This shortcoming remains even when the acquired two-dimensional video information processing is augmented with computerized image pattern recognition, as suggested by Korth. It is also seen that realistically Korth's technique does not lend itself to portability. For example, the image acquisition system, and indeed an ambient light source, will essentially be on at all times, and will consume sufficient operating power to preclude meaningful battery operation. Even if Korth could reduce his frame rate of data acquisition to save some power, the Korth system would still require a source of adequate ambient lighting.
Power considerations aside, Korth's two-dimensional imaging system does not lend itself to portability with small companion devices such as cell phones because Korth's video camera (or perhaps cameras) requires a vantage point above the keyboard. This requirement imposes constraints on the practical size of Korth's system, both while the system is operating and while it is being stored in transit.
What is needed is a method and system by which a user may input data to a companion computing system using a virtual keyboard or other virtual input device that is not electrically connected to the computing system. The data input interface emulation implemented by such method and system should provide meaningful three-dimensionally acquired information as to what user's finger touched what key (or other symbol) on the virtual input device, and in what time sequence, preferably without having to use multiple image-acquiring devices. Preferably such system should include signal processing such that system output can be in a scan-code or other format directly useable as input by the companion computing system. Finally, such system should be portable, and easy to set up and operate.
The present invention provides such a method and system.
The present invention enables a user to input commands and data (collectively referred to as data) from a passive virtual emulation of a manual input device to a companion computer system, which may be a PDA, a wireless telephone, or indeed any electronic system or appliance adapted to receive digital input signals. The invention includes a three-dimensional sensor imaging system that functions even without ambient light to capture, in real time, three-dimensional data as to placement of a user's fingers on a substrate bearing or displaying a template that is used to emulate an input device such as a keyboard, keypad, or digitized surface. The substrate preferably is passive and may be a foldable or rollable piece of paper or plastic containing printed images of keyboard keys, or simply indicia lines demarking where rows and columns of keyboard keys would be. The substrate may be defined as lying in a horizontal X-Z plane, where the Z-axis defines template key rows, the X-axis defines template key columns, and the Y-axis denotes vertical height above the substrate. If desired, in lieu of a substrate keyboard, the invention can include a projector that uses light to project a grid or perhaps an image of a keyboard onto the work surface in front of the companion device. The projected pattern would serve as a guide for the user in "typing" on this surface. The projection device preferably would be included in or attachable to the companion device.
Alternatively, the substrate can be eliminated as a typing guide. Instead the screen of the companion computer device may be used to display alphanumeric characters as they are "typed" by the user on a table top or other work surface in front of the companion device. For users who are not accomplished touch typists, the invention can instead (or in addition) provide a display image showing keyboard "keys" as they are "pressed" or "typed upon" by the user. "Keys" perceived to be directly below the user's fingers can be highlighted in the display in one color, whereas "keys" perceived to be actually activated can be highlighted in another color or contrast. This configuration would permit the user to type on the work surface in front of the companion device or perhaps on a virtual keyboard. Preferably, as the user types on the work surface or the virtual keyboard, the corresponding text appears in a text field displayed on the companion device.
Thus, various forms of feedback can be used to guide the user in his or her virtual typing. What fingers of the user's hands have "typed" upon what virtual key or virtual key position, and in what time order, is determined by the three-dimensional sensor system. Preferably the three-dimensional sensor system includes a signal processing unit comprising a central processor unit (CPU) and associated read only memory (ROM) and random access memory (RAM). Stored in ROM is a software routine executed by the signal processing unit CPU such that three-dimensional positional information is received and converted substantially in real time into key-scan data or other format data directly compatible as device input to the companion computer system. Preferably the three-dimensional sensor emits light of a specific wavelength, and detects return energy time-of-flight from various surface regions of the object being scanned, e.g., a user's hands.
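The time-of-flight principle reduces to a simple relation: the one-way distance to a reflecting surface is half the round-trip travel time multiplied by the speed of light. A minimal sketch, where the function names, units, and frame layout are illustrative assumptions rather than details from the source:

```python
# Hypothetical sketch: converting per-pixel round-trip times from a
# time-of-flight sensor into Z-axis distances. Names and units are
# illustrative assumptions, not taken from the patent text.
C_MM_PER_NS = 299.792458  # speed of light, in millimeters per nanosecond

def depth_mm(round_trip_ns: float) -> float:
    """One-way distance to the surface: the emitted pulse travels out
    and back, so the distance is half the round-trip path length."""
    return C_MM_PER_NS * round_trip_ns / 2.0

def depth_map(round_trip_frame):
    """Map a frame of per-pixel round-trip times (ns) to depths (mm)."""
    return [[depth_mm(t) for t in row] for row in round_trip_frame]
```

A 2.0 ns round trip, for instance, corresponds to roughly 300 mm of depth, which is the scale of distance at which a sensor in a companion device would view a user's hands.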
At the start of a typing session, the user will put his or her fingers near or on the work surface or virtual keyboard (if present). Until the user or some other object comes within imaging range of the three-dimensional sensor, the present invention remains in a standby, low power consuming, mode. In standby mode, the repetition rate of emitted optical pulses is slowed to perhaps 1 to 10 pulses per second to conserve operating power, an important consideration if the invention is battery powered. As such, the invention will emit relatively few pulses but can still acquire image data, albeit with crude or low Z-axis resolution. In alternate methods for three-dimensional capture, methods that reduce the acquisition frame rate and resolution to conserve power may be used. Nonetheless such low resolution information is sufficient to at least alert the present invention to the presence of an object within the imaging field of view. When an object does enter the imaging field of view, a CPU that governs operation of the present invention commands entry into a normal operating mode in which a high pulse rate is employed and system functions are operated at full power. To preserve operating power, when the user's fingers or other potentially relevant object is removed from the imaging field of view, the present invention will power down, returning to the standby power mode. Such powering down preferably also occurs when relevant objects are deemed to have remained at rest for a period exceeding a time threshold.
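The standby/active behavior described above amounts to a small state machine: low pulse rate until an object enters the field of view, full rate while it moves, and a return to standby on removal or prolonged rest. A sketch under assumed thresholds and rates (none of the specific numbers come from the source):

```python
# Illustrative state machine for the standby/active power logic.
# Pulse rates and the idle timeout are assumptions for the sketch.
STANDBY_PULSES_PER_SEC = 5      # crude, low-power sensing
ACTIVE_PULSES_PER_SEC = 3000    # full-resolution sensing
IDLE_TIMEOUT_FRAMES = 600       # frames at rest before powering down

class SensorPower:
    def __init__(self):
        self.mode = "standby"
        self.idle_frames = 0

    def update(self, object_in_view: bool, object_moving: bool) -> int:
        """Process one frame's observation; return the pulse rate to use."""
        if not object_in_view:
            self.mode = "standby"       # object removed: power down
        elif self.mode == "standby":
            self.mode = "active"        # object entered the field of view
            self.idle_frames = 0
        elif not object_moving:
            self.idle_frames += 1
            if self.idle_frames >= IDLE_TIMEOUT_FRAMES:
                self.mode = "standby"   # at rest too long: power down
        else:
            self.idle_frames = 0
        return (ACTIVE_PULSES_PER_SEC if self.mode == "active"
                else STANDBY_PULSES_PER_SEC)
```

The design choice worth noting is that even standby frames carry coarse image data, so the transition out of standby needs no extra sensor: the low-rate frames themselves reveal the arriving object.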
Assume now that the user has put his or her fingers on all of the home row keys (e.g., A, S, D, F, J, K, L, :) of the virtual keyboard (or, if no virtual keyboard is present, on a work space in front of the companion device with which the invention is practiced). The present invention, already in full power mode, will now preferably initiate a soft key calibration in which the software assigns locations to keyboard keys based upon user input: the user's fingers are placed on certain intended keys, and, based on the exact locations of those fingers, the software assigns locations to the keys of the keyboard.
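One simple way to realize such a soft calibration is to measure the offset between where the resting fingers are observed and where the home-row keys sit in a nominal layout, then shift the whole layout by the average offset. A hedged sketch; the nominal coordinates and 19 mm key pitch are assumptions, not figures from the source:

```python
# Hypothetical soft-calibration step: align the nominal key layout
# with the user's resting home-row fingers. Coordinates are (X, Z) in
# mm; the layout and pitch below are assumptions for illustration.
NOMINAL_HOME_ROW = {"A": (0, 0), "S": (19, 0), "D": (38, 0), "F": (57, 0)}

def calibrate(finger_positions):
    """finger_positions maps an intended key name to the observed (X, Z)
    of the finger resting on it. Returns the average (dx, dz) offset to
    apply to every key in the layout."""
    dxs, dzs = [], []
    for key, (x, z) in finger_positions.items():
        nx, nz = NOMINAL_HOME_ROW[key]
        dxs.append(x - nx)
        dzs.append(z - nz)
    return sum(dxs) / len(dxs), sum(dzs) / len(dzs)
```

Averaging over several fingers makes the estimate robust to one finger resting slightly off-center on its intended key.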
The three-dimensional sensor system views the user's fingers as the user "types" on the keys shown on the substrate template, or as the user types on a work space in front of the companion device where "keys" would normally be if a real keyboard were present. The sensor system outputs data to the companion computer system in a format functionally indistinguishable from data output by a conventional input device such as a keyboard, a mouse, etc. Software, preferably executable by the signal processing unit CPU (or by the CPU in the companion computer system), processes the incoming three-dimensional information and recognizes the location of the user's hands and fingers in three-dimensional space relative to the image of a keyboard on the substrate, or relative to the work surface if no virtual keyboard is present.
Preferably the software routine identifies the contours of the user's fingers in each frame by examining Z-axis discontinuities. When a finger "types" a key, or "types" in a region of a work surface where a key would be if a keyboard (real or virtual) were present, a physical interface between the user's finger and the virtual keyboard or work surface is detected. The software routine examines preferably optically acquired data to locate such an interface boundary in successive frames to compute the Y-axis velocity of the finger. (In other embodiments, lower frequency energy such as ultrasound might instead be used.) When such vertical finger motion stops or, depending upon the routine, when the finger makes contact with the substrate, the virtual key being pressed is identified from the (Z, X) coordinates of the finger in question. An appropriate KEYDOWN event command may then be issued. The present invention performs a similar analysis on all fingers (including thumbs) to precisely determine the order in which different keys are contacted (e.g., are "pressed"). In this fashion, the software issues appropriate KEYUP, KEYDOWN, and scan code data commands to the companion computer system.
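The keypress test described above can be sketched as two steps: watch a fingertip's Y (height) values across successive frames, and when the finger is moving downward and reaches the surface, resolve the key from its (Z, X) position. The layout, pitch, and contact threshold below are assumptions for illustration, not values from the source:

```python
# Illustrative sketch of the keypress decision: detect contact from
# per-frame fingertip positions, then map (Z, X) to a virtual key.
# Key pitch, contact height, and the simplified layout are assumptions.
KEY_PITCH_MM = 19.0          # assumed spacing of virtual keys
CONTACT_Y_MM = 2.0           # assumed height treated as surface contact
ROWS = ["ZXCVBNM", "ASDFGHJKL", "QWERTYUIOP"]  # row 0 nearest the user

def key_at(x_mm: float, z_mm: float):
    """Resolve a (X, Z) surface position to a virtual key, if any."""
    row = int(z_mm // KEY_PITCH_MM)
    col = int(x_mm // KEY_PITCH_MM)
    if 0 <= row < len(ROWS) and 0 <= col < len(ROWS[row]):
        return ROWS[row][col]
    return None

def detect_keydown(frames):
    """frames: successive (x, y, z) samples for one fingertip, in mm.
    Returns the key for which a KEYDOWN should issue when the finger
    reaches the surface while moving downward, else None."""
    for (x0, y0, z0), (x1, y1, z1) in zip(frames, frames[1:]):
        vy = y1 - y0                        # Y-axis velocity between frames
        if vy < 0 and y1 <= CONTACT_Y_MM:   # descending and at the surface
            return key_at(x1, z1)
    return None
```

Running the same detector over every fingertip, and ordering the resulting contacts by frame time, yields the key sequence from which KEYDOWN/KEYUP events and scan codes can be generated.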
The software routine preferably recognizes and corrects for errors due to drifting of the user's hands while typing, e.g., a displacement on the virtual keyboard. The software routine further provides some hysteresis to reduce error resulting from a user resting a finger on a virtual key without actually "pressing" the key. Measurement error is further reduced by observing that in a typing application, the frame rate requirement for tracking Z-values is lower than the frame rate requirement for tracking X-values and Y-values; that is, finger movement in the Z-direction is typically slower than finger movement along the other axes. The present invention also differentiates between impact times among different competing fingers on the keyboard or other work surface. Preferably such differentiation is accomplished by observing X-axis and Y-axis data values at a sufficiently high frame rate, as it is Y-dimension timing that is to be differentiated. Z-axis observations need not discriminate between different fingers, and hence the frame rate can be governed by the speed with which a single finger can move between different keys in the Z-dimension. Preferably the software routine provided by the invention averages acquired Z-axis data over several frames to reduce noise or jitter. While the effective frame rate for Z-values is decreased relative to the effective frame rate for X-values and Y-values, the accuracy of Z-values is enhanced and a meaningful frame rate of data acquisition is still obtained.
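The Z-axis smoothing described above is, in essence, a moving average: each new Z sample is averaged with the last few, trading effective Z frame rate for accuracy. A minimal sketch, with the window size being an assumption:

```python
# Minimal sketch of per-pixel Z smoothing: average each Z sample over
# the last N frames to suppress noise/jitter. The window size is an
# assumption; the source says only "several frames".
from collections import deque

class ZSmoother:
    def __init__(self, window: int = 4):
        self.frames = deque(maxlen=window)  # drops oldest automatically

    def add(self, z_value: float) -> float:
        """Add one Z sample and return the running average."""
        self.frames.append(z_value)
        return sum(self.frames) / len(self.frames)
```

Because X and Y samples bypass this filter, their full frame rate is preserved for the timing-critical task of ordering keystrokes, while the slower-changing Z coordinate gains accuracy.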
The software routine can permit the user to toggle the companion computer system from, say, alphanumeric data input mode to graphics mode simply by "typing" certain key combinations, perhaps simultaneously pressing the Control and Shift keys. In graphics mode, the template would emulate a digitizer tablet, and as the user dragged his or her finger across the template, the (Z, X) locus of points being contacted would be used to draw a line, a signature, or other graphic that is input into the companion computer system.
Preferably a display associated with the companion computer system can display alphanumeric or other data input by the user substantially in real time. In addition to depicting images of keyboard keys and fingers, the companion computer system display can provide a block cursor that shows the alphanumeric character that is about to be entered. An additional form of input feedback is achieved by forming a resilient region under some or all of the keys to provide tactile feedback when a "key" is touched by the user's fingers. If a suitable companion device were employed, the companion device could even be employed to enunciate aloud the names of "typed" keys, letter by letter, e.g., enunciating the letters "c"-"a"-"t" as the word "cat" was typed by a user. A simpler form of acoustic feedback is provided by having the companion device emit electronic key-click sounds upon detecting a user's finger depressing a virtual key.