Due to their inherent portability, mobile devices like smartphones and Apple Watch have converged to a relatively small form factor. This small size, however, presents a whole new class of design challenges. In particular, interacting with small mobile devices, for example entering text, involves many challenges.
Therefore, recent research work has been conducted on redesigning visual keyboards for text entry on mobile devices, such as wearable keyboards, modified on-screen keyboards, and projection keyboards.
Wearable keyboards are designed to allow a user to input text on mobile devices, for example, a ring put on each finger to detect the finger's movements based on an accelerometer or a gyroscope, a glove equipped with a pressure sensor for each finger, or two rubber pads slipped onto at least one hand typing to sense the movements in the user's palm. Moreover, modified on-screen keyboards adaptively change the sizes of keys on a screen, use the information about a user's hand posture to improve the usability of the text entry system, or utilize a touch sensor on the back side of a device to enable ten-finger touch typing. Further, to take advantage of the traditional QWERTY keyboard layout, various projection keyboards are proposed. These visual keyboards normally use a visible light projector or an infrared projector to cast a keyboard, and then use optical ranging or image recognition methods to identify keystrokes. In addition, UbiK (see J. Wang, K. Zhao, X. Zhang, and C. Peng, “Ubiquitous keyboard for small mobile devices: Harnessing multipath fading for fine-grained keystroke localization,” in Proc. of ACM MobiSys, 2014) uses a microphone on a mobile device to localize keystrokes.
However, there are still issues with these redesigned keyboards. For example, the wearable keyboards require additional equipment, the on-screen keyboards only support single-finger text entry and usually take up a large area on the screen, and the projection keyboards need an infrared or visible light projector to display a keyboard. UbiK requires users to click keys with their fingertips and nails to make audible sounds, while users normally use their finger pads, instead of fingernails, to type on a keyboard. Moreover, the generated sounds might be buried in noise.
In view of the above issues, example embodiments of the technology described herein provide input systems comprising a camera-based keyboard, which works with a front-facing camera on a mobile device, and methods for inputting text on mobile devices using a camera-based keyboard. FIG. 1 shows a non-limiting example embodiment of a camera-based keyboard for text-entry on a mobile device. In FIG. 1, the example embodiment uses only the camera of a mobile device 100 and a paper keyboard 110, both of which can be easily carried. In particular, the camera on the mobile device 100 takes pictures while a user is typing on the paper keyboard 110, and then keystrokes are detected and localized based on the captured pictures 120.
In certain example embodiments of the technology described herein, keystrokes are detected and localized with high accuracy, and the corresponding characters of pressed keys are outputted without any noticeable time latency. By using image processing technology, at least a portion of the keys of a keyboard are extracted, a fingertip can be tracked, and a keystroke can be detected and localized. Moreover, in some example embodiments, an initial training may be conducted to enhance the image processing results, and/or online calibration may be used to further reduce the false positives of keystrokes. Additionally, in some example embodiments, time-consuming modules are optimized for running on mobile devices.
In accordance with an example embodiment, an input system is provided to allow a user to interact with a mobile device, such as to input text into the mobile device, via a keyboard including a plurality of keys. The keyboard may simply be printed on a panel, such as a sheet of paper. The input system comprises a camera and a processor system including at least one processor. The processor system is configured to at least capture a plurality of images regarding the keyboard and at least one hand typing on the keyboard via the camera. Based on the plurality of captured images, the processor system is further configured to locate the keyboard, extract at least a portion of the keys on the keyboard, extract a hand typing, and detect a fingertip of the extracted hand. After that, a keystroke may be detected and localized through tracking the detected fingertip in at least one of the plurality of captured images, and then a character corresponding to the localized keystroke may be determined.
To illustrate an example implementation of detecting and localizing a keystroke based on image processing techniques, the observations of a keystroke will be described first. FIGS. 2(a)-2(e) show the example frames/images 210, 218, 230, 240, and 250 captured by a camera during two consecutive keystrokes. The coordinates of these images are shown in FIG. 2(a). The origin 211 of the coordinates is located in the top left corner of these images. In the embodiment, as shown in FIG. 2(a), the hand 212 located in the left area of the image is referred to as the left hand, while the other hand 213 is referred to as the right hand. In FIG. 2(a), the fingertip 214 pressing a key is referred to as a StrokeTip, and the key 215 pressed by a fingertip is referred to as a StrokeKey. From the left to the right, the fingers 219-228 in FIG. 2(b) are referred to as finger i, i ∈ [1, 10], respectively.
Both FIGS. 2(a) and 2(d) show that a StrokeTip (e.g., 214 or 241) is located on a StrokeKey (e.g., 215 or 242), in accordance with certain example embodiments. In particular, FIG. 2(a) illustrates that a StrokeTip (e.g., the StrokeTip 214) has the largest vertical coordinate among all of the fingers of the same hand. As shown in FIG. 2(a), the vertical distance dr between the StrokeTip 214 and the remaining fingertips of the right hand is larger than that of the left hand (dl). However, this feature may not work very well for thumbs, which would need to be identified separately. Moreover, because the distance between the camera and each fingertip varies, this feature may not hold in some cases. This feature is therefore only used to assist the localization of a keystroke, not to directly determine a keystroke.
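The vertical-coordinate observation above can be illustrated with a minimal Python sketch. It assumes fingertip positions are already available as (x, y) pairs in image coordinates with the origin at the top left corner, so that a larger y lies lower in the image. The function name `candidate_stroke_tip` and the choice to measure the gap against the nearest remaining fingertip are assumptions made for illustration, not details taken from the embodiments.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, origin at top left


def candidate_stroke_tip(fingertips: List[Point]) -> Optional[Tuple[int, float]]:
    """Return (index, gap) for the fingertip with the largest vertical coordinate.

    `gap` is the vertical distance between that fingertip and the nearest
    remaining fingertip of the same hand (an assumed way of measuring the
    distance described in the text). Returns None for an empty hand.
    """
    if not fingertips:
        return None
    ys = [p[1] for p in fingertips]
    best = max(range(len(ys)), key=lambda i: ys[i])
    others = [y for i, y in enumerate(ys) if i != best]
    gap = ys[best] - max(others) if others else 0.0
    return best, gap


# Hypothetical fingertip positions for one frame (thumbs left out, since
# the text notes they would need to be identified separately).
left_hand = [(10, 50), (20, 52), (30, 51), (40, 49)]
right_hand = [(60, 50), (70, 70), (80, 52), (90, 51)]

left = candidate_stroke_tip(left_hand)
right = candidate_stroke_tip(right_hand)
# The hand with the larger gap is the more likely source of the keystroke.
likely_hand = "right" if right[1] > left[1] else "left"
```

Consistent with the text, a result like this would only serve as a hint for localization; it does not by itself decide that a keystroke occurred.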
FIGS. 2(b) and 2(c) show that, before pressing a key, a user keeps moving one of his/her fingers towards a target key, i.e., key 229 in FIGS. 2(b)-2(d). When the user is pressing the target key 229, the corresponding StrokeTip 241 stays on that key for a period of time, as shown in FIG. 2(d). If the position of the fingertip remains the same for a predetermined period of time, a keystroke may have occurred. FIG. 2(d) shows that the StrokeTip 241 obstructs the StrokeKey 229 from the view of the camera. The ratio of the visually obstructed area to the whole area of a key may be used to verify whether the key is pressed.
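The two cues described above, a fingertip that stays in place and a key that is partly hidden from the camera, can be sketched as follows. This is a minimal illustration under stated assumptions: the fingertip track is a list of per-frame (x, y) positions, the hand is given as a binary mask (1 = hand pixel), and a key is an axis-aligned rectangle lying inside the mask. The names `is_dwelling` and `obstructed_ratio`, and the parameters `min_frames` and `max_move`, are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def is_dwelling(track: Sequence[Point], min_frames: int = 3,
                max_move: float = 2.0) -> bool:
    """True if the last `min_frames` positions stay within `max_move`
    pixels (per axis) of the first position in that window."""
    if len(track) < min_frames:
        return False
    recent = track[-min_frames:]
    x0, y0 = recent[0]
    return all(abs(x - x0) <= max_move and abs(y - y0) <= max_move
               for x, y in recent)


def obstructed_ratio(hand_mask: List[List[int]], key: Rect) -> float:
    """Fraction of the key rectangle covered by hand pixels."""
    left, top, right, bottom = key
    area = (right - left) * (bottom - top)
    if area <= 0:
        return 0.0
    covered = sum(hand_mask[y][x]
                  for y in range(top, bottom)
                  for x in range(left, right))
    return covered / area


# Hypothetical data: a fingertip that approaches a key and then rests on it.
track = [(10, 10), (14, 18), (20, 30), (20.5, 30), (20, 30.5)]
mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
resting = is_dwelling(track)
ratio = obstructed_ratio(mask, (0, 0, 2, 2))
```

How long a fingertip must rest and how much of a key must be covered are tuning choices that the text leaves open; the defaults here are placeholders.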
Viewed from a second aspect, the present invention provides a method of using a keyboard for allowing a user to interact with a mobile device, comprising: capturing a plurality of images in connection with the keyboard and at least one hand typing on the keyboard via a camera; locating the keyboard based on at least one of the plurality of captured images; extracting at least a portion of the keys on the keyboard based on at least one of the plurality of captured images; extracting a hand based on at least one of the plurality of captured images; detecting a fingertip of the extracted hand based on at least one of the plurality of captured images; detecting and localizing a keystroke through tracking the detected fingertip based on at least one of the plurality of captured images; and determining a character corresponding to the localized keystroke.
Viewed from a third aspect, the present invention provides a non-transitory computer-readable storage medium storing a text-entry program, the text-entry program being executable by a processor system including at least one processor, wherein the text-entry program allows the processor system to execute: capturing a plurality of images in connection with a keyboard and at least one hand typing on the keyboard via a camera; locating the keyboard based on at least one of the plurality of captured images; extracting at least a portion of the keys on the keyboard based on at least one of the plurality of captured images; extracting at least a hand based on at least one of the plurality of captured images; detecting a fingertip of the extracted hand based on at least one of the plurality of captured images; detecting and localizing a keystroke through tracking the detected fingertip based on at least one of the plurality of captured images; and determining a character corresponding to the localized keystroke.
Example issues addressed by the technique of the exemplary embodiment are described as follows:
(1) High Accuracy in Keystroke Localization:
The accuracy of keystroke localization may not be sufficiently high due to an inter-key distance of about two centimeters, or a positional difference between a real fingertip and a detected fingertip. To address this issue, in certain example embodiments, hand detection results are optimized by applying erosion and dilation operations. In an example embodiment, for each fingertip, a small hit box area is generated around the detected fingertip position to represent the corresponding fingertip, and for each key, a visually obstructed area of the key is calculated to verify whether the key has been pressed.
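As an illustration of erosion followed by dilation and of a fingertip hit box, the following Python sketch operates on a small binary hand mask. It is written in plain Python for clarity, assumes a 3x3 structuring element and treats pixels outside the image as background; the names `erode`, `dilate`, `open_mask`, and `hit_box` and the `half_size` parameter are hypothetical, and a practical implementation would more likely rely on an optimized image processing library.

```python
from typing import List, Tuple

Mask = List[List[int]]            # binary image, 1 = hand pixel
Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def _morph(mask: Mask, require_all: bool) -> Mask:
    """Apply a 3x3 erosion (require_all=True) or dilation (False).
    Pixels outside the image are treated as background (0)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                mask[ny][nx] if 0 <= ny < h and 0 <= nx < w else 0
                for ny in (y - 1, y, y + 1)
                for nx in (x - 1, x, x + 1)
            ]
            out[y][x] = int(all(window)) if require_all else int(any(window))
    return out


def erode(mask: Mask) -> Mask:
    return _morph(mask, True)


def dilate(mask: Mask) -> Mask:
    return _morph(mask, False)


def open_mask(mask: Mask) -> Mask:
    """Erosion then dilation: removes small specks, keeps larger regions."""
    return dilate(erode(mask))


def hit_box(tip: Tuple[int, int], half_size: int, width: int, height: int) -> Rect:
    """Small square around a detected fingertip, clipped to the image."""
    x, y = tip
    return (max(0, x - half_size), max(0, y - half_size),
            min(width, x + half_size + 1), min(height, y + half_size + 1))


# Hypothetical 7x7 mask: a 3x3 hand region plus one isolated noise pixel.
noisy = [[0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        noisy[r][c] = 1
noisy[0][6] = 1  # noise speck

cleaned = open_mask(noisy)
box = hit_box((3, 3), 1, 7, 7)
```

In this example the isolated pixel disappears while the 3x3 region survives unchanged, which is the kind of cleanup that could make the later fingertip and obstructed-area measurements less sensitive to segmentation noise.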
(2) Low False Positive in Keystroke Detection:
A false positive occurs when a non-keystroke (i.e., no key is pressed by any fingertip) is incorrectly treated as a keystroke. Therefore, in certain example embodiments, a keystroke detection process is combined with a keystroke localization process. More particularly, if the key pressed by a fingertip is invalid, the potential non-keystroke is removed. Moreover, in some example embodiments, online calibration technology may be introduced to further reduce false positives in keystroke detection.
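One way to picture the combination of detection and localization is the following Python sketch, in which a candidate keystroke is kept only if the fingertip can be mapped to a valid key. The key layout dictionary, the `dwelling` flag supplied by an earlier detection step, and the function names `locate_key` and `confirm_keystroke` are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def locate_key(tip: Tuple[float, float], keys: Dict[str, Rect]) -> Optional[str]:
    """Return the label of the key containing the fingertip, or None."""
    x, y = tip
    for label, (left, top, right, bottom) in keys.items():
        if left <= x < right and top <= y < bottom:
            return label
    return None


def confirm_keystroke(tip: Tuple[float, float], dwelling: bool,
                      keys: Dict[str, Rect]) -> Optional[str]:
    """Accept a candidate keystroke only if detection (dwelling) and
    localization (a valid key under the fingertip) both agree."""
    if not dwelling:
        return None
    return locate_key(tip, keys)


# Hypothetical two-key layout in image coordinates.
layout = {"A": (0, 0, 10, 10), "S": (10, 0, 20, 10)}

accepted = confirm_keystroke((12, 5), True, layout)    # fingertip on key "S"
rejected = confirm_keystroke((12, 25), True, layout)   # fingertip off the keyboard
```

A resting fingertip that falls outside every key is discarded here rather than reported, which mirrors the removal of potential non-keystrokes described above. The online calibration mentioned in the text is not modeled in this sketch.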
(3) Low Latency:
Ideally, when a user presses a key on a paper keyboard, the character of the key should be identified without any noticeable time latency. However, image processing is usually computationally heavy and hence time consuming, especially when run on small mobile devices. To address this challenge, in certain example embodiments, image sizes are reduced, the image processing pipeline is optimized, multiple threads are used, and/or the operations of writing into and reading from image files are eliminated.
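Two of the measures listed above, reducing image sizes and handing frames between threads in memory rather than through image files, can be sketched as follows. This is only a structural illustration in plain Python: frames are modeled as nested lists, downsampling is done by simple subsampling, and the names `downsample` and `process_frames` and the `step` parameter are hypothetical. Pure-Python threads in CPython do not run computations in parallel, so the example shows the in-memory handoff pattern rather than an actual speedup.

```python
import queue
import threading
from typing import List, Optional

Image = List[List[int]]


def downsample(image: Image, step: int = 2) -> Image:
    """Keep every `step`-th pixel in both directions to shrink the image."""
    return [row[::step] for row in image[::step]]


def process_frames(frames: List[Image], step: int = 2) -> List[Image]:
    """Pass frames from a producer to a worker thread through an
    in-memory queue; the worker shrinks each frame before further use."""
    pending: "queue.Queue[Optional[Image]]" = queue.Queue(maxsize=4)
    results: List[Image] = []

    def worker() -> None:
        while True:
            frame = pending.get()
            if frame is None:  # sentinel: no more frames
                break
            results.append(downsample(frame, step))

    thread = threading.Thread(target=worker)
    thread.start()
    for frame in frames:       # stands in for the camera capture loop
        pending.put(frame)
    pending.put(None)
    thread.join()
    return results


# Hypothetical 4x4 frame with pixel values 0..15.
frame = [[r * 4 + c for c in range(4)] for r in range(4)]
small_frames = process_frames([frame, frame])
```

With `step=2` each frame carries a quarter of the original pixels, so every later stage touches less data; the bounded queue keeps the capture loop from running far ahead of processing.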
Example embodiments of the technology described herein allow users to type on a paper keyboard with all of their fingers and provide a user experience similar to that of a traditional physical keyboard. In an example embodiment, a keyboard may also be printed or drawn on any other panel. These example embodiments can be used in a wide variety of scenarios, for example, anywhere there is sufficient space to put a mobile device and a paper keyboard, e.g., offices, coffee shops, outdoor environments, etc.
Other aspects, features, and advantages of this invention will become apparent from the following detailed description when taken in conjunction with the accompanying drawings, which are a part of this disclosure and which illustrate, by way of example, principles of this invention.