Camera-based tracking of objects for human interaction with computers, in particular tracking of the hands and fingers, has attracted scientific, industrial and commercial interest over several decades. Reviews of achievements in this computationally intensive field are given by Pavlovic et al., IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 677-695, 1997, and by Zhou et al., IEEE Int. Symposium on Mixed and Augmented Reality, pp. 193-202, 2008. In many reported techniques, the objects are observed from several different viewpoints by one or more cameras, to reduce susceptibility to occlusion and to achieve robust tracking and gesture interpretation.
For single-camera tracking of finger touch and of finger or hand gestures, features of these objects such as shadows, contours, texture, silhouette and image gradients, and even their mirror images reflected from a glossy display surface, are extracted and used to update various model-based tracking systems, which compute the posture of the finger or hand and detect, for example, finger touch in real time.
As an example of clever feature extraction, a published US patent application no. US2010/0066675A1 describes a single-camera imaging touch screen system whose feature extraction is based on the following observation: when a finger is illuminated by a light source placed to one side, the finger's shadow resembles a finger while the finger is not touching the screen, but the shadow narrows substantially, and is ultimately obscured by the finger itself, when the finger touches the surface, so that touch can be determined from the shadow. However, the application includes an independent claim that is anticipated by a scientific article published in 2005 by the inventor, Andrew D. Wilson (ACM Proc. UIST 2005, pp. 83-92).
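The shadow-width observation above can be illustrated with a minimal sketch. It assumes that the finger has already been located and that a one-dimensional intensity profile has been sampled across its shadow; the function names, the darkness threshold and the width ratio are hypothetical choices for illustration and are not taken from US2010/0066675A1.

```python
def shadow_width(profile, dark_threshold):
    """Return the length of the longest run of pixels darker than dark_threshold."""
    best = run = 0
    for value in profile:
        if value < dark_threshold:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def is_touching(profile, finger_width_px, dark_threshold=50, touch_ratio=0.3):
    """Classify touch from a shadow profile (hypothetical heuristic).

    Assumes a finger is known to be present over this profile: a hovering
    finger casts a finger-wide shadow, whereas a touching finger occludes
    most of it, so a shadow much narrower than the finger indicates touch.
    """
    width = shadow_width(profile, dark_threshold)
    return width < touch_ratio * finger_width_px
```

For example, a profile containing a dark run about as wide as the finger is classified as hovering, while a run of only a couple of pixels is classified as touching.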
Earlier published patent applications nos. WO9940562 (A1), US006100538A and US2010188370 (A1) in principle describe object tracking systems utilizing finger touch or pen input, wherein at least two camera viewpoints are disposed at a periphery of a coordinate plane to determine the coordinates of an object, for example a pointing finger, by triangulation.
A published international PCT patent application no. WO9940562 (A1) describes a system for detecting pen and finger touch in front of a computer monitor screen by using a single camera and a periscope-like optical system, consisting of one or several flat mirrors, which records two images of the screen looking sideways into the volume immediately in front of the screen, in order to determine the coordinates of the pen or finger and its distance to the screen.
A published US patent application no. US006100538A describes an optical digitizer for determining the position of a pointing object that projects light and is disposed on a coordinate plane, with a detector disposed on a periphery of the coordinate plane. Preferably, a pair of linear image sensors whose field of view covers the coordinate plane functions as the detector, and a collimator is disposed to limit the height of the detector's field of view. The detector is operable to receive only the component of the light projected from the pointing object substantially parallel to the coordinate plane, and a shield is disposed to block noise light, other than the projected light, from entering the limited field of view of the detector. A processor is provided for computing the coordinates representing the position of the pointing object.
A published US patent application no. US2010188370 (A1) describes a camera-based touch system including at least two cameras with overlapping fields of view, placed along the periphery of a touch surface, typically in its corners, to determine the position of a pointer by triangulation and to detect whether the pointer is touching or hovering above the touch surface.
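The triangulation step common to the corner-camera systems above can be sketched as follows. The sketch assumes two cameras at the ends of a known baseline, at (0, 0) and (baseline, 0), each reporting the angle between the baseline and its line of sight to the pointer; the function name and angle conventions are illustrative and not taken from any of the cited applications.

```python
import math


def triangulate(alpha, beta, baseline):
    """Intersect the sight lines of two cameras at (0, 0) and (baseline, 0).

    alpha: angle (radians) at the left camera between the baseline and the pointer.
    beta:  angle (radians) at the right camera between the baseline and the pointer.
    Returns the pointer position (x, y) in the same units as baseline.
    """
    # For a pointer between the two corner cameras, both angles lie in (0, pi/2).
    if not (0 < alpha < math.pi / 2 and 0 < beta < math.pi / 2):
        raise ValueError("angles must lie strictly between 0 and pi/2")
    ta, tb = math.tan(alpha), math.tan(beta)
    # From tan(alpha) = y / x and tan(beta) = y / (baseline - x):
    x = baseline * tb / (ta + tb)
    y = x * ta
    return x, y
```

In practice each angle would be derived from the pixel position at which a camera sees the pointer, via a per-camera calibration that this sketch does not model.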
A granted Chinese patent no. CN201331752 describes a multi-point touch system for a transparent display. The system illuminates a projection light plane from the rear side with infrared light and observes the illuminated fingertips with an infrared-sensitive CCD camera, also from the rear side.
A granted European patent no. EP1336172 describes an input device apparatus and a method for detecting and localizing interactions of user objects with a virtual input device. The apparatus is adapted to operate with hand-held devices and is based on detecting objects that penetrate a light plane.
In a published Korean patent application no. KR2010 0109420A (Dongseo Technology Headquarters), there is described a multi-touch-based interactive display system for interacting freely with content. The system provides an interactive area on a screen to facilitate interaction with content presented in that area. Moreover, the system employs an infrared (IR) light-emitting diode (LED) array bar to generate an interactive layer illuminated by IR radiation. An IR camera images IR radiation reflected from a human body or an object touching the interactive layer, and a server including computing hardware computes the coordinate values of the interaction position from the signals generated by the IR camera.
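As an illustration of the kind of coordinate computation such a server might perform, the following sketch locates each bright reflection in a thresholded IR frame and returns its intensity-weighted centroid in image coordinates, one per touch. This is a generic blob-detection technique assumed for illustration; KR2010 0109420A is not stated to use it, the frame format is hypothetical, and mapping image coordinates to screen coordinates would require a separate calibration.

```python
from collections import deque


def touch_centroids(frame, threshold):
    """Return the (row, col) centroid of each bright 4-connected blob.

    frame: 2-D list of IR pixel intensities (hypothetical input format).
    threshold: pixels with intensity >= threshold are treated as reflections.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    rows, cols = len(frame), len(frame[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or frame[r][c] < threshold:
                continue
            # Breadth-first flood fill gathers one connected bright blob.
            queue = deque([(r, c)])
            seen[r][c] = True
            total = sum_r = sum_c = 0.0
            while queue:
                y, x = queue.popleft()
                weight = frame[y][x]
                total += weight
                sum_r += weight * y
                sum_c += weight * x
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and frame[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Intensity-weighted centroid of the blob, in image coordinates.
            centroids.append((sum_r / total, sum_c / total))
    return centroids
```

Separating blobs by connectivity is what allows several simultaneous touches to be reported independently.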
In a published international PCT patent application no. WO 02/054169A1, there is described a data input device and an associated method. The device includes an illuminator operative to illuminate at least one engagement plane by directing light along the at least one engagement plane. The device further includes a two-dimensional imaging sensor that views the at least one engagement plane from a location outside it and senses light from the illuminator scattered by engagement of a data entry object, for example a user's finger, with the at least one engagement plane. Furthermore, the device includes a data entry processor that receives an output from the two-dimensional imaging sensor and provides a data input to utilization circuitry.
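Because the imaging sensor views the engagement plane from a location outside it, an image point of the scattered light can be related to a position on the plane by a planar homography (a standard pinhole-camera result, assuming lens distortion is negligible). The sketch below estimates such a homography from four calibration correspondences and applies it. This calibration approach and all function names are assumptions for illustration, not details disclosed in WO 02/054169A1.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_homography(image_pts, plane_pts):
    """Estimate the 3x3 homography H (H[2][2] = 1) from 4 image/plane point pairs."""
    if len(image_pts) != 4 or len(plane_pts) != 4:
        raise ValueError("exactly four correspondences are required")
    a, b = [], []
    for (u, v), (x, y) in zip(image_pts, plane_pts):
        # x = (h0 u + h1 v + h2) / (h6 u + h7 v + 1), rearranged to be linear in h.
        a.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        b.append(x)
        # y = (h3 u + h4 v + h5) / (h6 u + h7 v + 1), likewise.
        a.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        b.append(y)
    h = solve(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def image_to_plane(H, u, v):
    """Map an image point (u, v) to engagement-plane coordinates using H."""
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    x = (H[0][0] * u + H[0][1] * v + H[0][2]) / w
    y = (H[1][0] * u + H[1][1] * v + H[1][2]) / w
    return x, y
```

The four calibration points would typically be obtained by touching known marks on the plane; once H is fitted, every detected scattered-light spot can be mapped to plane coordinates with a single call to image_to_plane.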
In a published international PCT patent application no. WO 2004/072843A1, there is described a touch screen which uses light sources at one or more edges of the screen to direct light across the surface of the screen. Two cameras having electronic outputs are located at the periphery of the screen to receive light from the light sources. A data processor receives the outputs of the two cameras and is operable to execute one or more software products for performing triangulation computations to determine one or more locations of one or more objects in proximity to the screen. Detecting the presence of an object includes detecting, at the two cameras, the presence or absence of direct light due to the object, using the surface of the screen as a mirror; the cameras are also employed to detect the presence or absence of light reflected at the surface due to the object. Optionally, the light sources are modulated to provide radiation within the sensitive bandwidth of the two cameras.
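The role of the mirroring screen surface in such a system can be illustrated with a minimal sketch. It assumes that each camera looks nearly parallel to the surface, so that the object's tip and its reflection appear in the same image column on either side of the surface line, and that their row positions have already been found. The function name, the row-based representation and the tolerance are hypothetical choices, not details taken from the application described above.

```python
def mirror_gap_state(object_tip_row, mirror_tip_row, touch_tolerance_px=2):
    """Classify touch vs hover from an object tip and its reflection in the screen.

    With the screen acting as a mirror, the real tip and its reflected tip lie
    symmetrically about the surface line, so half their separation approximates
    the tip's height above the surface, in pixels at that viewing distance.
    """
    gap = abs(mirror_tip_row - object_tip_row)
    height_px = gap / 2.0
    state = "touch" if gap <= touch_tolerance_px else "hover"
    return state, height_px
```

For example, a tip at row 100 with its reflection at row 101 is reported as a touch with an estimated height of 0.5 pixels, while rows 90 and 110 give a hover at about 10 pixels.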
In general, it is important that a user's intentions and commands are correctly recognized in man-machine interaction systems. The accuracy required of object position detection in the X and Y coordinates of the coordinate plane employed depends on the application. Finger touch systems are therefore attractive where modest accuracy is sufficient, for example for moving or selecting graphical objects or accessing menus, whereas a stylus or pen is preferred where the highest accuracy is required, for example in applications concerned with fine writing or drawing, or with handling details and objects in CAD programs. Consequently, in a finger-based system, feature extraction and robust heuristics applied to a two-dimensional image from a single camera may be sufficient to determine the finger's coordinates.
However, for all types of applications, high precision in detecting finger or pen touch is of utmost importance: touch detection must never fail, because the user may otherwise lose control over the application. A high and constant quality of touch detection is therefore required at every position in the coordinate plane that is utilized. Furthermore, the detection method should not be susceptible to variations in finger size, skin color, ambient light conditions, display light and so forth, and detection should be fast and uniform over the coordinate plane, without any user-dependent behavior or delay penalty.
There is great contemporary interest in interaction systems using pen, touch or both (dual-mode systems) for education, collaboration and meetings. Several new interaction platforms also allow simple pen or finger gesture control, and/or even hand-gesture-based interaction. Specifically, there is great global interest in interactive tablets and whiteboards for use in education, both in normal classrooms and in large lecture halls. Such whiteboards are also entering contemporary meeting rooms, video conferencing rooms and collaboration rooms. Images on an interactive whiteboard's coordinate plane may be generated as a projected image from a short-throw or long-throw data projector, or by a flat screen, implemented for example as an LCD device, a plasma display, an OLED device or a rear-projection system. It is important that the input device for touch and/or pen can be used with all types of display technology without reducing picture quality or wearing out associated equipment. It is furthermore important that the input device technology can be easily adapted to different screens, projectors and display units at low cost and with little effort.
New interactive whiteboards are commonly equipped with short-throw projectors, for example projectors with an ultra-wide-angle lens placed a short distance above the associated screen. With such an arrangement, the user is less dazzled by light shining into his/her eyes and tends to cast fewer shadows onto the screen, and the projector can be mounted directly on a wall together with the whiteboard. An ideal pen and touch input device for such short-throw systems should therefore be integrated into, or attached alongside, the wall-mounted projector, or attached to the projector wall mount, to make installation simple and robust.
In lecture halls, very long interactive whiteboards and interaction spaces are required, and these interaction surfaces should beneficially provide touch, pen and gesture control. On large-format screens, pointing sticks and laser pointers are often required to direct the audience's attention. The preferred input technology should satisfy all such diverse requirements; namely, it should also accept pointing sticks and laser pointers as user input tools, and be tolerant of and adaptable to different display formats.
Moreover, flat screen technologies may need touch and/or pen operation, simple pen and/or touch gesture interaction, and ultimately hand gesture control. Touch-sensitive films laid on top of a flat screen cannot detect hovering or in-the-air gestures. Pure electromagnetic pick-up systems behind a flat screen cannot detect finger touch or finger gestures; only pen operation is possible. However, some types of flat display technology, in particular OLED displays, can be transparent, so camera-based technologies can be used for gesture control through the screen. If dual-mode input systems including hovering and gestures continue to become increasingly important and standardized as a means of providing an efficient and natural user interface, optically based input systems will also be preferred for flat interactive screens over capacitive or resistive films or electromagnetic solutions. Therefore, the preferred input device technology should be optically based and should be adaptable both to conventional flat screens (LCD, plasma, LED) and to transparent flat screens such as OLED and rear-projection screens.
Input devices should not be susceptible to light sources such as daylight, room illumination, light from the projector or display screen and so forth. Nor should they be susceptible to near-infrared radiation from sunlight, artificial light, or remote control units and similar devices that use near-infrared light-emitting diodes for communication. Moreover, the input devices should exhibit a high coordinate update rate and low latency to provide the best user experience.
Input devices should preferably be adaptable to fit into existing infrastructure, for example to upgrade an installed pen-based interactive whiteboard to also allow finger touch and hand gesture control, or to make a meeting or education room already equipped with a projector or flat screen interactive by a simple installation of the input device itself.
In some scenarios, the input technology may even be usable without interactive feedback onto the writing surface itself, for example by precisely capturing strokes of chalk and sponge on a traditional blackboard and recognizing hand gestures for control of a computer; by capturing normal use of pen and paper (including cross-outs) and simple gestures for control of the computer; or by capturing the information a user enters when filling in a paper form or questionnaire, including his/her signature, while the result is stored in a computer and the input, or some interpretation of it, is shown on the computer's normal screen or on a connected display or projector for the reference of the user and the audience. The input device should therefore be usable stand-alone, separate from costly display technology, in cases where such infrastructure is not available or needed.
In the same way that interactive whiteboards are replacing traditional chalk and blackboards in educational establishments, novel interaction spaces are emerging in other arenas. Multi-user interactive vertical and horizontal surfaces are being introduced in collaboration rooms and control rooms, museums and exhibitions. Moreover, interactive spaces including interactive guest tables are being established in contemporary commercial premises such as bars, casinos, cafés and shops, to allow guests to select from a menu, order and pay, as well as to receive entertainment, for example by playing computer games, browsing the internet or reading news reports.
A contemporary problem, however, is that input devices for monitoring touching and/or hovering movements in an interaction space are not sufficiently accurate or sufficiently developed to address the needs of many information input and display systems. The present invention is devised to address, at least partially, these contemporary problems.