This invention relates to the field of machine vision, human-computer interfaces and computer peripherals.
Interaction between a human and a computer has taken many forms over time, including early punch cards and leading all the way to modern graphical user interfaces. The GUI, or Graphical User Interface, has become a standard, and the associated pointer device, which allows a user to trigger computer actions through selections in the GUI, has become a standard along with it. This pointer device takes the form of a computer mouse, light pen, track ball, touch screen, digitizer pad, and other technologies known in the art.
Human-to-human communication can take advantage of a GUI. A person who creates a media presentation through the computer's display can better communicate a thought or topic with another person or group of persons. Evidence of this is clear in a medium such as the World Wide Web, which essentially allows one party to prepare media content for communication with another party. Similarly, computer-assisted education has become a favorite of corporations and universities that need to distribute a lot of information to a large group in a short time.
Another common use of technology for communication is the combination of a human and computer screen in a presentation being delivered to one or more people. Software programs commercially available from corporations and individuals other than the creators of the present invention are dedicated to this express purpose and provide varying degrees of animation, sound and diverse media that can be used by a human presenter to augment or enhance the communicated presentation that would otherwise be provided by the human alone. This is essentially the automation of a task previously performed through “transparencies” or “foils” presented using an “overhead” projector as known in the art. Common uses of these tools today include sales presentations, education and training presentations, and status briefings; in short, virtually any situation in which the presenter deems it is needed. In fact, this practice has led to wide availability of complementary hardware technologies, such as computer projectors which replace or augment the computer's normal display with an enlarged “screen” that can be viewed by a wider audience.
When a human presenter is addressing an audience while showing a computer-generated media sequence, the human should have some control over the computer, such as what to show and when. In one sense, the timing of the presentation could be worked out in advance and essentially “choreographed” so that human and computer are always presenting related information. This proves difficult because of the expense of creating such presentations (many rehearsals needed, memorization time for the human) and the inflexibility of the result. Similarly, a second human presenter could monitor the first presenter and essentially “drive” the computer. This is expensive and infeasible in a number of scenarios where it is simply not possible to provide two presenters, such as a standard classroom or sales presentation. The most common solution is for the human presenter to divert his or her attention from the presentation momentarily and direct it at governing a change in the sequence of events presented by the computer as needed to complement what the human is expressing.
Commercial software programs allow for this interaction to take place through the computer's standard pointer device or computer keyboard. A number of manufacturers have created devices that allow the pointer to function wirelessly so that the presenter is free to move around as is typical in presentations. These devices include gyroscopic motion detectors, wireless track-balls, and others. No single solution has become widespread in its adoption.
A parallel problem faced by a human presenter as a series of media is displayed to a group is drawing attention to specific portions of the media as each portion is discussed. In a scenario where rich content is displayed, including graphics, text, animation, etc. on a single screen and while the presenter speaks and answers questions, it is often necessary to highlight specific aspects of the media presentation in order to provide guidance to the audience. Features on a graph, specific words or text messages, etc. that are present on the media display can be emphasized or highlighted in order to add to the experience, enhancing attention and therefore retention of the message. Unlike the problem of controlling the computer system, this parallel problem of audience guidance has a nearly ubiquitous solution in the form of the LASER pointer. This inexpensive device replaces the “pointer stick” that was used previously and essentially allows the presenter to easily place a point of distinctively shaped, brilliant or colored light overlaid with the media presentation at will. Simple gestures can then turn into an underline effect, a circle around a target feature, etc. This process is very simple and highly effective, as the speaker/presenter merely has to aim a hand-operated, pen-sized LASER of a kind that is inexpensive and commonly commercially available.
The problem of controlling the computer presentation in tandem with providing audience guidance has been addressed in the art. To this extent it has been identified as a distinct problem whose solution would be useful. It has been evident to a number of inventors that some combination of the LASER pointer and the computer control should result in a solution.
Bronson in U.S. Pat. No. 5,138,304 from 1992 discloses a “PROJECTED IMAGE LIGHT PEN.” Bronson's invention uses a camera to observe a projected computer screen, looking for differences between the camera input and the computer's video memory “frame buffer.” These differences are then analyzed to determine where the laser pointer is currently aimed, and from that information control signals to the computer are generated. Bronson's first disadvantage is that the method does not allow for detection of motion of the laser pointer over time, such as would be required to give the presenter complex control signals. Examples of such control signals include shapes such as letters for input or glyphs representing arrow keys. More specifically, Bronson's invention is only capable of detecting point targets rather than other geometrical forms variable in time or space.
Bronson's invention also fails to provide control signals generated by laser pointer interaction with off-screen items such as fixed position “buttons” that a user might choose to display permanently. Examples include a periodic chart that is on the wall of a classroom or standard Forward/Stop/Backward “VCR” controls for animations. If these items are not on the computer screen (because they are physically drawn on the projection surface or in the surrounding environment) then they cannot be detected by a comparison against video memory, as Bronson suggests, because they are not stored in video memory as part of the video presentation. Other composite video applications that change the computer's display output without affecting the video frame buffer would cause technical difficulties that Bronson does not overcome. These include such applications as on-screen display, which is commonly used to provide screen and sound adjustments, as well as video playback that is the result of a hardware-decoded stream of digital video data placed directly into display output without changing video memory, as is common in the art.
Furthermore, Bronson does not disclose the use of a networked peripheral camera such as a VMEBus camera, which was known in the art at the time of disclosure. This omission detracts significantly from the commercial viability of Bronson's solution, as it makes the required hardware more complex, less portable, and not modular for adaptation to ever more modern sensor devices. For this reason, Bronson's system is unusable in laptop computers that lack frame-grabbing hardware expansion interfaces.
Bronson's largest disadvantage is the digital signal processing intensity that is required to implement the solution in the embodiment claimed. Because the algorithm relies on instantaneous differences between the video frame buffer and the input from a camera, every camera frame must be converted into pixel coordinates. This is a per-pixel multi-floating-point computation which requires several steps and grows in complexity as the quality desired increases. In order to provide adequate response time, Bronson's algorithm is likely to use most of the CPU cycles for the system on which it is run even taking into account advances for the foreseeable future.
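The cost argument above can be made concrete with a minimal sketch of a frame-buffer comparison of this general kind. This is an illustration under stated assumptions, not Bronson's disclosed implementation: the camera-to-screen mapping is assumed to be a 3x3 projective transform, images are assumed to be grayscale, and the names `camera_to_screen`, `find_pointer` and `threshold` are hypothetical.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]
Image = List[List[float]]  # grayscale rows, values in 0.0-1.0


def camera_to_screen(h: Matrix, cx: float, cy: float) -> Tuple[float, float]:
    """Map one camera pixel to screen coordinates (assumed 3x3 projective transform)."""
    w = h[2][0] * cx + h[2][1] * cy + h[2][2]
    sx = (h[0][0] * cx + h[0][1] * cy + h[0][2]) / w
    sy = (h[1][0] * cx + h[1][1] * cy + h[1][2]) / w
    return sx, sy


def find_pointer(camera: Image, frame_buffer: Image, h: Matrix,
                 threshold: float = 0.5) -> Optional[Tuple[int, int]]:
    """Return the screen pixel where the camera most exceeds the frame buffer.

    Returns None when no difference is larger than the (hypothetical) threshold.
    """
    best, best_diff = None, threshold
    for cy, row in enumerate(camera):
        for cx, value in enumerate(row):
            # Every camera pixel pays for the coordinate conversion.
            sx, sy = camera_to_screen(h, cx, cy)
            x, y = int(round(sx)), int(round(sy))
            if 0 <= y < len(frame_buffer) and 0 <= x < len(frame_buffer[0]):
                diff = value - frame_buffer[y][x]
                if diff > best_diff:
                    best, best_diff = (x, y), diff
    return best


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    screen = [[0.1] * 4 for _ in range(3)]
    seen = [r[:] for r in screen]
    seen[1][2] = 0.9  # simulated laser spot
    print(find_pointer(seen, screen, identity))
```

In this sketch each camera pixel costs six multiplications, six additions and two divisions before any comparison is made. At an assumed camera resolution of 640x480 and 30 frames per second, the inner loop would run roughly 9.2 million times per second, which gives a sense of the per-frame load the passage describes.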
Wissen in “ERCIM News” No. 46, July 2001, discloses “Implementation of a Laser Based Interaction Technique for Projection Screens.” Wissen requires a specialized computer system and three video cameras. Furthermore, the display implemented is a specialized type and the described system is not adaptable to standardized PC products. For these reasons it is not a feasible solution for mass market applications that require a single camera on a PC hardware platform.
Platzker in U.S. Pat. No. 5,528,263 from 1996 discloses an invention “INTERACTIVE PROJECTED VIDEO IMAGE DISPLAY SYSTEM.” Platzker references light based systems such as Bronson but does not disclose an invention using them. Rather, Platzker implements a system in which the presenter's body or body parts act as cues to the computer system by overlapping with portions of a projected display. This requires that the screen be within reach of the presenter and is disadvantageous in that it is distracting to the audience for the presenter to turn away from them in order to see the presented image and move to occlude portions of it. Platzker does not resolve the problem of shadows that the presenter creates without intent to obstruct but that would trigger control signals spuriously. Platzker's greatest disadvantage is that the invention requires predetermined screen regions containing control characteristics. This does not allow for gesture recognition, for free-form input to the computer's pointer device interface so that any software package can act on the control signals, or for different control signals based on the same input but at different times. Aside from these faults, Platzker, like Bronson, requires dedicated video expansion hardware to capture camera input rather than working from a networked camera.
IC Technologies manufactures a commercial product, LightMouse, comprising a camera that monitors the movement of a laser pointer and provides click controls. A distinct disadvantage of this product is the requirement that the camera “see” the entire projected display in order to properly calibrate and function. This is a debilitating weakness in an environment where the projection screen being monitored is slightly smaller than the projected image, as occurs frequently when configuring portable displays in the field. Furthermore, the IC Technologies solution is unable to provide control signal inputs for activities outside the field of view of the camera, a weakness shared with others of the solutions discussed. Even once it is working, the tool is only able to provide mouse-like inputs to the computer system and is otherwise un-programmable and limited in its use. LightMouse algorithms are not sufficiently robust and it will “manufacture” a laser pointer even if one is not in the field of view. Its algorithms also do not account for the realistic speed of a human driving a pointer, as detected mouse coordinates may jump hundreds of pixels back and forth many times a second. LightMouse also requires the user to tweak the camera parameters to highly specific contrast, brightness, and saturation settings. This is a distinct disadvantage for a device that will be used by a broad user group with varying degrees of technical expertise.
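The coordinate-jumping complaint can be illustrated with a minimal sketch of a plausibility check on successive detections. It is not taken from LightMouse or from any reference cited here; the function name `reject_implausible_jumps` and the per-frame limit `max_step` are hypothetical, and the limit value is an arbitrary assumption.

```python
from math import hypot
from typing import Iterable, List, Optional, Tuple

Point = Tuple[float, float]


def reject_implausible_jumps(points: Iterable[Optional[Point]],
                             max_step: float = 80.0) -> List[Optional[Point]]:
    """Replace detections that moved farther than max_step pixels with None.

    None in the input means no detection in that frame; it also clears the
    reference position so the pointer may reappear anywhere afterwards.
    """
    accepted: List[Optional[Point]] = []
    last: Optional[Point] = None
    for p in points:
        if p is None:
            last = None
            accepted.append(None)
        elif last is not None and hypot(p[0] - last[0], p[1] - last[1]) > max_step:
            accepted.append(None)  # treated as spurious; reference is kept
        else:
            last = p
            accepted.append(p)
    return accepted


if __name__ == "__main__":
    track = [(100, 100), (104, 101), (620, 40), (108, 103)]
    print(reject_implausible_jumps(track))
```

A check this simple has an obvious limitation: if the first accepted detection is itself spurious, later genuine detections far from it are rejected until a frame with no detection resets the reference.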
Richard R. Echert and Jason A. Moore discuss “AN INTERACTIVE, REMOTE-CONTROLLED COMPUTER PROJECTION SYSTEM FOR USE IN A LARGE CLASSROOM ENVIRONMENT” www.cs.binghamton.edu/-reckert/icLedoc.html, and “The Classroom of the 21st Century: The Interactive Learning Wall,” SIGCHI Bulletin, Volume 32, Number 2, April 2000. This disclosure targets remote learning as embodied in presenters in one location addressing audiences in a separate physical location. To this extent, the disclosure requires a fixed geometry environment including a rear-projection system with a camera mounted as close to the projection source as possible in order to function. These are limitations that render the solution impractical for traveling or portable applications, or for all but the largest presentation forums. These limitations are due to intolerance of operator shadows overlapping the projected image, a common occurrence in practice. A restriction shared with other elements of the art is the presence of a perpetually visible window on the screen which is necessary for this device to function. This window overlays the presentation material that is being delivered by a user of this system, distracting the audience. Most detracting in this disclosure is the use of a single statistic, the relative brightness of camera input pixels, to detect a laser input. This limitation makes many assumptions about the environment, including the contents of the presenter's material, which cannot contain points as bright as the laser. This is particularly debilitating when a CCD camera is in use, which by nature of the sensor has low saturation and therefore high sensitivity. In addition, the system is not likely to work in bright lighting conditions that would decrease the contrast acquired by the camera. An effect as simple as a decrease in the battery charge of the presenter's hand held pointer would incapacitate the system disclosed.
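The weakness of relying on brightness alone can be shown with a minimal sketch of a single-statistic detector. It is an illustration only and is not the code of the cited disclosure; the function name `brightest_pixel`, the 8-bit grayscale frames and the threshold of 240 are all assumptions.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # 8-bit grayscale rows, values 0-255


def brightest_pixel(frame: Image, threshold: int = 240) -> Optional[Tuple[int, int]]:
    """Return (x, y) of the brightest pixel at or above threshold, else None."""
    best, best_value = None, threshold - 1
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if value > best_value:
                best, best_value = (x, y), value
    return best


if __name__ == "__main__":
    dark_slide = [[20] * 5 for _ in range(3)]
    dark_slide[1][3] = 250           # laser spot on dark content
    print(brightest_pixel(dark_slide))

    white_slide = [r[:] for r in dark_slide]
    white_slide[0][0] = 255          # presentation content brighter than the laser
    print(brightest_pixel(white_slide))

    weak_battery = [[20] * 5 for _ in range(3)]
    weak_battery[1][3] = 200         # dimmed laser falls below the threshold
    print(brightest_pixel(weak_battery))
```

On the dark slide the sketch finds the laser at (3, 1). With brighter presentation content it reports (0, 0), which is the content rather than the laser, and with the dimmed spot it reports nothing at all. These are the two failure modes described above.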
Drumm in U.S. Pat. No. 4,808,980 discloses an invention “ELECTRONIC LIGHT POINTER FOR A PROJECTION MONITOR.” Drumm's solution scales up the standard raster-scan detection process of a desktop light-pen. This requires significant hardware complexity, such as high speed electronics that are not commonly available in the consumer market space. This disadvantage is in addition to the fact that the presenter using Drumm's solution has to position a physical device at the right location in a projector's output light stream. The latter requires that the presenter have a pole or a ladder proportional in size to the projection screen so that the light-pen sensor can be positioned. Clearly the configuration is neither convenient nor portable and represents a cumbersome distraction to the audience.
Schlossberg in U.S. Pat. No. 4,280,135 discloses an invention “REMOTE POINTING SYSTEM” which orients and triggers a remote laser based on a local one seen through a camera. Schlossberg solves a problem similar to that of Echert and Moore, relating to a presenter who is in a separate physical location from the audience. Schlossberg, however, monitors the movement of a laser pointer with a camera device and transmits the position and activity information to a remote system where it can be reproduced. The remote system contains a second laser and a mechanical orientation transducer which are provided with the presentation activity information to re-create the pointer presence that is observed by the camera. This invention does not provide control information to the presentation computer system, and in fact does not assume that a computer is even in use for the presentation. It has no applicability to local presentation scenarios, as its usefulness stems from the ability to send information remotely. It is an enhancement to the presenter's person-to-person capability without addressing the parallel, heretofore unsolved problem of presenter control over the computer media presentation.