This invention is in the field of interactive display systems. Embodiments of this invention are more specifically directed to such display systems, and methods of operating the same, in which the user interacts with displayed content using a remote hand-held device.
The ability of a speaker to communicate a message to an audience is generally enhanced by the use of visual information, in combination with the spoken word. In the modern era, the use of computers and associated display systems to generate and display visual information to audiences has become commonplace, for example by way of applications such as the POWERPOINT presentation software program available from Microsoft Corporation. For large audiences, such as in an auditorium environment, the display system is generally a projection system (either front or rear projection). For smaller audiences such as in a conference room or classroom environment, flat-panel (e.g., liquid crystal) displays have become popular, especially as the cost of these displays has fallen over recent years. New display technologies, such as small projectors (“pico-projectors”), which do not require a special screen and thus are even more readily deployed, are now reaching the market. For presentations to very small audiences (e.g., one or two people), the graphics display of a laptop computer may suffice to present the visual information. In any case, the combination of increasing computer power and better and larger displays, all at less cost, has increased the use of computer-based presentation systems, in a wide array of contexts (e.g., business, educational, legal, entertainment).
A typical computer-based presentation involves the speaker standing remotely from the display system, so as not to block the audience's view of the visual information. Often, the speaker will use a pointer, such as a laser pointer or even a simple wooden or metal pointer, to non-interactively point to the visual information on the display. In this type of presentation, however, the speaker is essentially limited to the visual information contained within the presentation as generated, typically as displayed in a sequential manner (i.e., from one slide to the next, in a given order).
However, because the visual presentation is computer-generated and computer-controlled, an interactive presentation can be carried out. Such an interactive presentation involves selection of visual content of particular importance to a specific audience, annotation or illustration of the visual information by the speaker during the presentation, and invocation of effects such as zooming, selecting links to information elsewhere in the presentation (or online), moving display elements from one display location to another, and the like. This interactivity greatly enhances the presentation, making it more interesting and engaging to the audience.
In conventional display systems used before an audience, however, the speaker must generally be seated at the computer itself in order to interactively control the displayed presentation content by operating the computer. This limitation can detract from the presentation, especially in the large audience context.
The ability of a speaker to interact, from a distance, with displayed visual content is therefore desirable. More specifically, it would be desirable to provide a hand-held device that a remotely-positioned operator could use to point to, and interact with, the displayed visual information. Of course, in order for such a device to function interactively, the computer-based display system must discern the location on the display at which the device is pointing, in order to comprehend the operator command.
As known in the art, conventional “light pens” provide hand-held interaction with a display at a distance. In these devices, the pointed-to position on a cathode-ray-tube display (CRT) is detected by sensing the time at which the pointed-to pixel location on the display is refreshed by the CRT electron gun. This sensed time is correlated with the raster-scanning sequence of the CRT display, to determine the screen location at which the light pen sensor is aimed. Of course, this light pen sensing technology is limited to CRT displays, because of its dependence on sensing the raster-scan timing.
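By way of illustration only, the correlation of sensed time with the raster-scanning sequence may be sketched as follows. This sketch assumes an idealized raster without blanking intervals, and assumes that the sensed time is expressed in pixel-clock ticks counted from the start of the frame; the function name and parameters are hypothetical and are not drawn from any particular light pen product.

```python
def light_pen_position(ticks_since_frame_start, ticks_per_line):
    """Map the time at which the pen sensed the refresh to a (row, column).

    Assumption: the beam scans left to right, top to bottom, one pixel per
    clock tick, with no horizontal or vertical blanking (a simplification).
    """
    if ticks_since_frame_start < 0 or ticks_per_line <= 0:
        raise ValueError("timing values must be non-negative and non-zero")
    # Whole lines already scanned give the row; the remainder gives the column.
    row, column = divmod(ticks_since_frame_start, ticks_per_line)
    return row, column


# Example: the sensor fires 100 full lines plus 320 ticks into the frame.
position = light_pen_position(100 * 800 + 320, ticks_per_line=800)
```

A practical raster would require offsets for the blanking intervals, which this sketch omits.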
U.S. Pat. No. 5,933,135 describes another conventional type of hand-held pointing device, which is useful in connection with modern liquid-crystal displays (LCDs) and other similar flat-panel display technologies (e.g., plasma displays, light-emitting diode (LED) displays, etc.). This pen device includes an imager, such as a camera, that captures an image of a portion of the display screen, including a visible cursor. The location, size, and orientation of the detected cursor are forwarded to a host processor, which compares the detected cursor with the displayed image, and deduces the orientation and position of the pen device relative to the display. According to this and similar approaches described in U.S. Pat. No. 7,513,700 and in U.S. Pat. No. 7,161,596, the pointing device captures some or all of the displayed image, not necessarily including a cursor element. Comparison between the captured image and the image being displayed enables deduction of the position of the center of the camera image relative to the displayed image.
These conventional pointing devices necessarily constrain the displayed image, however, requiring either a displayed cursor element or perceptible image content varying within the displayed image in order to determine the pointed-to location of the positioning device. In some situations, the presence of a cursor may be distracting or superfluous to the viewing audience. In addition, these devices would be precluded from use in a “white board” context, for example if the user wished to hand-write or draw an image on an otherwise-blank display field. In addition, this approach involves a large amount of data processing and bandwidth to communicate and compare the captured image with the displayed image.
U.S. Pat. No. 7,420,540 describes a pointing device useful for a user that is relatively distant from the display, and that does not necessarily rely on the content of the displayed image for positioning. When actuated, this pointing device captures an image from a distance, including one or more corners of the display, and perhaps the center of the display, and forwards that captured image to a controller. The controller then determines the positioning and relative movement of the pointing device from the location and orientation of the display corners in the images captured by the pointing device. However, this approach is limited to those situations in which the user is sufficiently distant from the display that the field of view of the pointing device includes one or more display corners, and would appear not to operate well in smaller environments or in those situations in which the speaker is near a large display.
According to another approach, the displayed image includes positioning information that is perceptible to a pointing device, but not directly perceptible to a human viewer. U.S. Pat. No. 7,553,229 describes a gaming display system in which crosshair navigation frames are interleaved among the sequence of displayed image frames. These crosshair navigation frames are described as including a relatively dark image pattern that, because of the relatively small number of navigation frames interleaved within the frame image sequence at a relatively high frame rate, would not be directly perceptible to the human viewer. In this system, the gaming device (i.e., pointing device) captures a video sequence including two or more of the navigation frames; the system then identifies the navigation frames by pattern recognition, and cross-correlates the identified frames to determine a “cross-hair” position of the display at which the pointing device is aimed. Rapid positioning of the device in this system requires a high duty cycle of navigation frames within the displayed sequence, which reduces the brightness and contrast of the displayed gaming images to the human user; conversely, less human-perceptible impact can be attained only at a cost of longer positioning detection times and thus reduced game performance.
Other positioning approaches involving human-imperceptible image information for location of a pointing device are described in U.S. Pat. No. 7,421,111. According to one described approach, a sequence of positioning pattern frames is encoded by pixel or location, displayed, and sensed over time; the pointed-to location is then deduced from the sensed sequential code. Human perceptibility of these positioning patterns is reduced by use of infrared light for the patterns, or by using an extremely high frame rate for display of visible-light positioning patterns within the overall displayed images (requiring either a high-speed projection technology, or a separate projector). In another approach described in this patent, multiple sensors are provided within the pointing device to enlarge the sensed area of the display, and capture a complete code within a single frame; this approach appears to limit the distance between the positioning device and the displayed image, in order for proper decoding and positioning.
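By way of illustration only, deduction of a location from a sequentially sensed code may be sketched as follows. This sketch assumes that each display column is assigned a Gray code and that one bit of that code is displayed per pattern frame. The choice of a Gray code, the one-dimensional layout, and all function names are assumptions made for this example, and are not asserted to be the encoding of the referenced patent.

```python
def gray_encode(n):
    """Convert a binary index to its reflected Gray code."""
    return n ^ (n >> 1)


def gray_decode(g):
    """Invert gray_encode by folding successively shifted copies together."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def pattern_frames(width, bits):
    """Build one pattern frame per code bit, most significant bit first.

    frames[k][x] is the bit (0 = dark, 1 = bright) shown at column x
    during pattern frame k.
    """
    return [[(gray_encode(x) >> (bits - 1 - k)) & 1 for x in range(width)]
            for k in range(bits)]


def decode_position(sensed_bits):
    """Recover the column index from the bits sensed over successive frames."""
    code = 0
    for bit in sensed_bits:
        code = (code << 1) | bit
    return gray_decode(code)


# Example: a sensor aimed at column 11 of a 16-column display sees one bit
# per frame over four frames, and decodes those bits back to the column.
frames = pattern_frames(width=16, bits=4)
sensed = [frame[11] for frame in frames]
column = decode_position(sensed)
```

A Gray code is chosen in this sketch because adjacent columns differ in only one bit, so a sensor straddling a column boundary is off by at most one column.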
By way of further background, an “augmented reality system” is described in Park et al., “Undistorted Projection onto Dynamic Surface”, Advances in Image and Video Technology (2006), pp. 582-90. In this described system, images are projected onto actual real-world objects and surfaces to enhance the viewer's perception of his or her environment, with geometric correction applied to the displayed image to account for variations in the surface of the object from flat. This geometric correction is described as based on a comparison of camera-captured image data with the image data to be displayed. In this article, the projection system overlays the displayed image with pattern image variations in brightness or color in successive frames. The pattern images in successive frames (i.e., the variations) sum to zero, and are thus imperceptible to the human viewer. The camera-captured images in successive frames are subtracted from one another for purposes of geometric correction, canceling the displayed image data but recovering the pattern image. Comparison of the recovered pattern image as viewed by the camera with the data to be displayed enables geometric correction of the displayed images.
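By way of illustration only, the cancellation described above may be sketched as follows. This sketch assumes that the pattern is added to the image in one frame and subtracted from it in the next, that pixel values are integers that do not clip, and that the camera captures each frame without noise or geometric distortion. These assumptions and the function names are for illustration only, and are not drawn from the cited article.

```python
def overlay(image, pattern, sign):
    """Return the image with the pattern added (sign=+1) or subtracted (sign=-1)."""
    return [[pixel + sign * delta for pixel, delta in zip(image_row, pattern_row)]
            for image_row, pattern_row in zip(image, pattern)]


def perceived_average(frame_a, frame_b):
    """Approximate what a viewer integrates over two successive frames."""
    return [[(a + b) // 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


def recover_pattern(frame_a, frame_b):
    """Subtract successive frames; the image cancels and the pattern remains."""
    return [[(a - b) // 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


# Example with a 2x3 image and a small brightness pattern.
image = [[100, 120, 140], [160, 180, 200]]
pattern = [[4, 0, -4], [0, 4, 0]]
frame_a = overlay(image, pattern, +1)
frame_b = overlay(image, pattern, -1)
viewer_sees = perceived_average(frame_a, frame_b)    # equals image
camera_finds = recover_pattern(frame_a, frame_b)     # equals pattern
```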
By way of further background, interactive projector systems of the digital micromirror device type are known. In these systems, a separate high-speed modulated light source, or alternatively a color wheel segment, is used to project positioning information at the display that is invisible to human viewers but detectable by a camera. These systems are of course limited to digital micromirror-based modulator projection display systems, and typically require additional projection subsystems (including an additional modulator) to operate with reasonable performance.