Two-way video systems are available that include a display and a camera at each of two locations, connected by a communication channel that carries video images and audio between the two sites. Originally, such systems required each site to set up a video monitor to display the remote scene and a separate video camera, located on or near the edge of the video monitor, to capture the local scene, along with microphones to capture audio and speakers to present it, thereby providing a two-way video and audio telecommunication system between two locations.
Referring to FIG. 1, a typical prior art two-way telecommunication system is shown wherein a first viewer 71 views a first display 73. A first image capture device 75, which can be a digital camera, captures an image of the first viewer 71. If the image is a still digital image, it can be stored in a first still image memory 77 for retrieval. A still image retrieved from first still image memory 77 or video images captured directly from the first image capture device 75 will then be converted from digital signals to analog signals using a first D/A converter 79. A first modulator/demodulator 81 then transmits the analog signals using a first communication channel 83 to a second display 87 where a second viewer 85 can view the captured image(s).
Similarly, second image capture device 89, which can be a digital camera, captures an image of second viewer 85. The captured image data is sent to a second D/A converter 93 to be converted to analog signals but can be first stored in a second still image memory 91 for retrieval. The analog signals of the captured image(s) are sent to a second modulator/demodulator 95 and transmitted through a second communication channel 97 to the first display 73 for viewing by first viewer 71.
Although such systems have been produced and used for teleconferencing and other two-way communications applications, there are some significant practical drawbacks that have limited their effectiveness and widespread acceptance. Expanding the usability and quality of such systems has been the focus of much recent research, with a number of proposed solutions directed to more closely mimicking real-life interaction and thereby creating a form of interactive virtual reality. A number of these improvements have focused on communication bandwidth, user interface control, and the intelligence of the image capture and display components of such a system. Other improvements seek to integrate the capture device and display to improve the virtual reality environment.
There have been a number of solutions proposed for addressing the problem of poor eye contact that is characteristic of many existing systems. With conventional systems that follow the pattern of FIG. 1, poor eye contact results from locating the video camera on a different optical axis than the video monitor. This causes the eyes of an observed participant to appear averted, which is undesirable for a video communication system. Traditional solutions for addressing this problem, employing a display, camera, beam splitter, and screen, are described in a number of patents, including U.S. Pat. No. 4,928,301 entitled “Teleconferencing Terminal With Camera Behind Display Screen” to Smoot; U.S. Pat. No. 5,639,151 entitled “Pass-Through Reflective Projection Display” and U.S. Pat. No. 5,777,665 entitled “Image Blocking Teleconferencing Eye Contact Terminal” to McNelley, et al.; and U.S. Pat. No. 5,194,955 entitled “Video Telephone” to Yoneta et al., for example. Alternatively, commonly assigned U.S. Patent Application Publication No. 2005/0024489 entitled, “Image Capture And Display Device” by Fredlund et al. describes a display device for capturing and displaying images along a common optical axis. The device includes a display panel, having a front side and a back side, capable of being placed in a display state and a transmissive state. An image capture device is provided for capturing an image through the display panel when it is in the transmissive state. An image supply source provides an image to the display panel when it is in the display state. A mechanism is also provided for alternately placing the display panel in the display state and the transmissive state, allowing a first image to be viewed and a second image to be captured of the scene in front of the display at high rates such that alternating between the display state and the transmissive state is substantially imperceptible to a user.
Commonly assigned U.S. Pat. No. 7,042,486 entitled, “Image Capture And Display Device” to Manico et al. describes an image capture and display device that includes an electronic motion image camera for capturing the image of a subject located in front of the image display device and a digital projector for projecting the captured image. An optical element provides a common optical axis for the electronic camera and the digital projector. A light valve projection screen, electronically switchable between a transparent state and a frosted state, is located with respect to the common optical axis so that the electronic camera can capture the image of the subject through the projection screen when it is in the transparent state and the captured image can be displayed on the projection screen when it is in the frosted state. A controller, connected to the electronic camera, the digital projector, and the light valve projection screen, alternately places the projection screen in the transparent state, allowing the electronic camera to capture an image, and in the frosted state, allowing the digital projector to display the captured image on the projection screen. This system relies on switching the entire display device rapidly between a transparent and a frosted state. However, with many types of conventional imaging components, this can induce image flicker and result in reduced display brightness. Furthermore, the single camera used cannot adjust capture conditions such as field of view or zoom in response to changes in the scene.
Although such solutions using partially silvered mirrors and beam splitters have been implemented, their utility is constrained for a number of reasons. Solutions without a common optical axis produce an averted gaze of the participant that detracts from the conversational experience. Partially silvered mirrors and beam splitters are bulky, particularly in the depth direction. Projection display screens that alternate between transparent or semi-transparent states can be difficult to construct, and with rapid alternation between states, ambient contrast can suffer and flicker can be perceptible. As a number of these solutions show, this general approach can result in a relatively bulky apparatus that has a limited field of view and is, therefore, difficult for the viewer to use comfortably.
As an alternative approach, closer integration of image display and sensing components has been proposed. For example, U.S. Patent Application Publication No. 2005/0128332, entitled “Display Apparatus With Camera And Communication Apparatus” by Tsuboi describes a portable display with a built-in array of imaging pixels for obtaining an almost full-face image of a person viewing a display. The apparatus described in the Tsuboi '8332 disclosure includes a display element in which display pixels are arranged, along with a number of aperture areas that do not contain display pixels. In its compound imaging arrangement, multiple sensors disposed behind the display panel obtain a plurality of images of the scene through a plurality of clustered lenses that are disposed over aperture areas formed among the display pixels. Each sensor then converts the sensed light photo-electrically to obtain a plurality of tiny images of portions of the scene that are then pieced together to obtain a composite image of the scene. To do this, the display apparatus must include an image-combining section that combines image information from the plurality of images obtained by using the camera.
As another variation of this type of compound imaging approach, U.S. Patent Application Publication No. 2006/0007222, entitled “Integrated Sensing Display” by Uy discloses a display that includes display elements integrated with image sensing elements distributed along the display surface. Each sensing pixel may have an associated microlens. As with the solution proposed in the Tsuboi '8332 disclosure, compound imaging would presumably then be used to form an image from the individual pixels of light that are obtained. As a result, similar to the device in the Tsuboi '8332 disclosure, the integrated sensing device described in the Uy '7222 application can both output images (e.g., as a display) and input light from multiple sources that can then be pieced together to form image data, thereby forming a low-resolution camera device.
Similarly, U.S. Patent Application Publication No. 2004/0140973, entitled “System And Method Of A Video Capture Monitor Concurrently Displaying And Capturing Video Images” by Zanaty describes an apparatus and method for compound imaging in a video capture monitor that uses a four-part pixel structure having both emissive and sensing components. Three individual emissive pixel elements display the various Red, Green, and Blue (RGB) color components of an image for display of information on the video-capture monitor. Additionally, as part of the same pixel architecture, a fourth pixel element, a sensing element, captures a portion of an image as part of a photo-electronic array on the video capture monitor. Although this application describes pixel combinations for providing both image capture and display, the difficulty in obtaining image quality with this type of solution is significant and is not addressed in the Zanaty '0973 disclosure. As an example of just one problem with this arrangement, the image capture pixels in the array are not provided with optics capable of responding to changes in the scene such as movement.
The compound imaging type of solution, such as proposed in the examples of the Tsuboi '8332, Uy '7222, and Zanaty '0973 disclosures, is highly constrained for imaging and generally falls short of what is needed for image quality for the captured image. Field of view and overall imaging performance (particularly resolution) are considerably compromised in these approaches. The optical and computational task of piecing together a continuous image from numerous tiny images, each of which may exhibit considerable distortion, is daunting, requiring highly complex and costly control circuitry. In addition, imaging techniques using an array of imaging devices pointed in essentially the same direction tend to produce a series of images that are very similar in content so that it is not possible to significantly improve the overall image quality over that of one of the tiny images. Fabrication challenges, for forming multi-function pixels or intermingling image capture devices with display elements, are also considerable, indicating a likelihood of low yields, reduced resolution, reduced component lifetimes, and high manufacturing costs.
A number of other attempts to provide suitable optics for two-way display and image capture communication have employed pinhole camera components. For example, U.S. Pat. No. 6,888,562 entitled, “Integral Eye-Path Alignment On Telephony And Computer Video Devices Using A Pinhole Image Sensing Device” to Rambo et al., describes a two-way visual communication device and methods for operating such a device. The device includes a visual display device and one or more pinhole imaging devices positioned within the active display area of the visual display. An image processor can be used to analyze the displayed image and to select the output signal from one of the pinhole imaging devices. The image processor can also modify the displayed image in order to optimize the degree of eye contact as perceived by the far-end party.
In a similar type of pinhole camera imaging arrangement, U.S. Pat. No. 6,454,414 entitled “Device For Image Output And Input” to Ting describes an input/output device including a semi-transparent display and an image capture device. To be semi-transparent, the display device includes a plurality of transparent holes. As yet another example, U.S. Pat. No. 7,034,866 entitled “Image-Sensing Display Device With Particular Lens And Sensor Arrangement” to Colmenarez et al. describes an in-plane array of display elements alternating with pin-hole apertures for providing light to a camera.
The pinhole camera type of solution, as exemplified in the Rambo et al. '562, Ting '414, and Colmenarez et al. '866 disclosures, suffers from other deficiencies. Images captured through a pinhole exhibit low brightness levels and high noise levels due to low light transmission through the pinhole. Undesirable “screen door” imaging anomalies can also occur with these approaches. Display performance and brightness are also degraded due to the pinhole areas producing dark spots on the display. Overall, pinhole camera solutions inherently compromise both display image quality and capture image quality.
As just indicated, a structure of integrated capture pixels intermingled in a display may cause artifacts for either the image capture system or the image display performance. To some extent, the capture pixel structures can be thought of as defective pixels, which might be corrected or compensated for by appropriate methods or structure. As an example, European Patent Application EP1536399, entitled “Method and device for visual masking of defects in matrix displays by using characteristics of the human vision system” to Kimpe, describes a method for reducing the visual impact of defects present in a matrix display using a plurality of display elements and by providing a representation of a human vision system. The Kimpe EP1536399 disclosure describes characterizing at least one defect present in the display, deriving drive signals for at least some of the plurality of non-defective display elements in accordance with the representation of the human vision system so as to reduce an expected response of the human vision system to the defect, and then driving at least some of the plurality of non-defective display elements with the derived drive signals. However, a display having an occasional isolated defective pixel is a different entity than a display having a deliberate sub-structure of intermingled capture apertures or pixels. Thus the corrective measures to enhance display image quality can be significantly different.
One difficulty with a number of conventional solutions relates to an inability to compensate for observer motion and changes in the field of view. Among approaches to this problem have been relatively complex systems for generating composite simulated images, such as that described in U.S. Patent Application Publication No. 2004/0196360 entitled “Method And Apparatus Maintaining Eye Contact In Video Delivery Systems Using View Morphing” by Hillis et al. Another approach to this problem is proposed in U.S. Pat. No. 6,771,303 entitled “Video-Teleconferencing System With Eye-Gaze Correction” to Zhang et al., which performs image synthesis using head tracking and multiple cameras for each teleconference participant. However, such approaches side-step the imaging problem for integrated display and image capture devices by attempting to substitute synthesized image content for true real-time imaging, and thus do not provide the real-life interaction needed for more effective video-conferencing and communication.
The proliferation of solutions proposed for improved teleconferencing and other two-way video communication shows how complex the problem is and indicates that significant problems remain. Thus, it is apparent that there is a need for a combined image capture and display apparatus that would allow natural two-way communication, provide good viewer eye contact, adapt to different fields of view and changes in scene content, provide good quality capture images with reduced artifacts, and provide a sufficiently bright display without noticeable defects in the displayed image.