The concept of the video-telephone has long been anticipated, including in the 1914 novel “Tom Swift and His Photo Telephone”. The first working videophone system was exhibited by Bell Labs at the 1964 New York World's Fair. AT&T subsequently commercialized this system in various forms under the Picturephone brand name. However, the Picturephone had very limited commercial success. Technical issues, including low resolution, lack of color imaging, and poor audio-to-video synchronization, degraded its performance and limited its appeal. Additionally, the Picturephone imaged a very restricted field of view, essentially a portrait-format image of a participant. This can be better understood from U.S. Pat. No. 3,495,908, by W. Rea, which describes a means for aligning a user within the limited capture field of view of the Picturephone camera. Thus, the images were captured with little or no background information, resulting in a loss of context. Moreover, the Picturephone's only accommodation for the user's privacy was the option of turning off the video transmission.
In the modern world, two-way video communications are enabled by various technologies. As a first example, cellular phones, including phone-cameras, are widely used. While many cell phones currently include cameras for capturing still images, most still lack live video capture and display capability. However, companies such as Fotonation Ltd. (Ireland) are enabling new technologies for live video phone-cameras, such as face detection, face recognition, and face tracking, which could enhance the user experience. As an example, U.S. Patent Publication 2005/0041840 by J. H. Lo describes a camera phone with face recognition capability. While phone-cameras are easy to use, highly mobile, and have arguably become essential for modern life, their size and cost constraints limit their applicability for some uses.
Another realization of a device with these general capabilities is the “web-cam”, in which a computer, such as a lap-top unit, is equipped with a camera that often has pan, tilt, and zoom capabilities. Companies such as Creative Laboratories (Singapore) and Logitech (Switzerland) presently offer enhanced cameras as computer accessories for web-camera use. These web-cameras can have enhanced audio-capture capability, movement detection, face tracking, and other value-adding features. As an example, U.S. Patent Publication 2006/0075448 by McAlpine et al. describes a system and method for mechanically panning, tilting, and/or zooming a webcam to track a user's face.
Apple Inc. (Cupertino, Calif., U.S.A.) has further extended the web-camera with its “iSight” and “iChat” products, in which the camera is integrated into a lap-top computer and onboard image processing automatically adjusts the white balance, sharpness, color, focus, and exposure, and filters out noise, so that the transmitted picture provides bright, focused, true-color imagery. The “iChat” function enables one-to-one chat, multi-way chat, or audio chat with up to ten people. While these video-camera-computer systems enable internet-based video-telephony, they have not become as ubiquitous as the cell phone. Their additional cost and size are certainly contributing factors. However, many issues related to the user experience with the web-camera have not yet been adequately addressed. In particular, these systems are not optimized for easy use in dynamic environments, such as the home. Achieving this may require technology improvements in the user interface, image capture, and privacy management.
Notably, WebEx Communications (Santa Clara, Calif., U.S.A.) has adapted web-camera technology to provide inexpensive web-based video-conferencing for conducting meetings, training sessions, and webinars, for providing customer support, and for other business purposes. WebEx delivers applications over a private, web-based global network purpose-built for real-time communications. Security is provided on multiple levels: controlling attendee access and privileges, controlling the ability to save or print documents, and providing desktop privacy. Network security features include authentication, meeting and document encryption, intrusion control, and non-persistent data (data not stored on WebEx servers). An exemplary patent, U.S. Pat. No. 6,901,448, by Zhu et al., describes methods for a secure communications system for collaborative computing. However, the WebEx approach, while useful, does not anticipate the concerns people have when communicating by video on a personal basis.
As another alternative to the phone-camera or the web-cam, a video-phone having a larger screen, a more functional camera with zoom and tracking capability, enhanced audio, and multi-user capability could provide an enhanced user experience. Such enhanced video-phone devices could be used in home, office, or school environments, where mobility can be sacrificed for improved capture and display capabilities. Most simply, such a system could combine a camera and a television, and use a phone or Internet connection to transfer information from one location to another. U.S. Patent Publication 2005/0146598 by AbbiEzzi et al. describes a basic home teleconferencing system with this construction. This system indeed contains the basic image capture and display elements for a residential teleconferencing system. Like the web-cameras, the system can capture and display a large field of view, which improves contextual capture relative to the original Picturephone. However, this system does not anticipate many aspects of residential video-telephony relating to managing privacy and personal context in a dynamic residential environment.
A system described in U.S. Pat. No. 6,275,258 by N. Chim provides an enhanced teleconferencing system, potentially for residential use, in which multiple microphones enable enhanced subject tracking using audio signals. The Chim '258 system also improves the eye contact aspects of the user experience by locating the camera behind the display. In particular, Chim '258 employs multiple microphones to localize and track individuals in their local environment: an audio processor derives an audio tracking signal, which is used to drive a camera to follow an individual. The field of view captured by the camera can be optimized, by both mechanical movement (pan, tilt, and zoom) and image cropping, to follow and frame an individual in their environment. Placing the camera behind the display improves the perception of eye contact, because the camera captures direct-on (to the screen) images of the local individuals for display to the remote viewers. While Chim '258 suggests that this system might be used in a residential environment, the system is largely targeted at the corporate conference room environment, as its privacy and context management aspects are underdeveloped.
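The description above does not say how the Chim '258 audio processor turns microphone signals into a tracking signal. One common technique for this kind of localization is time difference of arrival (TDOA): cross-correlate the signals from two microphones to find the delay between them, then convert that delay into a bearing with θ = arcsin(c·τ/d), where c is the speed of sound, τ the delay, and d the microphone spacing. The sketch below assumes a two-microphone array, a speed of sound of 343 m/s, and hypothetical function names; it illustrates the general technique and is not the method claimed in Chim '258.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees C


def estimate_delay_samples(left, right, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation of two
    equal-length microphone signals. A positive lag means the right signal is a
    delayed copy of the left one, i.e. the sound reached the left microphone first."""
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_degrees(delay_samples, sample_rate_hz, mic_spacing_m):
    """Convert a delay into a bearing measured from the array's broadside
    (perpendicular) direction; positive angles point toward the left microphone."""
    tau = delay_samples / sample_rate_hz  # time difference of arrival, in seconds
    ratio = SPEED_OF_SOUND_M_S * tau / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp so noise cannot push asin out of range
    return math.degrees(math.asin(ratio))


# Usage: a short pulse that reaches the right microphone 3 samples after the left.
SAMPLE_RATE_HZ = 16_000
MIC_SPACING_M = 0.20
pulse = [0.2, 1.0, -0.5, 0.3]
left = [0.0] * 64
right = [0.0] * 64
for k, v in enumerate(pulse):
    left[20 + k] = v
    right[23 + k] = v

# The largest physically possible delay bounds the search range.
max_lag = math.ceil(MIC_SPACING_M * SAMPLE_RATE_HZ / SPEED_OF_SOUND_M_S)
lag = estimate_delay_samples(left, right, max_lag)
angle = bearing_degrees(lag, SAMPLE_RATE_HZ, MIC_SPACING_M)
print(lag, round(angle, 1))
```

A camera controller could then pan toward the returned bearing. A practical system would typically use more microphones and a more reverberation-tolerant correlation (for example, GCC-PHAT) than this plain time-domain version.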
As another approach to video communications, enhanced video-telephony has been realized by video-conferencing equipment, which is largely targeted for the corporate market. As an example, companies such as Cisco Systems (San Jose, Calif., U.S.A.); Digital Video Enterprises (Irvine, Calif., U.S.A.); Destiny Conferencing (Dayton, Ohio, U.S.A.); and Teleris (London, United Kingdom), are offering enhanced video-teleconferencing equipment targeted for use by corporate executives. Exemplary teleconferencing prior art patents associated with some of the above companies include U.S. Pat. Nos. 5,572,248 and 6,160,573 both by Allen et al., and U.S. Pat. Nos. 6,243,130 and 6,710,797, both by McNelley et al. The product offerings of these companies emphasize image and sound fidelity, environmental aesthetics and ergonomics, eye contact image capture and display, and the seamless and secure handling of large data streams through networks. For example, improved eye contact is typically achieved by hiding a camera behind a screen or beam splitter, through which it unobtrusively peers.
Although video-conferencing systems are designed to handle multiple participants from multiple locations, they are optimized for use in highly controlled environments, rather than the highly variable environments typical of personal residences or schools. In particular, these systems assume or provide standard conference rooms with a central table, or more elaborate rooms with congress-like seating. Because image capture occurs in structured environments with known participants behaving in relatively formalized ways, these conference systems lack capabilities that would be desirable in dynamic personal environments. These systems can also be equipped to extract the images of the local participants from their contextual backgrounds, so that when the image of a participant is seen remotely, it appears in the context of the remote environment or in a stylized virtual environment. As with the WebEx technologies, privacy and security are considered relative to the access and transfer of data across a network. As an example, U.S. Patent Application Publication 2004/0150712 by Le Pennec describes an approach for establishing secure videoconferences between multiple nodes, which uses at least three encryption devices, including link-unique encryption keys, a secure interface connecting the encryption keys, and a secure data archive to hold the link-unique encryption keys. Additionally, the cost of teleconferencing systems often exceeds $100,000, which the residential market cannot support.
It is noted that some enhanced teleconferencing systems that adapt to multi-person conversational dynamics have been anticipated. In particular, a series of patents, including U.S. Pat. No. 6,894,714 by Gutta et al., and U.S. Pat. Nos. 6,611,281 and 6,850,265, both by Strubbe et al., all assigned to Philips Electronics (Eindhoven, Netherlands), suggest methods for teleconferencing under dynamic circumstances. As a first example, the Strubbe et al. '281 patent proposes a video-conferencing system having a video locator and an audio locator whose outputs are used to determine the presence of all participants. In operation, the system focuses on a person who is speaking and conveys a close-up view of that person based on the video and audio locator outputs. Thereafter, if the person speaking continues to speak, or becomes silent, for a predetermined time period, the system adjusts the camera setting to display, in sequence, other participants who are not speaking, or it zooms out the camera by a specified amount to include all participants. The system is also configured to capture a new person entering, or an existing participant exiting, the video-conference session. The videoconference scenario of FIG. 2 of the Strubbe et al. '281 patent, which depicts a conference-room-like setting with participants sitting around a table, does seem particularly suited to handling a formal or semi-formal corporate meeting event, where the various participants are of relatively equal status and a certain amount of decorum or etiquette can be expected. In such circumstances, the formalism of capturing and transmitting the non-speaking participants in sequence could be applicable and appropriate.
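The timing behavior attributed above to the Strubbe et al. '281 patent can be sketched as a small state machine. This is an illustration, not the patented implementation: the class name `SequencedFramer`, the `hold_seconds` parameter (standing in for the predetermined time period), and the choice to restart the non-speaker cycle whenever a new speaker is located are all assumptions. A returned value of `None` stands for the zoomed-out view that includes all participants, and participants entering or leaving are handled only by editing the `participants` list.

```python
from typing import List, Optional


class SequencedFramer:
    """Hypothetical sketch of the speaker/non-speaker framing sequence.

    update() is called once per tick with the speaker located by the audio and
    video locators (or None when nobody is speaking) and the seconds elapsed
    since the previous tick. It returns the participant to show in close-up,
    or None for a zoomed-out view of all participants.
    """

    def __init__(self, participants: List[str], hold_seconds: float,
                 cycle_non_speakers: bool = True):
        self.participants = list(participants)
        self.hold_seconds = hold_seconds  # the "predetermined time period"
        self.cycle_non_speakers = cycle_non_speakers
        self.speaker: Optional[str] = None
        self.shot: Optional[str] = None
        self.elapsed = 0.0
        self._cycle_pos = 0

    def update(self, active_speaker: Optional[str], dt: float) -> Optional[str]:
        if active_speaker is not None and active_speaker != self.speaker:
            # A new speaker was located: cut to a close-up and restart the timer.
            self.speaker = active_speaker
            self.shot = active_speaker
            self.elapsed = 0.0
            self._cycle_pos = 0
            return self.shot
        # Same speaker still talking, or silence: let the timer run.
        self.elapsed += dt
        if self.elapsed >= self.hold_seconds:
            self.elapsed = 0.0
            others = [p for p in self.participants if p != self.speaker]
            if self.cycle_non_speakers and others:
                # Show the non-speaking participants one after another.
                self.shot = others[self._cycle_pos % len(others)]
                self._cycle_pos += 1
            else:
                # Alternative behavior: zoom out to include all participants.
                self.shot = None
        return self.shot


# Usage: three participants, a 5-second hold, one tick per call.
framer = SequencedFramer(["ann", "bob", "cy"], hold_seconds=5.0)
shots = [framer.update("ann", 1.0),  # new speaker -> close-up on ann
         framer.update("ann", 2.0),  # still within the hold period
         framer.update("ann", 3.0),  # hold period reached -> first non-speaker
         framer.update(None, 5.0),   # silence -> next non-speaker
         framer.update("cy", 0.5)]   # cy starts speaking -> close-up on cy
print(shots)
```

Setting `cycle_non_speakers=False` selects the other behavior described above, zooming out to include everyone once the hold period expires.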
The Strubbe et al. '265 and Gutta '714 patents basically expand upon the concepts of the Strubbe et al. '281 patent by providing adaptive means to make a videoconferencing event more natural. In the Strubbe et al. '265 patent, the system applies a set of heuristic rules to the functionality provided by the camera, the audio locator, and the video locator. These heuristic rules attempt to determine whether the system should follow the current speaker or switch to a new speaker. Various factors, such as time gaps between speakers and 5-degree co-location thresholds, are measured and assessed against confidence level estimations to determine whether the system should switch to another individual or switch to wide field of view image capture. The Gutta '714 patent extends the concepts of dynamic videoconferencing further: it identifies a series of behavioral cues from the participants and analyzes these cues to predict, and then proactively make, a seamless transition in shifting the video capture from a first speaker to a second speaker. These behavioral cues include acoustic cues (such as intonation patterns, pitch, and loudness), visual cues (such as gaze, facial pose, body postures, hand gestures, and facial expressions), or combinations of the foregoing, which are typically associated with an event. As depicted in FIG. 1 of each patent, these patents basically anticipate enhanced video-conferencing appropriate for the conference room or for congress-like seating arrangements, where there is little movement or change of the participants. Like the Strubbe et al. '281 patent, they also seem particularly suited to formal or semi-formal corporate meetings, where the participants are of relatively equal status and a certain amount of decorum or etiquette can be expected. Although the Gutta '714 patent suggests broader applicability, and modestly anticipates (see the table in Col. 11) a situation with a child present, the systems proposed in the Strubbe et al. '281, Strubbe et al. '265, and Gutta '714 patents are not targeted to the residential environment. Thus, they are not sufficiently adaptive to residential dynamics, and their privacy and context management aspects are underdeveloped.
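The heuristic switching described for the Strubbe et al. '265 patent can be illustrated with a single decision function. Only the 5-degree co-location threshold comes from the description above; the minimum silence gap, the confidence cut-off, the rule ordering, and all names (`decide`, `LocatorReading`, `Action`) are hypothetical. They are chosen to show how time gaps, co-location, and confidence estimates might combine, not to reproduce the patent's actual rules.

```python
from dataclasses import dataclass
from enum import Enum

COLOCATION_THRESHOLD_DEG = 5.0  # the 5-degree threshold mentioned in the text


class Action(Enum):
    HOLD = "keep framing the current speaker"
    SWITCH = "switch to the new speaker"
    WIDE = "switch to wide field of view"


@dataclass
class LocatorReading:
    audio_azimuth_deg: float  # direction reported by the audio locator
    video_azimuth_deg: float  # direction of the nearest face from the video locator
    confidence: float         # combined confidence estimate, 0.0 to 1.0


def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two azimuths, handling wrap-around."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def decide(current_azimuth_deg, reading, gap_s, min_gap_s=1.0, min_confidence=0.6):
    """Hypothetical rule set. gap_s is the silence (in seconds) between the
    current speaker's last utterance and the newly detected sound."""
    co_located = angular_difference(reading.audio_azimuth_deg,
                                    reading.video_azimuth_deg) <= COLOCATION_THRESHOLD_DEG
    if not co_located or reading.confidence < min_confidence:
        # The sound cannot be tied to a face with enough confidence: show everyone.
        return Action.WIDE
    if angular_difference(reading.audio_azimuth_deg,
                          current_azimuth_deg) <= COLOCATION_THRESHOLD_DEG:
        return Action.HOLD  # the same person is still talking
    if gap_s < min_gap_s:
        return Action.HOLD  # pause too short: likely an interjection, so don't cut away
    return Action.SWITCH


# Usage: a confident, co-located voice 40 degrees away after a 2-second pause.
print(decide(0.0, LocatorReading(40.0, 42.0, 0.9), gap_s=2.0))  # Action.SWITCH
```

Placing the uncertainty check first makes an ambiguous reading fall back to a wide shot, which corresponds to the option of switching to wide field of view image capture; other rule orderings are equally plausible.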
U.S. Patent Publication No. 2004/0257431, by Girish et al., entitled “Video Conferencing Apparatus and Method”, describes a video conferencing system with a few features that enable a user to preserve their privacy. In particular, the Girish et al. '431 disclosure provides a hard-wired indicator light to signal that video capture and audio capture are enabled. Girish et al. '431 also provides an audio mute control and a mechanical iris (with an iris cap) in front of the camera to provide further visual confirmation that video capture is disabled. Girish et al. '431 is particularly concerned with the potential circumstance of an inadvertent video transmission during a video communication event, in which a network link is established and image transmission is occurring without the local users' knowledge. However, the Girish et al. '431 approach is not sufficiently versatile to let users control the privacy of their environment, of themselves, or of others (such as family members). This system also lacks contextually interpretive controls and features that would be useful in a residential environment.
Teleconferencing or enhanced video communication has also been explored for office and laboratory environments, as well as the conference room environment, to aid collaboration between colleagues. The first such example, the “media space”, developed in the 1980s at the Xerox Palo Alto Research Center (Palo Alto, Calif., U.S.A.), provided office-to-office, always-on, real-time audio and video connections. As a related example, the “VideoWindow”, described in “The VideoWindow System in Informal Communications”, by Robert S. Fish, Robert E. Kraut, and Barbara L. Chalfonte, in the Proceedings of the 1990 ACM Conference on Computer-Supported Cooperative Work, provided full-duplex teleconferencing with a large screen, in an attempt to encourage informal collaborative communication among professional colleagues. Although such systems enabled more informal communication than the conference room setting, they were developed for work use rather than personal use in the residential environment, and thus do not anticipate residential concerns.
Prototype home media spaces for facilitating communication between a telecommuter and in-office colleagues have also been developed. For example, an always-on home media space is described in “The Design of a Context-Aware Home Media Space for Balancing Privacy and Awareness”, by Carman Neustaedter and Saul Greenberg, in the Proceedings of the Fifth International Conference on Ubiquitous Computing (2003). The authors recognize that personal privacy concerns are much more problematic for home users than for office-based media spaces. As the paper discusses, privacy-encroaching circumstances can arise when home users forget that the system is on, or when other individuals unwarily wander into the field of view. The described system reduces these risks using a variety of methods, including secluded home office locations, people counting, physical controls and gesture recognition, and visual and audio feedback mechanisms. However, while this system is located in the home, it is not intended for personal communications by the residents. As such, it does not represent a residential communication system that can adapt to the personal activities of one or more individuals while helping these individuals maintain their privacy.
Thus, there remains a need and opportunity, not anticipated in the prior art, for a residentially targeted system that is generally useful for aiding family video-conferencing or video communications with one or more remote individuals. Such a system should function as seamlessly as is reasonably possible while adapting to the dynamic situations present in a residence. In particular, the system should enable users to readily manage and maintain their privacy, at least with respect to image capture, recording, and transmission. The system should also manage the contextual information of the users and their environments, to provide an effective communication experience.