Traditionally, telephone and video communication systems have been bifurcated. Conventional telephone systems (public switched telephone network, or PSTN, systems) operate at a bandwidth appropriate for voice communications and typically provide spontaneous, point-to-point communications, such as two-way voice and data services, between two end users. In contrast, video distribution networks (including cable television systems) operate at a much broader bandwidth than telephone systems and are usually employed to transmit pre-determined, high-quality, full-motion video and audio concurrently to a plurality of subscribers.
It has long been felt that if the best features of voice and video communication systems could be combined appropriately, fully interactive video telephony would become feasible; accordingly, video telephony has been the subject of commercial development for many years. Although the first videophone appeared as early as the 1930s, a commercially viable videophone has yet to be introduced, despite significant development efforts. This has been due, in large part, to the relatively high cost of videophones, their complexity in both design and use, their inability to provide quality image and sound concurrently, and the lack of a network infrastructure capable of supporting two-way communications with minimal signal degradation.
Prior attempts at video telephony have typically resembled traditional business telephone desk sets with the addition of a display monitor and a camera, together with associated controls for operating the videophone. The cost of such devices has typically exceeded $1000, which is beyond what many users can afford, and this cost is compounded because at least two videophones are needed to make a video call. Furthermore, these devices are often relatively large and not portable.
The quality of the image and sound in such prior videophones is typically substantially lower than most people expect for normal communications. Little or no capability is provided for accommodating different ambient conditions or different audio characteristics (e.g., canceling ambient noise and feedback within the audio signal, or accommodating simultaneous speech by both parties to the call). Furthermore, the signal processing used in such devices, including the techniques for compressing and decompressing the resulting audio and video signals, has not been optimized, with the result that the quality of both the transmitted and the received video falls well short of what is expected from a communications system. For example, varying ambient light conditions often result in over-exposed or under-exposed pictures. Movement of the user often causes a significant degradation in image quality and may also carry the user out of the camera's view (e.g., beyond its limited field of view), so that the camera can no longer capture the user's image.
Because of the complexity of prior systems, configuring a videophone for the particular communications network being used requires a complicated set-up process. Even videophones that can work with multiple types of communications networks are far from “plug ‘n’ play” with any network. In addition, the videophone must be located where it can be connected directly to the available communications network via an Ethernet or comparable connection, which severely limits flexibility in locating and using the videophone. Because a videophone typically uses traditional IP addressing, a user must enter a number sequence that differs from the standard telephone number people are accustomed to. Furthermore, there is typically no provision for telephone services and applications such as caller ID, call waiting, call forwarding, conferencing and the like.
Videophones are expected to work across long distances that span multiple networks and network infrastructures. Transmission delays and noise degrade the signal quality. Although prior videophones have advertised high frame rates and transmission speeds, they typically do not achieve these speeds, because of the upstream and downstream characteristics of communications networks or because lossy networks corrupt or drop data during transmission. The result is degraded image and sound quality, jitter, loss of synchronization between voice and video, and similar impairments.
In prior systems, attempts have been made to overcome degraded images, loss of synchronization between audio and video, jitter and delay by using feedback systems that detect errors such as missing data packets and request retransmission by the source. Such error recovery requires buffers for temporarily storing received signals and introduces delays into any communication between videophones. The resulting lack of real-time communication is unacceptable in videophone systems.