1. Field of Invention
The present invention relates to videophone systems, specifically a videophone device which can transmit and receive still video images over a standard telephone line in response to a user command.
2. Prior Art
Videophones, which have been available in the consumer market for at least 20 years, are considered relatively complex and expensive systems. A videophone allows the transmission of video images over a standard telephone line. A fundamental problem associated with such transmission is the excessive frequency bandwidth of the video signal, as compared to the bandwidth of the telephone line.
The standard two-wire telephone-set connection to the public switched telephone network (PSTN) is designed to exchange audio information, specifically voice, between two or more users. Ideally, two users would communicate speech over a channel whose bandwidth equals that of audible sound, nominally 20 Hz to 20 kHz. (This is the maximum frequency response of the human ear, and high-fidelity audio systems are designed to provide such a response.) However, because of the large number of PSTN subscribers and economic considerations, the actual bandwidth of a telephone channel has been reduced to only 3.0 kHz (300 Hz to 3.3 kHz). This narrow bandwidth allocation allows more subscribers to communicate through the PSTN simultaneously, with negligible loss of speech intelligibility. However, such a bandwidth provides relatively poor sound quality, which is readily noticeable in voice received over a telephone.
The partition of a physical channel in the frequency domain into multiple channels of limited frequency response is referred to as frequency multiplexing. This method uses a physical channel very economically. For example, if a copper cable has a nominal transmission capability (bandwidth or frequency response) of 10 MHz, then about 3,000 (10 MHz / 3 kHz) channels, each with a 3 kHz bandwidth, can be sent simultaneously over this cable. (This principle underlies the present configuration of the PSTN.) Standard NTSC (National Television System Committee) video signals, on the other hand, have a bandwidth of approximately 4 MHz, which allows only two channels to be transmitted over the same cable.
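The channel-count arithmetic above can be checked in a few lines. This is a minimal sketch that treats frequency multiplexing as ideal, with no guard bands between adjacent channels (an assumption made only for illustration; real carrier systems reserve some spacing, so the practical count is lower). The function name `fdm_channel_count` is hypothetical.

```python
def fdm_channel_count(cable_bandwidth_hz: float, channel_bandwidth_hz: float) -> int:
    """Number of whole channels that fit in a cable, assuming ideal
    frequency multiplexing with no guard bands (illustrative assumption)."""
    if channel_bandwidth_hz <= 0:
        raise ValueError("channel bandwidth must be positive")
    return int(cable_bandwidth_hz // channel_bandwidth_hz)


CABLE_BW = 10e6   # 10 MHz copper cable, as in the example above
VOICE_BW = 3e3    # 3 kHz telephone channel
NTSC_BW = 4e6     # approximately 4 MHz NTSC video channel

print(fdm_channel_count(CABLE_BW, VOICE_BW))  # 3333 voice channels ("about 3,000")
print(fdm_channel_count(CABLE_BW, NTSC_BW))   # 2 video channels
```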
It is clear that video communication requires a relatively wideband channel, and telephone lines were not designed for this purpose. In general, an arbitrary signal cannot be transmitted through the standard telephone line without "sizing" it to fit within the 300 Hz to 3.3 kHz frequency band. One way to accomplish this is to digitize the arbitrary signal (convert it to a stream of binary bits) and then transmit the resulting digital data using a modem (modulator/demodulator). Modems transmit digital data in the form of analog signals through the essentially analog, band-limited telephone network. At the transmitter end, the digital data (a stream of binary bits, 0s and 1s) are modulated into analog tones within the restricted bandwidth of the PSTN. At the receiver end, the analog tones are demodulated, the digital data extracted, and the arbitrary signal reconstructed.
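The modulate/demodulate round trip described above can be sketched as binary frequency-shift keying (FSK), one of the simplest modem schemes. This is a minimal illustration, not the method of any particular modem standard: the 9,600 samples/s synthesis rate, the 300 bit/s rate, and the 1270 Hz / 1070 Hz tone pair are assumptions chosen only because the tones sit inside the 300 Hz to 3.3 kHz telephone band, and the function names are hypothetical. Practical high-speed modems use far denser modulation schemes.

```python
import math

# Illustrative parameters (assumptions, not taken from the text):
SAMPLE_RATE = 9600      # samples per second of the synthesized line signal
BAUD = 300              # bits per second (one tone burst per bit)
MARK_HZ = 1270.0        # tone for a binary 1, inside the 300 Hz to 3.3 kHz band
SPACE_HZ = 1070.0       # tone for a binary 0, inside the 300 Hz to 3.3 kHz band
SAMPLES_PER_BIT = SAMPLE_RATE // BAUD   # 32 samples per bit


def fsk_modulate(bits):
    """Map each bit to a burst of MARK_HZ or SPACE_HZ tone, keeping the
    phase continuous across bit boundaries."""
    samples = []
    phase = 0.0
    for bit in bits:
        freq = MARK_HZ if bit else SPACE_HZ
        step = 2 * math.pi * freq / SAMPLE_RATE
        for _ in range(SAMPLES_PER_BIT):
            samples.append(math.sin(phase))
            phase += step
    return samples


def _tone_energy(window, freq):
    """Non-coherent detection: squared magnitude of the window's projection
    onto a complex exponential at `freq`, independent of the tone's phase."""
    step = 2 * math.pi * freq / SAMPLE_RATE
    i = sum(s * math.cos(step * n) for n, s in enumerate(window))
    q = sum(s * math.sin(step * n) for n, s in enumerate(window))
    return i * i + q * q


def fsk_demodulate(samples):
    """Recover bits by comparing mark and space tone energy in each bit period."""
    bits = []
    for start in range(0, len(samples) - SAMPLES_PER_BIT + 1, SAMPLES_PER_BIT):
        window = samples[start:start + SAMPLES_PER_BIT]
        is_mark = _tone_energy(window, MARK_HZ) > _tone_energy(window, SPACE_HZ)
        bits.append(1 if is_mark else 0)
    return bits


data = [1, 0, 1, 1, 0, 0, 1, 0]
line_signal = fsk_modulate(data)
print(fsk_demodulate(line_signal) == data)  # True
```

Non-coherent (energy) detection is used here so the receiver does not need to recover the transmitter's carrier phase, which keeps the sketch short.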
The maximum number of bits per second, or channel capacity (C), that a modem can transmit is limited by the bandwidth (B) and signal-to-noise ratio (S/N) of the physical channel. The Shannon-Hartley theorem defines the relationship between C, B, and S/N as C = B × log2(1 + S/N) bits/s (bits per second), where S/N is expressed as a power ratio. For the PSTN, the channel capacity is approximately 40 kb/s (kilobits per second), assuming B = 3.0 kHz and S/N = 40 dB (a power ratio of 10,000). If the data rate produced by the digitizing circuit exceeds the rate at which the modem can transmit data through the analog channel, the signal cannot be transmitted in real time. This is precisely the case for video signals sent through the telephone network. The Nyquist sampling rate required to digitize a video signal of 4 MHz bandwidth is 8M samples/s, whereas the highest data rate achievable by most currently manufactured modems is 28.8 kb/s. Assuming that each digital sample contains 8 bits, 64 Mb/s (8M × 8) would be required to transmit the video signal in real time. Even if a sophisticated compression algorithm such as JPEG (Joint Photographic Experts Group) were used, which can compress video by a factor of about 20:1, a channel capacity of 3.2 Mb/s would still be required, substantially higher than what standard telephone modems can offer.
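The figures quoted above follow directly from the Shannon-Hartley formula and the Nyquist criterion. The sketch below reproduces them; it assumes S/N is given in decibels and converted to a power ratio before use, and the helper names are illustrative only.

```python
import math


def shannon_capacity_bps(bandwidth_hz: float, snr_db: float) -> float:
    """Shannon-Hartley capacity C = B * log2(1 + S/N), with S/N given in dB."""
    snr_linear = 10 ** (snr_db / 10)   # 40 dB -> power ratio of 10,000
    return bandwidth_hz * math.log2(1 + snr_linear)


def raw_video_rate_bps(video_bandwidth_hz: float, bits_per_sample: int) -> float:
    """Uncompressed rate when sampling at the Nyquist rate (2 x bandwidth)."""
    return 2 * video_bandwidth_hz * bits_per_sample


pstn_capacity = shannon_capacity_bps(3.0e3, 40)   # roughly 39.9 kb/s
raw_video = raw_video_rate_bps(4e6, 8)            # 64 Mb/s
jpeg_video = raw_video / 20                       # 3.2 Mb/s at 20:1 compression
modem_rate = 28.8e3                               # typical modem of the period

print(f"PSTN capacity : {pstn_capacity / 1e3:.1f} kb/s")
print(f"Raw video     : {raw_video / 1e6:.1f} Mb/s")
print(f"JPEG video    : {jpeg_video / 1e6:.2f} Mb/s")
print(f"JPEG video needs {jpeg_video / modem_rate:.0f}x the rate of a 28.8 kb/s modem")
```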
To circumvent this difficulty, video signals can be processed before compression and transmission. A standard NTSC video signal consists of 30 video frames per second. If the number of frames transmitted per second is reduced to 1, for example, the data rate is reduced by a factor of 30; if one frame is transmitted every 10 seconds, the data rate is reduced by a factor of 300. In general, the required data rate can be brought within the low rate allowed by the telephone line by sufficiently decreasing the number of frames transmitted per second. However, decreasing the frame rate degrades the quality of moving video images and precludes the transmission of a full motion picture.
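Combining the frame-rate reduction with the earlier JPEG estimate shows when the data rate drops below modem speed. The sketch assumes, for illustration, that every frame is compressed independently at the same 20:1 ratio, so each compressed frame carries 1/30 of the 3.2 Mb/s full-motion figure (about 107 kbit); the names are hypothetical.

```python
NTSC_FPS = 30
JPEG_RATE_BPS = 3.2e6      # compressed full-motion rate estimated in the text
MODEM_RATE_BPS = 28.8e3    # typical modem rate cited in the text

# Assumption: each frame compresses independently, so bits per frame is
# simply the full-motion rate divided by the frame rate (~106.7 kbit).
bits_per_frame = JPEG_RATE_BPS / NTSC_FPS


def required_rate_bps(seconds_per_frame: float) -> float:
    """Data rate needed to send one compressed frame every `seconds_per_frame`."""
    return bits_per_frame / seconds_per_frame


for interval in (1 / NTSC_FPS, 1.0, 10.0):
    rate = required_rate_bps(interval)
    verdict = "fits" if rate <= MODEM_RATE_BPS else "exceeds"
    print(f"1 frame every {interval:.3g} s -> {rate / 1e3:8.1f} kb/s ({verdict} a 28.8 kb/s modem)")

print(f"Time to send one frame at 28.8 kb/s: {bits_per_frame / MODEM_RATE_BPS:.1f} s")
```

Under these assumptions, one frame every 10 seconds needs roughly 10.7 kb/s, comfortably within a 28.8 kb/s modem, while one frame per second still does not fit.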
An additional complication in transmitting digital video data over the telephone line is the need to transmit voice and video information simultaneously. If voice must be transmitted in real time, the audio information will occupy at least 50% of the telephone channel, leaving even less capacity for video information.
Modern computers process video signals as three color components (red, green, and blue), each generally requiring digital samples at least six bits wide. This method therefore requires 18 bits (rather than eight) of data per pixel (picture element). The transmission speed of video images can be increased by reducing the number of pixels transmitted, but this results in a loss of resolution.
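As an illustration of the 18-bit-per-pixel format described above, the sketch below packs three 6-bit components into one word and shows how reducing the pixel count shrinks a frame. The bit order (red in the high bits) and the 640 x 480 frame size are assumptions chosen for the example, and the function names are hypothetical.

```python
def pack_rgb666(r: int, g: int, b: int) -> int:
    """Pack three 6-bit color components (0-63) into one 18-bit pixel word.
    Assumed layout: red in bits 17-12, green in 11-6, blue in 5-0."""
    for c in (r, g, b):
        if not 0 <= c <= 63:
            raise ValueError("each component must fit in 6 bits")
    return (r << 12) | (g << 6) | b


def unpack_rgb666(word: int) -> tuple:
    """Split an 18-bit pixel word back into its (r, g, b) components."""
    return (word >> 12) & 0x3F, (word >> 6) & 0x3F, word & 0x3F


def frame_bits(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed storage (or transmission) size of one frame, in bits."""
    return width * height * bits_per_pixel


word = pack_rgb666(63, 32, 1)
print(f"{word:018b}")        # 111111100000000001
print(unpack_rgb666(word))   # (63, 32, 1)

# Halving the resolution in each direction cuts the data by a factor of 4,
# at the cost of the resolution loss noted in the text.
print(frame_bits(640, 480, 18) // frame_bits(320, 240, 18))   # 4
```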
Numerous digital video storage methods and videophones are known in the art. Horgan, in U.S. Pat. No. 4,857,990 (1989), discloses a relatively complex method of storing a full 525-line NTSC video frame in a 256 kByte dynamic RAM (DRAM). Each horizontal scan line is digitized at a sampling rate of 8/3 (about 2.67) times the color subcarrier frequency of an NTSC signal, into a number of samples equal to the number of columns in the memory where the frame is to be stored. The color burst is partially sampled and stored separately, requiring a dedicated circuit to reconstruct the phase reference signal of the color picture from the digital samples of the color burst. Horgan's scheme is clever, but it has serious drawbacks. The digital memory must be organized as 512 rows by 512 columns, or some other combination that matches the number of digital samples per horizontal line and the number of horizontal lines. The method is therefore strictly limited to 525-line NTSC video signals and can only use memory arrays of the kind generally found in DRAM. Although DRAM is inexpensive, it requires memory refresh control, which can add substantial cost and timing limitations to the system.
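The numbers behind Horgan's scheme can be checked directly. The sketch assumes the standard NTSC color subcarrier of 315/88 MHz (about 3.579545 MHz) and one 8-bit byte per stored sample; the byte-per-sample figure is an assumption consistent with the 256 kByte total rather than a detail stated above.

```python
from fractions import Fraction

NTSC_SUBCARRIER_HZ = Fraction(315_000_000, 88)   # 3.579545... MHz
SAMPLING_RATIO = Fraction(8, 3)                 # ratio cited for Horgan's scheme
ROWS, COLS = 512, 512                           # memory organization cited above
BITS_PER_SAMPLE = 8                             # assumption: one byte per sample

# Exact rational arithmetic avoids rounding in the intermediate product.
sampling_rate_hz = NTSC_SUBCARRIER_HZ * SAMPLING_RATIO
memory_bytes = ROWS * COLS * BITS_PER_SAMPLE // 8

print(f"Sampling rate: {float(sampling_rate_hz) / 1e6:.4f} MHz")              # 9.5455 MHz
print(f"Memory needed: {memory_bytes} bytes = {memory_bytes // 1024} kByte")   # 262144 bytes = 256 kByte
```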
Kashigi, in U.S. Pat. No. 4,325,075 (1982), discloses a complicated video storage scheme that also addresses European (PAL and SECAM) color television systems. Kashigi stores video digital samples in a plurality of memory blocks, and requires a complex memory address control circuit to synchronize the selection of horizontal and vertical memory blocks.
Yamamoto, in U.S. Pat. No. 5,452,022 (1995), discloses a digital storage device for a still video apparatus. A video signal is decoded into a luminance signal and two color difference signals. The three signals are digitized by three analog-to-digital (A/D) converters and stored in a digital memory. A control circuit stops the A/D conversion during the blanking level, so that the blanking level data can be stored in a buffer memory. The composite video signal can be reconstructed by means of three D/A converters that read the digital video data, and a control circuit that retrieves the blanking level data from the buffer memory. Yamamoto's system is too expensive for large-scale production, as it requires a decoding circuit to generate the luminance and color difference signals, three A/D converters, three D/A converters, and a complex digital synchronizing scheme.
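The decoding step described above can be illustrated with the standard NTSC (Rec. 601) luminance equation Y = 0.299R + 0.587G + 0.114B and the color difference signals R-Y and B-Y. The sketch is a simplified model, not Yamamoto's circuit: it works on normalized RGB values rather than a composite NTSC waveform, treats each A/D converter as an ideal 8-bit quantizer, and the function names are hypothetical. It shows why three converters are needed, one per decoded signal.

```python
def rgb_to_luma_color_difference(r: float, g: float, b: float) -> tuple:
    """Split normalized RGB (0.0-1.0) into luminance Y and the color
    difference signals R-Y and B-Y, using the NTSC/Rec. 601 luma weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, r - y, b - y


def quantize(value: float, lo: float, hi: float, bits: int = 8) -> int:
    """Model one ideal A/D converter: map a value in [lo, hi] to an unsigned code."""
    levels = (1 << bits) - 1
    clamped = min(max(value, lo), hi)
    return round((clamped - lo) / (hi - lo) * levels)


def luma_color_difference_to_rgb(y: float, r_y: float, b_y: float) -> tuple:
    """Inverse transform: recover R, G, B from Y, R-Y and B-Y."""
    r = r_y + y
    b = b_y + y
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b


y, r_y, b_y = rgb_to_luma_color_difference(1.0, 0.5, 0.0)
# One converter per signal; for inputs in 0.0-1.0, R-Y spans +/-0.701
# and B-Y spans +/-0.886.
codes = (quantize(y, 0.0, 1.0), quantize(r_y, -0.701, 0.701), quantize(b_y, -0.886, 0.886))
print(codes)
print(luma_color_difference_to_rgb(y, r_y, b_y))
```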
Filo, in U.S. Pat. No. 5,079,627 (1992), discloses a videophone that can transmit a full motion picture of reasonably good quality. However, the image capturing method is non-standard, and the system requires either a CRT monitor or a mechanical rotating disk of film to display the incoming images. The nature of this imaging apparatus makes the system incompatible with modern video devices such as cameras and TV displays; in this sense, Filo's videophone can be considered obsolete.
In 1995, Casio-Phonemate Inc. of Torrance, Calif., introduced a video conferencing system at the Consumer Electronics Show in Las Vegas, Nevada. This system, designated LT-70, costs approximately $1,500 per unit, includes a small, non-standard video camera, and is compatible only with NTSC video. To use the LT-70, a user initiates a telephone call with a distant party, and communication between the two LT-70 units is then established by having both parties press their start buttons simultaneously. The users normally synchronize by counting "one, two, three" together and then pressing the button. After a short handshaking routine, communication between the two LT-70 units is established. The LT-70 then continuously sends and receives a still, low-resolution video image approximately every 3.5 seconds; this image can be displayed on a standard TV monitor. The users can talk to each other during video transmission. High-resolution image transmission can be selected, but it takes about 30 seconds to transmit an entire video image in high-resolution mode. Even though this video conferencing system is functionally very sophisticated, it suffers from several disadvantages. The system is too expensive for the general public: two LT-70 units are needed to video conference, for a total cost of about $3,000. The initial synchronization procedure is awkward, as it requires nearly perfect coordination between the two users (a synchronized count). Because still images are grabbed and transmitted automatically every 3.5 seconds, they do not form a full motion picture and sometimes show the users in an undesirable position or expression. The quality of voice communication is substantially degraded by the relatively large amount of data (voice and video) that must be transmitted within the restricted bandwidth of the telephone line.
Finally, because the system relies on a modem carrier to maintain communication between the two users, the telephone connection is susceptible to interruption by a carrier loss. A carrier loss may occur, for example, as a result of noise on the telephone line, a call-waiting signal from the central office of the telephone company, or the activation of an extension phone connected to the same telephone line in the same household. The probability of losing a connection increases with the amount of time the users spend video conferencing, so the connection is likely to be lost during an extended session.
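The dependence of connection loss on session length can be illustrated with a simple probabilistic model. If carrier-loss events are treated as independent events occurring at a constant average rate λ (a Poisson process, an assumption made here purely for illustration), the probability of at least one loss during a session of length t is 1 - exp(-λt). This grows almost linearly with t for short sessions and approaches certainty for long ones, which is consistent with the observation above. The rate value and function name below are hypothetical, not measured data.

```python
import math


def connection_loss_probability(rate_per_hour: float, session_hours: float) -> float:
    """P(at least one carrier loss) under a Poisson model with a constant
    average loss rate (hypothetical parameter, not a measured value)."""
    return 1.0 - math.exp(-rate_per_hour * session_hours)


RATE = 0.5  # hypothetical: one carrier-loss event every two hours on average

for minutes in (1, 5, 15, 30, 60, 120):
    p = connection_loss_probability(RATE, minutes / 60)
    print(f"{minutes:4d} min session -> P(loss) = {p:.3f}")
```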
A number of video conferencing systems are available today at a retail price of about $200, but they all require an expensive personal computer (PC) to function. To operate these systems, the user has to be "computer literate" and must follow a relatively complex installation procedure. Establishing a video conferencing session requires substantial effort by the user, and the session relies on continuous modem communication, which can often result in a lost connection during an extended session.
In general, existing video conferencing systems grab and transmit video images dynamically, without letting the user discard images or decide which image to send. The lack of "privacy" associated with the operation of these systems (in addition to their cost and complexity) is probably the reason why such systems have not become popular for home or business use.
Accordingly, several objects and advantages of the present invention are:
(a) to provide an improved video transmission system over a standard telephone line at very low cost;
(b) to provide a video transmission system which can be implemented using standard, off-the-shelf parts and simple software stored in a microcontroller;
(c) to provide a video transmission system which does not require a personal computer (PC) to function and can be operated by the user with no supervision, other than to initiate a video image transmission by pressing a user key;
(d) to provide a video transmission system that minimizes the probability of interrupting the telephone connection between two users;
(e) to provide a video transmission system which the user can operate from a distance using an infrared remote control unit;
(f) to provide a video transmission system that allows the user to choose the image to be transmitted before electing to send it to another party;
(g) to provide a video transmission system which allows retrieval of video images from remote locations for security purposes, using dual-tone multi-frequency (DTMF) signals;
(h) to provide a digital video storage method compatible with NTSC, PAL, and SECAM video formats;
(i) to provide a digital video storage method which requires a reduced amount of digital memory; and
(j) to provide an efficient method to transmit data representing a video frame, through the telephone line.
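Object (g) relies on DTMF signaling, in which each keypad key is sent as the sum of one low-group tone and one high-group tone. The sketch below uses the standard DTMF frequency table; the three-key sequence, the 100 ms tone duration, and the 8,000 samples/s rate are hypothetical, since the command codes and timing used for remote retrieval are not specified in this section.

```python
import math

LOW_GROUP_HZ = (697, 770, 852, 941)        # row frequencies
HIGH_GROUP_HZ = (1209, 1336, 1477, 1633)   # column frequencies
KEYPAD = ("123A", "456B", "789C", "*0#D")  # standard 4 x 4 DTMF keypad layout


def dtmf_frequencies(key: str) -> tuple:
    """Return the (low, high) tone pair for a single DTMF key."""
    if len(key) != 1:
        raise ValueError("expected exactly one key")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key.upper())
        if col >= 0:
            return LOW_GROUP_HZ[row], HIGH_GROUP_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key: str, duration_s: float = 0.1, sample_rate: int = 8000) -> list:
    """Synthesize a key as the sum of its two sine tones at equal amplitude."""
    low, high = dtmf_frequencies(key)
    n = round(duration_s * sample_rate)
    return [0.5 * math.sin(2 * math.pi * low * t / sample_rate)
            + 0.5 * math.sin(2 * math.pi * high * t / sample_rate)
            for t in range(n)]


# Hypothetical remote-retrieval command; the device's actual codes are not given here.
for key in "*5#":
    print(key, dtmf_frequencies(key))
```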