1. Field of the Invention
The present invention relates to the transmission of pictures over conventional narrow frequency band telephone lines.
2. Related Art
It has long been a goal to transmit pictures along with voice from one person to another. The ability to see the other person while conversing would be a great advance in business and personal communications.
However, the conventional telephone line has a narrow bandwidth of less than 4000 Hz, and the present invention uses only the 3000 Hz band from 500 Hz to 3500 Hz. Because the bandwidth is so narrow, it has not been possible to transmit real-time moving TV pictures over the telephone line.
A number of devices, articles and patents have suggested various approaches to the transmission of pictures. One approach is to "slow-scan" an image field and, in effect, transmit still pictures. These devices are sometimes called "video telephones", but they have not met with widespread consumer acceptance, possibly because people are familiar with TV images and movies and expect images to have movement.
Slow scan video systems transmit video images over analog voice-grade telephone lines. This fax-like technology sends a series of still video frames, and generally each picture is transmitted like a TV image in raster format. The exact frame size and frame rate vary with the system, but one inexpensive slow-scan system sends one frame every 9 to 12 seconds. A frame consists of 200 × 242 pixels, each having 6 bits of resolution, so that each picture contains about 350000 bits.
The data rate is 32000 bits/sec, and the system uses pulse-width modulation to transmit about 4800 pixels/second.
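The slow-scan figures quoted above can be checked with a little arithmetic. The following is only a sketch: the constant and function names are illustrative assumptions, and only the numbers are taken from the text.

```python
# Arithmetic check of the slow-scan figures quoted in the text.
# Names are illustrative; only the numeric values come from the text.

WIDTH, HEIGHT = 200, 242   # pixels per frame
BITS_PER_PIXEL = 6         # gray-level resolution per pixel
PIXEL_RATE = 4800          # pixels transmitted per second


def frame_pixels(width=WIDTH, height=HEIGHT):
    """Number of pixels in one frame."""
    return width * height


def frame_bits(width=WIDTH, height=HEIGHT, bits=BITS_PER_PIXEL):
    """Raw (uncompressed) number of bits in one frame."""
    return frame_pixels(width, height) * bits


def seconds_per_frame(pixel_rate=PIXEL_RATE):
    """Time needed to send one frame at the given pixel rate."""
    return frame_pixels() / pixel_rate


print(frame_pixels())                 # 48400
print(frame_bits())                   # 290400
print(round(seconds_per_frame(), 1))  # 10.1
```

At 4800 pixels/second one frame takes roughly 10 seconds, which falls within the quoted 9 to 12 second range. The raw product of pixels and bits per pixel is 290,400 bits, somewhat below the "about 350000 bits" quoted; the text does not say what accounts for the difference.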
Another approach to the transmission of visual images is to use broad band transmission lines instead of conventional telephone lines. That approach requires a special broad band transmission line from one receiver to another. That type of equipment, because of its expense, has also found limited consumer acceptance and is presently used mainly for business teleconferences. An early attempt to introduce wide-band real-time image transmission was the "Picturephone" (TM AT&T). The Picturephone standard was based on a 1 megahertz bandwidth channel for each signal (2 such channels are needed to transmit and receive). Picturephone images were rectangular, having 250 × 211 pixels (52,750 pixels per image), and were displayed at 30 frames per second. The most optimistic projections about the expected future lower costs of bandwidth predicted that Picturephone service would cost 10 times that of the voice telephone, but based on the ratio of bandwidths alone, it would cost 300 to 400 times as much.
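The Picturephone numbers above can be put side by side in a short calculation. This is an illustrative sketch: the names are assumptions, and the 3000 Hz voice band used for the ratio is the 500 Hz to 3500 Hz band mentioned earlier in this section.

```python
# Illustrative arithmetic for the Picturephone figures quoted in the text.

PICTUREPHONE_BANDWIDTH_HZ = 1_000_000  # one channel, one direction
VOICE_BAND_HZ = 3000                   # 500 Hz to 3500 Hz (assumed basis for the ratio)
WIDTH, HEIGHT = 250, 211               # pixels per image
FRAMES_PER_SECOND = 30

pixels_per_frame = WIDTH * HEIGHT
pixels_per_second = pixels_per_frame * FRAMES_PER_SECOND
bandwidth_ratio = PICTUREPHONE_BANDWIDTH_HZ / VOICE_BAND_HZ

print(pixels_per_frame)        # 52750
print(pixels_per_second)       # 1582500
print(round(bandwidth_ratio))  # 333
```

A ratio of about 333 is consistent with the "300 to 400 times" estimate based on bandwidth alone.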
The video telephone is now used mainly for corporate video teleconferencing.
Advances in digital image compression standardization have led to the development of video telephone systems compatible with ISDNs (Integrated Services Digital Networks). ISDN may not be fully implemented in the United States or in other countries for another 10 or 20 years.
Nevertheless, some companies are developing ISDN picture telephone systems. One example is the AEG Olympia "Mike". This system consists of a video monitor and camera, a motion estimating video codec, and an ISDN telephone. The images are in color, the sound is synchronized, and the images are based on a nominal TV image format using 1.5M bytes/sec. Compaction is achieved by parallel execution of a discrete cosine transform and a motion estimation algorithm on image sub-blocks. The codec consists of a total of 12 ADSP-2101 processors.
The CCITT H.261 proposed standard, sometimes called p×64, defines a method for visual telephone communication. The expression p×64 denotes the fact that the channel data rate is an integer p, 1 ≤ p ≤ 32, times 64K bits per second. The special case of p=1 is the de facto standard for a low-end video phone using a 64K bit per second ISDN line. The case p=32 is a high-end standard for video teleconferencing.
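The channel rates implied by this naming can be tabulated directly. The sketch below is illustrative only: the function name is an assumption, the range of p follows the text above, and 64K is taken to mean 64,000 bits per second (one ISDN bearer channel).

```python
# Illustrative: channel data rate for a given multiplier p.

BASE_RATE_BPS = 64_000  # one 64K bit per second channel


def channel_rate_bps(p):
    """Return the channel rate in bits per second for multiplier p."""
    if not 1 <= p <= 32:
        raise ValueError("p must be an integer from 1 to 32")
    return p * BASE_RATE_BPS


print(channel_rate_bps(1))   # 64000   (low-end video phone)
print(channel_rate_bps(32))  # 2048000 (high-end teleconferencing)
```

Even the p=1 case is several times the rate that a voice-band modem can carry, which is why these standards presuppose an ISDN line.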
JPEG (Joint Photographic Experts Group) is a committee of CCITT/ISO. Another committee is MPEG (Moving Picture Experts Group). JPEG is defining 3 levels of the JPEG standard. JPEG is a combination of the discrete cosine transform and run-length coding.
Standardization efforts such as JPEG have nearly matured into detailed specifications for video telephone systems. However, they are not ideal for the transmission of a "log map" image. The logmap image matches the geometry of the human eye. Near the center of the image, the pixels are small, but they increase in size with distance from the center. The image, like the human eye, has a high resolution central area and diminishing resolution toward the periphery. In the human eye the decreasing resolution (and increasing pixel size) away from the center is known to follow a log function, hence the term "log map image". Other terms for this type of image include "foveated", "retinal" or "log polar". The logmap image has the same maximum resolution and field-of-view as a conventional TV image, but it contains far fewer pixels. For example, in our telephone transmission system the logmap contains about 1400 pixels, compared with about 250,000 pixels in a conventional TV image. Hence, a logmap image is not a "normal" TV image, because it is not a rectangular array of pixels. The JPEG standard essentially assumes that the image contains large, homogeneous regions, but logmaps are small and high-variance. Also, transmission of JPEG images requires an extremely low noise digital channel, but voice-quality telephone lines are not low noise.
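The idea of pixels that grow with distance from the center can be illustrated with a generic log-polar sampling grid. This is a minimal sketch under stated assumptions, not the mapping used in the present system: the function names, the radii, and the choice of 28 rings by 50 wedges (picked only so that the total comes to 1400 sample points) are all hypothetical.

```python
import math

# Generic log-polar sampling sketch. All parameters are hypothetical.


def ring_radii(r_min, r_max, n_rings):
    """Radii spaced geometrically, so ring spacing grows with radius."""
    growth = (r_max / r_min) ** (1.0 / (n_rings - 1))
    return [r_min * growth ** k for k in range(n_rings)]


def logpolar_centers(r_min, r_max, n_rings, n_wedges):
    """Sample-point centers (x, y) on a log-polar grid around the origin."""
    centers = []
    for r in ring_radii(r_min, r_max, n_rings):
        for j in range(n_wedges):
            theta = 2.0 * math.pi * j / n_wedges
            centers.append((r * math.cos(theta), r * math.sin(theta)))
    return centers


radii = ring_radii(1.0, 250.0, 28)
centers = logpolar_centers(1.0, 250.0, 28, 50)

print(len(centers))                   # 1400
print(radii[1] - radii[0])            # small ring spacing near the center
print(radii[-1] - radii[-2])          # much larger spacing at the periphery
print(round(250_000 / len(centers)))  # 179, pixel-count ratio vs. a TV image
```

With these assumed parameters the grid covers a field of view comparable to a 500-pixel-wide raster with fine sampling at the center, yet holds roughly 1/179 as many samples as a 250,000-pixel TV image.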
We attempted to apply publicly available JPEG image compression software to our logmap images. The JPEG algorithm is not suited to our logmap images, however, because the JPEG algorithm is defined for TV images. TV images are large rectangular arrays of rectangular pixels, and the JPEG algorithm divides these arrays into 8 × 8 blocks for compression. The bow-tie shape of the logmap image (see FIG. 1A) makes it unsuitable for this type of subdivision. The JPEG compression algorithm is one example of a family of compression algorithms that all work by subdividing the image into square blocks. These algorithms assume that the image is large and rectangular, and that many of the blocks are either low-variance or contain highly correlated data. None of these assumptions is valid for logmaps, which tend to be small, non-rectangular, and high-variance.
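The mismatch between block subdivision and a non-rectangular image can be illustrated by counting how many 8 × 8 blocks are completely filled, partly filled, or empty. The sketch below is hypothetical: the bow-tie mask is a crude stand-in chosen for illustration and does not reproduce the shape in FIG. 1A, and the function names are assumptions.

```python
# Illustrative census of 8x8 blocks over a pixel mask (True = image pixel).


def block_census(mask, block=8):
    """Return (full, partial, empty) block counts for a 2-D boolean mask."""
    rows, cols = len(mask), len(mask[0])
    full = partial = empty = 0
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            filled = sum(
                mask[r][c]
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            )
            if filled == block * block:
                full += 1
            elif filled > 0:
                partial += 1
            else:
                empty += 1
    return full, partial, empty


ROWS, COLS = 32, 64

# A rectangular image: every pixel belongs to the image.
rect_mask = [[True] * COLS for _ in range(ROWS)]

# A crude, hypothetical bow-tie: narrow at the center, wide at both ends.
bow_mask = [
    [abs(r - 15.5) <= 0.5 * abs(c - 31.5) for c in range(COLS)]
    for r in range(ROWS)
]

print(block_census(rect_mask))  # (32, 0, 0): the rectangle tiles exactly
print(block_census(bow_mask))   # some blocks are only partly filled or empty
```

Partly filled blocks must be padded or handled specially before a block transform can be applied, which wastes bits on pixels that are not part of the image.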
The high-technology modems presently commercially available can transmit error-free digital information at up to about 18000 bits per second, although error-correcting codes sent along with the data may reduce the transmission rate to around 14500 bits per second. One of the key features of a modem is the very high probability that the data will be received error free.
The Telebit modem is an example of a high speed digital modem. It uses an extension of QAM (quadrature amplitude modulation) to a multicarrier modulation method called DAMQAM (dynamic adaptive multicarrier QAM). DAMQAM divides the voice band into 511 channels, each of which transmits two to six bits about every 1/10 second. The "dynamic adaptive" part of DAMQAM is the modem's method of selecting which subset of the 511 channels to use and how many bits to assign to each, which is based on measured error characteristics of each channel. Telebit claims a maximum data rate of 18031 bits per second on a local telephone connection, which is reduced to 14400 bits per second by the overhead of an error-checking protocol. The bit rate is reduced further as the noise in the phone connection increases.
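The multicarrier figures above can be related to one another with simple arithmetic. This is only a sketch: it assumes exactly 10 symbols per second per channel (the text says "about every 1/10 second"), and it does not attempt to model the modem's actual channel-selection method. All names are illustrative.

```python
# Illustrative arithmetic for the multicarrier modem figures in the text.

N_CHANNELS = 511
SYMBOLS_PER_SECOND = 10   # assumed: "about every 1/10 second"
CLAIMED_MAX_BPS = 18031
WITH_PROTOCOL_BPS = 14400


def aggregate_rate_bps(bits_per_channel):
    """Total bit rate for a list giving the bits assigned to each channel."""
    return sum(bits_per_channel) * SYMBOLS_PER_SECOND


low = aggregate_rate_bps([2] * N_CHANNELS)    # every channel at 2 bits
high = aggregate_rate_bps([6] * N_CHANNELS)   # every channel at 6 bits
avg_bits = CLAIMED_MAX_BPS / (N_CHANNELS * SYMBOLS_PER_SECOND)
efficiency = WITH_PROTOCOL_BPS / CLAIMED_MAX_BPS

print(low)                   # 10220
print(high)                  # 30660
print(round(avg_bits, 2))    # 3.53 bits per channel per symbol on average
print(round(efficiency, 3))  # 0.799, i.e. about 20% protocol overhead
```

Under these assumptions the claimed maximum corresponds to an average of about 3.5 bits per channel, well short of 6 bits on every channel, which is consistent with the modem assigning fewer bits to (or skipping) the noisier channels.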
The key point here is that in the present invention the frame rate is always constant, whereas if a digital modem were used the frame rate would depend on the signal-to-noise ratio (SNR). The higher the SNR over a given bandwidth, the more bits (and hence pixels) per second can be transmitted through a digital modem. The penalty paid for a constant frame rate is that some pixels may be noisy, i.e. a pixel's gray level at the receiver does not necessarily match its gray level at the transmitter. But because the signal is an image, the person at the receiver can tolerate a small amount of noise (variation of gray levels) and the image is still recognizable. In other words, the noise requirements of image transmission are not as strict as those of digital data transmission.
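The dependence of a digital link's frame rate on SNR can be illustrated with the Shannon capacity formula, C = B log2(1 + SNR), which gives a theoretical upper bound on the error-free bit rate rather than the rate of any actual modem. The sketch below is hypothetical: the 3000 Hz band and the 1400-pixel logmap come from the text, but the 6 bits per pixel and the function names are assumptions made only for illustration.

```python
import math

# Upper bound on frame rate over a digital link as a function of SNR.

BANDWIDTH_HZ = 3000       # 500 Hz to 3500 Hz band
PIXELS_PER_FRAME = 1400   # approximate logmap size from the text
BITS_PER_PIXEL = 6        # assumption, for illustration only


def shannon_capacity_bps(bandwidth_hz, snr_db):
    """Shannon capacity in bits per second for a given SNR in decibels."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)


def max_frame_rate(snr_db, pixels=PIXELS_PER_FRAME, bits=BITS_PER_PIXEL):
    """Upper bound on uncompressed frames per second at the given SNR."""
    return shannon_capacity_bps(BANDWIDTH_HZ, snr_db) / (pixels * bits)


for snr_db in (20, 30, 40):
    capacity = shannon_capacity_bps(BANDWIDTH_HZ, snr_db)
    print(snr_db, round(capacity), round(max_frame_rate(snr_db), 2))
```

The bound rises steadily with SNR, so a digital link would deliver a varying frame rate as line quality changes. A system that holds the frame rate fixed instead lets the line noise appear as gray-level error in the received pixels.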