1. Field of the Invention
This invention relates to the field of digital data transmission, and especially to video data transmission in which bandwidth requirements are reduced by reducing the number of bits used to encode the signal, while retaining accuracy and resolution of the digital data.
2. Prior Art
Various techniques are known for digitizing data and using predictor circuits to avoid the necessity of transmitting data bits if the same information can be transmitted using some other technique. Typically, some spatial or temporal pattern or redundancy is detected in the data and encoded for transmission in lieu of the data itself. An example of such a system is disclosed in U.S. Pat. No. 4,402,010 to Vogelman. In that patent, a system of run length coding for video transmission is provided with means to determine when a scan line in a digitized video signal is equal to a previously-transmitted line, in which event the line is not re-transmitted. Instead, a sync signal is transmitted to advance the line count, the receiver merely repeating the already-received line data. Similarly, when successive digitized samples of a frame have equal intensity or the like, a unique code is transmitted to so indicate.
According to the Vogelman technique, bits which are redundant in successive pixels or successive lines are not transmitted, and instead a lesser number of bits are sent to indicate the redundancy. Theoretically, the Vogelman system will result in a substantial savings in the number of bits to be transmitted, and a consequent reduction in the bandwidth required to transmit the data within a given time, e.g., the video frame scanning period. The decrease in the amount of data to be transmitted as a result of such compression is most remarkable in connection with transmission, for example, of facsimiles and the like, in which there is usually a large portion in the frame to be transmitted that is of substantially-equal intensity. Such unoccupied areas are likely to be a large part of documents containing text, wherein line spacings and margins are blank.
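By way of illustration only, the general idea of run length coding combined with a repeated-line indication can be sketched as follows. This is a minimal sketch and not the circuit of the Vogelman patent: the function names, the `REPEAT_LINE` marker, and the choice to compare each line only against the immediately preceding line are assumptions made for the example.

```python
# Illustrative sketch only; names and the REPEAT_LINE marker are hypothetical.
REPEAT_LINE = "REPEAT"  # stands in for the sync signal that advances the line count


def run_length_encode(samples):
    """Collapse runs of equal samples into (value, run_length) pairs."""
    runs = []
    for s in samples:
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + 1)
        else:
            runs.append((s, 1))
    return runs


def encode_frame(lines):
    """Encode a frame line by line, marking lines equal to the preceding line."""
    encoded = []
    previous = None
    for line in lines:
        if previous is not None and line == previous:
            encoded.append(REPEAT_LINE)  # line is not re-transmitted
        else:
            encoded.append(run_length_encode(line))
        previous = line
    return encoded


def decode_frame(encoded):
    """Rebuild the frame; a repeat marker copies the last decoded line."""
    lines = []
    for item in encoded:
        if item == REPEAT_LINE:
            lines.append(list(lines[-1]))
        else:
            lines.append([value for value, count in item for _ in range(count)])
    return lines
```

For a frame with blank margins and line spacings, most lines reduce to a single run or a repeat marker, which is the facsimile case described above. For a textured image, nearly every sample begins a new run and the encoded form can exceed the original.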
In connection with video image transmission other than facsimile transmission, Vogelman's technique of detecting redundancy in successive portions of the data is less effective. In the case of teleconferencing and general video transmission such as for entertainment, security, etc., the subjects shown in the video image are likely to be characterized by more complicated variation in color and intensity than is characteristic of facsimile transmission. Backgrounds are likely to be unevenly lighted, textured, or otherwise characterized by subtle gradations in chrominance (color) and/or luminance (intensity). If the video signal is to be encoded and transmitted at video scanning speed with good resolution and tight bandwidth, then it is not sufficient to merely detect redundancy and indicate that a portion of data should be repeated from a previously-transmitted portion. Too little of the data is actually redundant from pixel to pixel and line to line.
Systems are possible along the lines of Vogelman's system where temporal redundancy between successive frames is used for data compression. Unfortunately, whether the comparison is pixel to pixel, line to line or frame to frame, the variation in data values for general purpose video is such that for practical purposes the actual realized compression is minimal and may at times even be negative.
Also known, and used in Japan, is the MUSE system, in which a reduced sampling rate is used to reduce bandwidth requirements. In this system, each successive displayed video frame comprises three-fourths pixels repeated from the previous frame and one-fourth newly-transmitted pixels. The sampling pattern changes from frame to frame, until after four frames the pixels are all brought up to date. Sampling according to such a system is characterized by a substantial savings in bandwidth because only a portion of the total frame to be displayed is transmitted at any time. However, the system has difficulties with motion. When used for teleconferencing video, the MUSE system is adequate for viewing still items, but when a subject moves in the field or if the video camera is to be panned, the subject is blurred or may even disappear until coming again to a stable position.
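The four-frame refresh cycle described above can be sketched as follows. This is an illustrative sketch only: the 2-by-2 sampling pattern and the function names are assumptions chosen for simplicity and are not the actual MUSE sampling pattern.

```python
# Illustrative sketch only; the 2x2 phase pattern is a hypothetical stand-in
# for whatever sampling pattern a real system would use.
def phase_mask(height, width, phase):
    """True where a pixel is newly transmitted during the given phase (0 to 3)."""
    return [
        [(2 * (r % 2) + (c % 2)) == phase % 4 for c in range(width)]
        for r in range(height)
    ]


def update_display(display, new_frame, phase):
    """Replace one-fourth of the displayed pixels; repeat the other three-fourths."""
    height, width = len(new_frame), len(new_frame[0])
    mask = phase_mask(height, width, phase)
    return [
        [new_frame[r][c] if mask[r][c] else display[r][c] for c in range(width)]
        for r in range(height)
    ]
```

With a still scene, four successive calls with phases 0 through 3 bring every pixel up to date. If the scene changes between calls, the displayed frame mixes pixels from up to four different instants, which corresponds to the blurring of moving subjects noted above.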
There are known systems in which only a sample of data points for the frame is transmitted, as in the MUSE system, and an attempt is also made to predict the values of other data points based upon the values of the sampled points. Reference can be made, for example, to U.S. Pat. Nos. 4,193,092 to Stoffel and 2,905,756 to Graham, which predict interleaved pixel values by interpolating from other pixel values. The Graham patent teaches choosing among alternative predictors by subtracting the predicted values from the actual values, to develop a prediction error signal, which can be compared against other predictors. Examples of other techniques with one or more predictors may be found in U.S. Pat. Nos. 4,477,915 to Peters, 4,411,001 to Van Buul et al, 4,200,886 to Musman et al and 4,292,651 to Kretz et al.
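The selection among alternative predictors by comparing prediction error can be sketched as follows. This is a simplified illustration and not the method of any cited patent: the two predictors, the use of summed absolute error, and the assumption that the odd-indexed samples are the interleaved points to be predicted are all choices made for the example.

```python
# Illustrative sketch only; predictors and error measure are hypothetical.
def predict_previous(samples, i):
    """Predict sample i as a repeat of its left-hand neighbor."""
    return samples[i - 1]


def predict_average(samples, i):
    """Predict sample i by interpolating between its two neighbors."""
    return (samples[i - 1] + samples[i + 1]) / 2


def prediction_error(samples, predictor):
    """Sum of |actual - predicted| over the interior odd-indexed samples."""
    return sum(
        abs(samples[i] - predictor(samples, i))
        for i in range(1, len(samples) - 1, 2)
    )


def choose_predictor(samples, predictors):
    """Return the name of the predictor with the smallest prediction error."""
    return min(predictors, key=lambda name: prediction_error(samples, predictors[name]))


PREDICTORS = {"previous": predict_previous, "average": predict_average}
```

On a smooth ramp such as `[0, 10, 20, 30, 40]` the interpolating predictor has zero error and is selected; on a step edge such as `[5, 5, 9, 9, 9]` the repeat-previous predictor is selected instead.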
The bandwidth of a transmitted signal, whether it be a television video signal or a facsimile signal, is related to the amount of information to be transferred and the transmission time, normally the scanning rate. In television transmission, the scanning rate is uniform, and the bandwidth required for the system is that needed to transmit a signal characterized by the fastest possible rate of change in intensity while retaining satisfactory picture quality. Since the bandwidth is dictated by the worst case greatest rate of change, the bandwidth devoted to the channel is wasted during transmission of any smaller rate of change.
The system can also be considered based upon the amount of digital data required to encode data to the required resolution. Typically, the changes in intensity from one point to the next in a video signal will be minimal. Nevertheless, in order to retain the possibility of transmitting and receiving the minimum and maximum intensities in immediately-adjacent digitized picture elements, it becomes necessary to encode the intensity to any point within the overall range. Unless a full scale change in intensity is experienced, the span or resolution to which the intensity may be encoded is wasted.
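The point about word length can be made concrete with a small calculation. The sketch below is illustrative only; it assumes, for the sake of the example, an intensity range of 0 to 255 and integer sample values, neither of which is specified above.

```python
# Illustrative sketch only; the 0..255 intensity range is an assumption.
def bits_needed(lo, hi):
    """Bits needed to give every integer in the range lo..hi its own code."""
    return max(1, (hi - lo).bit_length())


def adjacent_differences(samples):
    """Change in intensity from each sample to the next."""
    return [b - a for a, b in zip(samples, samples[1:])]


samples = [100, 102, 101, 104, 103, 100]
diffs = adjacent_differences(samples)             # [2, -1, 3, -1, -3]
full_scale_bits = bits_needed(0, 255)             # 8
difference_bits = bits_needed(min(diffs), max(diffs))  # 3
```

In this example the sample-to-sample changes would fit in three bits, yet eight bits must be carried for every sample so long as a full scale change between adjacent samples remains possible.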
The present invention provides a way in which the rate of change from pixel to pixel in a sampled video signal is reduced from actual conditions by means of a localized averaging. Instead of predicting and/or encoding the differences between pixels according to temporal or spatial redundancy, the invention encodes differences from the averaged, artificially smoothed succession of pixels. This smoothed video signal is employed in addition to a transformed signal of the original image. The transformed signal represents the energy of the pixels in one picture frame. Both the difference data and the transformed energy data are encoded using techniques to minimize word length, to realize a system characterized by full resolution and accuracy but a minimum of bandwidth. The system of the invention makes maximum use of the allotted bandwidth, with little or no deterioration in the transmission quality. The raw signal in its input form is substantially completely recovered in the output at the receiver.
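The general notion of localized averaging, with differences taken from the smoothed succession of pixels, can be sketched as follows. This is a rough illustration under stated assumptions and not the disclosed apparatus: the three-sample window, the integer (floor) average, and the function names are hypothetical, and the transformed energy signal and the word-length coding are not modeled.

```python
# Illustrative sketch only; window size and integer averaging are assumptions.
def local_average(samples, radius=1):
    """Smooth a line of samples with a moving average, clipped at the line ends."""
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - radius)
        hi = min(len(samples), i + radius + 1)
        window = samples[lo:hi]
        smoothed.append(sum(window) // len(window))
    return smoothed


def differences_from_smoothed(samples, smoothed):
    """Difference of each original sample from the smoothed value at that point."""
    return [s - m for s, m in zip(samples, smoothed)]


def reconstruct(smoothed, diffs):
    """Recover the original samples from the smoothed signal and the differences."""
    return [m + d for m, d in zip(smoothed, diffs)]


def max_step(samples):
    """Largest change between immediately adjacent samples."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))
```

For the line `[10, 12, 50, 14, 16]` the largest adjacent step is 38, whereas the smoothed line `[11, 24, 25, 26, 15]` has a largest step of 13, and adding the differences `[-1, -12, 25, -12, 1]` back to the smoothed line recovers the original samples exactly.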