Multimedia communications is a rapidly developing field. Recent advances in both the computer industry and the telecommunications field have made digital video and audio economically viable for visual communications. This progress has been supported by the availability of digital channels such as the narrowband Integrated Services Digital Network (ISDN) and its successor, the broadband ISDN, as well as local area networks (LANs), wide area networks (WANs), digital satellite and wireless networks, digital terrestrial broadcasting channels, and corporate intranets. These channels will lead to communication-based applications such as video phones, video conference systems, digital broadcast TV/HDTV, remote sensing, and surveillance. Digital storage-based audiovisual applications include server-client based databases, education, training, video-on-demand type entertainment, advertising, and document storage and transfer.
Specific examples of actual and anticipated applications include a web streamer that provides video data streams from video clips stored at a server. Desirably, the video data would be delivered so that there is no need to store the data at the client before display. Such an application would provide training on demand, advertising promotions, product demonstrations, product promotions on the Internet, music videos, communications between executives of companies, and other such uses. Other applications would include one-way transmission of video data for telesales, product support, tourism promotions, road/traffic conditions, security and surveillance, video e-mail, and the like. Another exemplary application would be video conferencing, such as used for corporate work groups, medical diagnostics and conferencing, distance education and training, customer support, and professional conferences. Other contemplated applications of the invention would include further expansion into areas such as live video streamer multicast and multipoint conferencing.
A cost-effective digital compression and decompression arrangement that delivers high quality video and audio streams is essential for the introduction and widespread use of visual communications. To reduce transmission and storage costs, improved bit rate compression schemes are needed. Image, video, and audio signals are amenable to compression because of the considerable statistical redundancy in the signals. Within a single image or a single video frame there exists significant correlation among neighboring samples, giving rise to what is generally termed "spatial correlation." Also, in moving images such as full motion video, there is significant correlation among samples in different segments of time, such as successive frames. This correlation is generally referred to as "temporal correlation." However, a difficulty arises in providing a digital compression arrangement for a web streamer in that the bandwidth of the transmission channel is subject to change during transmission, and clients with varying receiver resources may join or leave the network during transmission as well.
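The two kinds of redundancy described above can be illustrated with a short sketch. The synthetic frames and the mean-absolute-difference measure below are illustrative assumptions, not part of any cited reference; the point is only that neighboring pixels within a frame, and co-located pixels across frames, differ far less than the raw sample values themselves.

```python
W, H = 32, 32

def make_frame(t):
    # Smooth synthetic image: neighboring pixels and successive
    # frames differ only slightly, mimicking natural video.
    return [[(x + y + t) % 256 for x in range(W)] for y in range(H)]

def mean_abs(values):
    return sum(values) / len(values)

f0, f1 = make_frame(0), make_frame(1)

# Spatial correlation: horizontally adjacent pixels are close in value.
spatial = mean_abs([abs(f0[y][x] - f0[y][x + 1])
                    for y in range(H) for x in range(W - 1)])

# Temporal correlation: the same pixel barely changes between frames.
temporal = mean_abs([abs(f1[y][x] - f0[y][x])
                     for y in range(H) for x in range(W)])

# Compare against the spread of the raw pixel values about mid-gray.
raw = mean_abs([abs(f0[y][x] - 128) for y in range(H) for x in range(W)])

print(spatial, temporal, raw)
```

Because the neighbor and frame differences are small relative to the raw signal, they can be coded with far fewer bits, which is the premise of the compression schemes reviewed below.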
Accordingly, there is a present need in the art for an improved cost-effective system and method that uses both spatial and temporal correlation to remove the redundancy in the video, achieving high compression in transmission while maintaining good to excellent image quality and continually adapting to changes in the available bandwidth of the transmission channel and to the limitations of the receiving resources of the clients. The purpose of the present invention is therefore to provide a next-generation cost-effective video compression/decompression (CODEC) system, continuously adaptive to changing conditions, for storage and distribution of high quality multimedia information on information networks using personal computers (PCs).
A review of the prior art shows that a known technique for taking advantage of the limited variation between frames of a television broadcast is motion-compensated image coding. In such coding, the current frame is predicted from the previously encoded frame using motion estimation and compensation, and the difference between the actual current frame and the predicted current frame is coded. By coding only the difference, or residual, rather than the image frame itself, it is possible to improve image quality, for the residual tends to have lower amplitude than the image, and can thus be coded with greater accuracy. Motion estimation and compensation are discussed in Lim, J. S., Two-Dimensional Signal and Image Processing, Prentice Hall, pp. 497-507 (1990). A frame of estimated motion vectors is produced by comparing the current and previous frames. Typically, each motion vector is simply a pair of x and y values representing estimates of the horizontal and vertical displacement of the image from one frame to the next at a particular location. The motion vectors are coded as side information. In the decoder, the current image frame is computed by summing the decoded residual with a motion-compensated version of the prior image frame. Motion compensation is typically performed on each pixel of the prior frame using bilinear interpolation between nearest motion vectors.
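The motion estimation step described above can be sketched as an exhaustive block-matching search. This is a minimal illustration, not any particular patented method: the frame size, block size, search range, and sum-of-absolute-differences cost are all illustrative assumptions, and real coders add sub-pixel interpolation and entropy coding of the vectors and residual.

```python
BLOCK, SEARCH = 4, 2
W, H = 16, 16

def sad(cur, ref, bx, by, dx, dy):
    # Sum of absolute differences between a block of the current
    # frame and a displaced block of the reference frame.
    total = 0
    for y in range(BLOCK):
        for x in range(BLOCK):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total

def estimate_motion(cur, ref):
    # Full search: for each block, try every displacement in the
    # search window and keep the one with minimum SAD cost.
    vectors = {}
    for by in range(0, H, BLOCK):
        for bx in range(0, W, BLOCK):
            best = None
            for dy in range(-SEARCH, SEARCH + 1):
                for dx in range(-SEARCH, SEARCH + 1):
                    if 0 <= by + dy <= H - BLOCK and 0 <= bx + dx <= W - BLOCK:
                        cost = sad(cur, ref, bx, by, dx, dy)
                        if best is None or cost < best[0]:
                            best = (cost, dx, dy)
            vectors[(bx, by)] = (best[1], best[2])
    return vectors

# Reference frame, then the same content shifted right by one pixel.
ref = [[(x * 7 + y * 3) % 256 for x in range(W)] for y in range(H)]
cur = [[ref[y][max(x - 1, 0)] for x in range(W)] for y in range(H)]

mv = estimate_motion(cur, ref)
print(mv[(4, 4)])   # interior blocks recover the one-pixel shift
```

With the vectors in hand, the encoder transmits only the motion vectors and the (ideally near-zero) residual between the current block and its displaced match, exactly the division of labor described in the preceding paragraph.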
A review of the patent literature has uncovered some patents that are of interest. U.S. Pat. No. 5,218,435, dated Jun. 8, 1993 and issued to J. S. Lim et al. for DIGITAL ADVANCED TELEVISION SYSTEMS, teaches image quality improvement in high definition television using multi-scale representation of motion compensated residuals. The bandwidths of the subband filters vary with the frequency band, and the total number of coefficients in the multi-scale represented frames is equal to the number of values in the residual. Image initialization in the receivers is achieved using original image leakage, but the leakage factor is varied for different frequency subbands. To free up channel capacity at scene changes, a frame-wide decision is made as to whether to motion compensate a particular frame. Chrominance resolution is improved by encoding all of the subbands of the chroma residuals, instead of just the low subbands.
U.S. Pat. No. 5,043,808 issued on Aug. 27, 1991 to S. C. Knauer et al for HIGH DEFINITION TELEVISION ARRANGEMENT EMPLOYING MOTION COMPENSATED PREDICTION ERROR SIGNALS teaches a high definition television system where the television signal is encoded by developing motion vectors that describe the best motion estimate of the image to be transmitted, by developing motion estimation error signals, by encoding these error signals within the same bandwidth as occupied by standard NTSC signals and by transmitting the encoded error signals during periods that correspond to the active scan intervals of the NTSC TV signal. The motion vectors themselves, together with video and control signals, are transmitted during the NTSC retrace period.
U.S. Pat. No. 5,043,810, dated Aug. 17, 1991 and issued to F. W. P. Vreeswijk et al. for A METHOD AND APPARATUS FOR TEMPORALLY AND SPATIALLY PROCESSING A VIDEO SIGNAL, teaches a system whose transmitting section has signal paths for at least three classes of motion. Each path includes a preprocessing circuit provided with means for individual sampling in accordance with a separate sampling pattern, so that each preprocessing circuit supplies a video signal suitable for a display with an optimum distribution of temporal and/or spatial resolution for the associated class of motion. Dependent on the class of motion determined, one of the preprocessing circuits is coupled to a channel so that the video signal supplied to the channel is suitable for a display with an optimum distribution of temporal and/or spatial resolution for the given class of motion. The associated receiver likewise has three receiving section signal paths, each comprising a postprocessing circuit that decodes a received video signal, the correct postprocessing circuit being selected in accordance with the class of motion, so that spatial resolution increases as motion decreases.
U.S. Pat. No. 4,943,855, dated Jul. 24, 1990 and issued to H. Bheda et al. for PROGRESSIVE SUB-BAND IMAGE CODING SYSTEM, teaches reducing image data redundancies through progressive subband coding. The image is separated into a selected plurality of subbands, and the subband with the largest non-redundant data content is chosen and used to predict the data in the other subbands. Only the prediction error information of the predicted subbands is encoded and transmitted, together with the encoded chosen subband. An overall performance error signal can also be evaluated at the encoder end and used to further improve performance.
U.S. Pat. No. 5,272,529, issued on Dec. 21, 1993 to J. E. Frederiksen for ADAPTIVE HIERARCHICAL SUBBAND VECTOR QUANTIZATION ENCODER, teaches a system for data reduction of digital video signals based on vector quantization of vectors formed from coefficients of a discrete cosine transform of pixel blocks. The coefficients are grouped into subbands, and both scalar and vector quantization are used. Vector quantization is implemented either directly on the vectors or on vectors formed from inter-frame differences between the transformed vectors. The vector quantization searching routine operates in accordance with the Voronoi regions resulting from an off-line codeword clustering method using a minimum distance criterion.
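The minimum-distance search at the heart of any such vector quantizer can be sketched briefly. The codebook below is hand-picked for illustration; the patent derives its codebook from off-line clustering, and a production encoder would exploit the Voronoi structure to avoid the brute-force scan shown here.

```python
def quantize(vec, codebook):
    # Map the input vector to the index of the nearest codeword by
    # squared Euclidean distance, i.e. to its Voronoi region.
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(vec, codebook[i]))

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(quantize((0.9, 0.2), codebook))   # nearest codeword is (1.0, 0.0)
```

Only the codeword index is transmitted, so the rate depends on the codebook size rather than on the precision of the coefficients themselves.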
U.S. Pat. No. 4,817,182, dated Mar. 28, 1989, for TRUNCATED SUBBAND CODING OF IMAGES teaches analyzing image data in a number of iterated analysis procedures, using two-dimensional quadrature mirror filters to separate a low-pass spatial filter response component and three differently oriented high-pass spatial filter response components, which filter response components are decimated in both dimensions. The high-pass filter response components are coded "as is," and the low-pass filter response component is coded "as is" only in the last iteration. In the earlier analysis procedures, the low-pass filter response component provides the input data for the succeeding analysis procedure. No motion compensation is taught.
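One analysis stage of the kind described can be sketched with separable filtering. The trivial Haar average/difference pair stands in here for the patent's quadrature mirror filters; the image and sizes are illustrative assumptions. Each stage yields one low-pass band and three oriented high-pass bands, all decimated by two in both dimensions, and iteration would reapply the stage to the low-pass band.

```python
def analyze_1d(row):
    # Haar analysis: pairwise average (low-pass) and pairwise
    # difference (high-pass), each decimated by two.
    lo = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
    hi = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row), 2)]
    return lo, hi

def transpose(m):
    return [list(r) for r in zip(*m)]

def analyze_rows(img):
    pairs = [analyze_1d(row) for row in img]
    return [p[0] for p in pairs], [p[1] for p in pairs]

def analyze_2d(img):
    # Horizontal split, then vertical split of each half, giving the
    # four decimated subbands LL, LH, HL, HH.
    lo, hi = analyze_rows(img)
    ll, lh = (transpose(b) for b in analyze_rows(transpose(lo)))
    hl, hh = (transpose(b) for b in analyze_rows(transpose(hi)))
    return ll, lh, hl, hh

img = [[8] * 4 for _ in range(4)]   # a flat image: all detail bands vanish
ll, lh, hl, hh = analyze_2d(img)
print(ll)
```

Note that the four quarter-size subbands together hold exactly as many coefficients as the original image, consistent with the decimation described above.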
U.S. Pat. No. 4,969,040, dated Nov. 6, 1990 and issued to H. Gharavi for APPARATUS AND METHOD FOR DIFFERENTIAL SUB-BAND CODING OF VIDEO SIGNALS, teaches an arrangement for achieving high compression of a video signal. The PEL-by-PEL difference between an input signal, consisting of digital PEL values of a scanned video signal, and a motion compensated interframe prediction signal is decomposed into several narrow bands using separable two-dimensional quadrature mirror filtering. Each subband is quantized by a symmetric uniform quantizer with a center dead zone. Entropy coders code the quantized values by variable word-length coding the nonzero quantized values and transmitting that information with the corresponding run-length coded positional information. The outputs of the coders are combined into a constant rate bit stream. As required, the dead zones and step sizes of the quantizers are adjusted to force more zero-valued quantized levels, thereby reducing the amount of data.
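The dead-zone quantizer behavior can be illustrated with a short sketch. The quantization rule, step sizes, dead-zone widths, and coefficient values below are illustrative assumptions rather than the patent's exact parameters; the sketch shows only that widening the dead zone drives more coefficients to the zero level, which the run-length coder then compresses cheaply.

```python
def quantize(value, step, dead_zone):
    # Symmetric uniform quantizer with a center dead zone: values
    # inside the dead zone map to level 0; outside it, levels grow
    # uniformly with the step size.
    if abs(value) < dead_zone:
        return 0
    sign = 1 if value > 0 else -1
    return sign * int((abs(value) - dead_zone) / step + 1)

coeffs = [-9.0, -2.5, -0.4, 0.2, 1.1, 3.0, 7.5]
narrow = [quantize(c, 2.0, 1.0) for c in coeffs]   # narrow dead zone
wide = [quantize(c, 2.0, 3.0) for c in coeffs]     # widened dead zone

print(narrow)
print(wide)   # more zeros, hence fewer bits after run-length coding
```

A rate-control loop of the kind the patent describes would widen the dead zone or step size whenever the constant-rate output buffer begins to fill.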
In Signal Processing: Image Communication 2 (1990), pp. 81-94, K. S. Thyagarajan and Harry Sanchez discussed the ENCODING OF VIDEOCONFERENCING SIGNALS USING VDPCM. The techniques of motion detection, interframe linear block prediction, and vector quantization were incorporated in an arrangement for encoding monochrome image sequences for videoconferencing applications. Data rate reduction is accomplished by identifying and processing only those regions that exhibit noticeable changes between successive frames, by estimating the magnitude of the change through linear block or vector prediction, and by quantizing the residual vectors through a vector quantizer. The motion detector uses a modified block matching algorithm to detect the moving blocks. Perceptually based edge detectors are used to design vector quantizer (VQ) codebooks for different classes of image blocks to achieve better visual quality. Encoding rates under 60 kbps are achieved with acceptable visual quality at nominal computational complexity.