The communication of video data over standard communication conduits is challenging. Video data is transmitted in frames at a typical rate of 10 frames per second. Each frame consists of three arrays of pixel data which represent the red, green and blue (RGB) color space for a color screen. Typically, the red, green and blue pixel data are converted to a luminance/chrominance color space which is expressed in Y, U and V components. The conversion from the RGB color space to the YUV color space is well known and is performed to reduce the number of data elements needed to represent a frame of data. For example, a typical video screen consists of 352×288 pixels. For the RGB color space, each pixel is represented by three data elements since one data element is used to represent each color component. If a single byte is used to represent each data element, then there are three bytes per pixel and a screen requires 304,128 bytes of data. At ten frames per second, 3,041,280 bytes are required for a one second transmission. In the YUV color space, the U and V components may each be as small as one-quarter of the Y component. Thus, a sequence of ten video frames may be expressed in YUV color space with as few as 1,520,640 bytes, for example.
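The byte counts in the preceding paragraph can be reproduced with a short calculation. The following is a minimal sketch; the function names and the `chroma_divisor` parameter are illustrative assumptions, and only the frame size, frame rate and one-quarter chroma size come from the text.

```python
# Frame dimensions and frame rate taken from the text above.
WIDTH, HEIGHT = 352, 288
FRAMES_PER_SECOND = 10


def rgb_frame_bytes(width, height):
    """One byte for each of the R, G and B components of every pixel."""
    return width * height * 3


def yuv_frame_bytes(width, height, chroma_divisor=4):
    """Full-size Y plane plus U and V planes, each 1/chroma_divisor of Y."""
    luma = width * height
    chroma = luma // chroma_divisor
    return luma + 2 * chroma


rgb_per_frame = rgb_frame_bytes(WIDTH, HEIGHT)
yuv_per_frame = yuv_frame_bytes(WIDTH, HEIGHT)

print(rgb_per_frame)                      # 304128
print(rgb_per_frame * FRAMES_PER_SECOND)  # 3041280
print(yuv_per_frame * FRAMES_PER_SECOND)  # 1520640
```

With U and V each at one-quarter of the Y plane, a YUV frame needs exactly half the bytes of the RGB frame.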
While the expression of video data in YUV color space reduces the amount of data needed to represent a video sequence, the amount of data required to transmit a sequence of frames still exceeds the bandwidth of many communication conduits. For example, telephone lines which support digital communications, such as ISDN lines, only support communications at a rate of 56K bits per second. Thus, video data being generated at ten frames a second in the YUV color space generates more data than can be transmitted over an ISDN line. As a result, video data usually requires a communication conduit having a greater bandwidth, such as a T1 line. However, T1 lines or other lines having sufficient bandwidth for video require special installation and the expense associated with using such lines is significant. Accordingly, the transmission of video data requires expense and facilities not normally available.
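To put the gap in concrete terms, the uncompressed YUV data rate can be compared with the 56K bits per second line rate quoted above. This is only an arithmetic sketch; the variable names are assumptions, and it ignores any protocol overhead.

```python
# Figures from the text: 352x288 frames, 10 frames per second, 56K bit/s line.
WIDTH, HEIGHT, FPS = 352, 288, 10
LINE_BITS_PER_SECOND = 56_000

# Y plane plus U and V planes at one-quarter size each, one byte per element.
yuv_bytes_per_frame = WIDTH * HEIGHT + 2 * (WIDTH * HEIGHT // 4)

required_bits_per_second = yuv_bytes_per_frame * 8 * FPS
compression_ratio = required_bits_per_second / LINE_BITS_PER_SECOND

print(required_bits_per_second)   # 12165120
print(round(compression_ratio))   # 217
```

Under these assumptions the video must be reduced by a factor of roughly 217 before it fits the line.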
In an effort to reduce the amount of data necessary to represent a frame of video data, methods of compressing such data have been developed. Among these methods are Discrete Cosine Transform (DCT), affine map transformation methods, sometimes known as fractal methods, and wavelet compression methods. A compressor implementing one of these methods determines coefficients for a corresponding function which may be transmitted over a communication conduit to a decompressor which uses the coefficients to regenerate a frame of video data. At the simplest level, these methods are used to compress the content of each frame in a sequence in terms of itself. That is, the data content of the frame is compressed to represent the frame. Such compressed frames are sometimes called intraframes, self-referential frames, or I-frames.
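As an illustration of the coefficient-based approach described above, the following sketch applies a one-dimensional DCT to a single row of eight pixel values, keeps only the first two coefficients, and regenerates an approximation from them. It is a simplified, assumed example: practical DCT compressors work on two-dimensional blocks and quantize the coefficients rather than simply discarding them, and the row values here are made up.

```python
import math


def dct(signal):
    """Orthonormal DCT-II of a sequence of samples."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(signal))
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs):
    """Inverse of dct(): regenerates samples from the coefficients."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        samples.append(total)
    return samples


# A smooth row of pixel values (illustrative data).
row = [10, 11, 12, 13, 14, 15, 16, 17]
coeffs = dct(row)

# "Compress" by transmitting only the first two coefficients.
kept = coeffs[:2] + [0.0] * (len(coeffs) - 2)
approx = idct(kept)
max_error = max(abs(a - b) for a, b in zip(approx, row))
print([round(v, 2) for v in approx], round(max_error, 2))
```

Because the row is smooth, two of the eight coefficients already regenerate it to within a fraction of one gray level, which is why smooth image components compress well.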
Compressors using these methods usually compress frames comprised of YUV color space components because the Y screen component includes most of the detail and edges for the image while the U and V components contain some shading information but not a great deal of image detail. As a consequence, the U and V components are relatively smooth and thus, may be compressed more effectively with fewer resources than any of the components for the RGB color space. While the YUV color space yields better compression ratios than those for RGB color space, compressors using the methods discussed above do not consistently compress video frames within a number of bits that can be transmitted over a relatively narrow bandwidth communication conduit and still retain quality image fidelity.
Another way to represent a frame of data is to express a current frame in terms of the data content of a previous frame. One such method used to represent video frames is motion compensation. A compressor using this method generates motion vectors that describe how to rearrange the data elements of a previously generated frame to generate a current frame. This is normally done by dividing the previously generated frame and current frame into blocks and determining the motion vectors that define how the blocks of the previously generated frame should be moved to best represent the current frame. This method works best for video sequence data in which the content of the scene is relatively stable, for example, a teleconference in which the speakers are seated against a background so that the substantial changes in the content of a frame are the relatively minor facial expressions and body movements of the speakers. However, sequences in which the content of the frames changes significantly from one frame to the next are more difficult to represent with this method. This method may also be used with other compression methods. For example, the pixel differences between a current and previous frame may be compressed using a DCT method. Efficiencies may be gained by expressing data values as a difference because the data values are smaller. Frames which are expressed as a group of motion vectors which relate a frame to a previous frame are called differential frames. Self-referential compressed frames usually require greater computational time and computer resources for compression and decompression than differential frames.
While the previously known differential methods may be preferred over methods which produce self-referential frames only, there are a number of reasons that compressor/decompressor systems which only generate differential blocks are not implemented for low bit rate communication. For one, compressors using these methods may not be able to find a good match between a current frame and a previous frame. In order to more quickly locate a block in a previous frame which corresponds to a block in the current frame, the compressor searches the previous frame in a limited range about the pixel address corresponding to the address of a block in the current frame. While this reduces the time required for a search, it may result in a poor match. For example, if an object being imaged moves from one portion of a frame to another which is outside the search range of the compressor for a particular block, then the compressor does not find a good match for the block. The poor correspondence of the current frame block to blocks in the previous frame may be handled by representing the current frame block in a self-referential manner. Such self-referential compressed blocks are called residual blocks. The use of residual blocks requires the decompressor to be able to distinguish between compressed differential blocks and compressed self-referential blocks since they are decompressed differently. As noted above, the decompression of the self-referential blocks is more computationally complex, requires more computer resources and takes more time. Accordingly, such compressors/decompressors usually cannot consistently support general video frame transmission over a low bandwidth conduit while maintaining quality image fidelity.
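The block search described in the two preceding paragraphs, and the way a limited search range can miss a moving object, can be sketched as follows. This is a toy example under stated assumptions: the sum of absolute differences is one common matching cost but is not named in the text, and the function names, frame contents and range values are hypothetical.

```python
def sad(prev, cur, px, py, cx, cy, size):
    """Sum of absolute differences between a block of prev and a block of cur."""
    return sum(abs(prev[py + r][px + c] - cur[cy + r][cx + c])
               for r in range(size) for c in range(size))


def find_motion_vector(prev, cur, cx, cy, size, search_range):
    """Search prev within +/- search_range pixels of (cx, cy).

    Returns (cost, dx, dy) for the best matching block found.
    """
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            px, py = cx + dx, cy + dy
            if 0 <= px <= width - size and 0 <= py <= height - size:
                cost = sad(prev, cur, px, py, cx, cy, size)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best


def make_frame(obj_x, obj_y, width=8, height=8):
    """Black frame containing one bright 2x2 object."""
    frame = [[0] * width for _ in range(height)]
    for r in range(2):
        for c in range(2):
            frame[obj_y + r][obj_x + c] = 200
    return frame


previous = make_frame(1, 3)   # object at column 1
current = make_frame(5, 3)    # object has moved 4 pixels to the right

wide = find_motion_vector(previous, current, 5, 3, 2, search_range=4)
narrow = find_motion_vector(previous, current, 5, 3, 2, search_range=2)
print(wide)       # (0, -4, 0): exact match found 4 pixels to the left
print(narrow[0])  # 800: no candidate inside the range overlaps the object
```

With the narrower range the search is faster but the best available match is poor, which is the situation that leads known compressors to fall back on a self-referential block.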
Another problem with the previously known methods is that they only approximate the frames of data being generated by a video camera and the like. Thus, the frames being regenerated by a decompressor at a transmission receiver do not exactly reproduce the frames at the compressor. This distortion is especially detrimental to schemes which express a current frame as a difference or movement of data elements in a previous frame. As a consequence, the regeneration of a current frame at the decompressor differs from the current frame at the compressor and since the regenerated frame is used by the decompressor to regenerate the next current frame, the discrepancy grows between compressor and decompressor. To reduce this discrepancy, the compressor usually includes a decompressor as well. The decompressor coupled to the compressor regenerates the approximation of the current frame in the same manner as the decompressor which receives the transmitted coefficients. The decompressed frame is used by the compressor to determine the coefficients or motion vectors which best represent the next current frame. Since both decompressors implement the same decompression method, the compressor is generating coefficients based on an accurate representation of the regenerated data available to the decompressor at the receiver site.
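The growing discrepancy, and the benefit of running a decompressor inside the compressor, can be shown with a scalar analogy in which each "frame" is a single number and the lossy step is a coarse quantizer. This is an assumed, simplified model rather than an actual video codec; the function names, sample values and step size are illustrative.

```python
def quantize(value, step):
    """Lossy step: round a difference to the nearest multiple of step."""
    return step * round(value / step)


def encode_open_loop(samples, step):
    """Differences are taken against the ORIGINAL previous sample."""
    prev, diffs = 0, []
    for s in samples:
        diffs.append(quantize(s - prev, step))
        prev = s              # the decoder never sees this exact value
    return diffs


def encode_closed_loop(samples, step):
    """Differences are taken against the value the decoder will regenerate."""
    recon, diffs = 0, []
    for s in samples:
        d = quantize(s - recon, step)
        diffs.append(d)
        recon += d            # mirrors the decoder's own update
    return diffs


def decode(diffs):
    recon, out = 0, []
    for d in diffs:
        recon += d
        out.append(recon)
    return out


samples = [3, 6, 9, 12, 15, 18]
open_errors = [abs(a - b) for a, b in
               zip(decode(encode_open_loop(samples, 5)), samples)]
closed_errors = [abs(a - b) for a, b in
                 zip(decode(encode_closed_loop(samples, 5)), samples)]
print(open_errors)    # [2, 4, 6, 8, 10, 12]
print(closed_errors)  # [2, 1, 1, 2, 0, 2]
```

In the open-loop case the error accumulates with every frame, while in the closed-loop case it stays bounded because the compressor always works from the same regenerated data as the receiver.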
Compressors/decompressors which compress/decompress differential frames require initialization. Initialization is a problem in such compressors because there is no known way to represent the first frame in a differential manner since there is no previous frame in this case. Accordingly, an intraframe is required to begin the transmission process. The self-referential intraframe is used to initialize the decompressor at a receiver and the decompressor at the compressor site, if one is used there. In order to decompress the intraframe, however, the compressor/decompressor must be able to generate and decode compressed self-referential frames and compressed differential frames. Compressors/decompressors supporting both types of frames are more computationally complex than if they simply decompress one type of data representation. However, because the transmission of an intraframe is required to initialize the video frame sequence transmission, a decompressor which supports differential frame representations alone is not feasible.
Intraframes are also used to rectify transmission errors caused by noise or the like. Such errors may corrupt a differential frame during transmission. The decompressor then regenerates the next frame from the corrupted frame, producing an erroneous frame which differs from the frame being compressed by the compressor at the transmitting site. As motion vectors and coefficients based on the previous frame at the compressor are applied to the erroneous frame regenerated at the decompressor, the error is compounded. Soon, the frames regenerated by the decompressor may bear no resemblance to the frames being represented by the compressor. In order to ensure that the decompressors at the transmitter and receiver sites are regenerating the same frames, an intraframe is periodically transmitted. The decompression of the self-referential intraframe at the transmitter and receiver sites provides the sites with the same frame for representing and decompressing frames unless the intraframe was corrupted during transmission. Intraframes are usually transmitted at a frequency to ensure that an uncorrupted intraframe is received by the receiver often enough to support transmission integrity.
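The effect of a corrupted differential frame, and its correction by the next intraframe, can be illustrated with the same kind of scalar analogy, where each "frame" is one number. The packet layout (`"I"` for intraframe, `"D"` for differential), the intraframe period and the corrupted value are all hypothetical choices made for this sketch.

```python
def build_stream(samples, intra_period):
    """Emit an intraframe every intra_period samples, differences otherwise."""
    packets, prev = [], 0
    for i, s in enumerate(samples):
        if i % intra_period == 0:
            packets.append(("I", s))          # self-contained value
        else:
            packets.append(("D", s - prev))   # difference from previous
        prev = s
    return packets


def decode_stream(packets):
    recon, out = 0, []
    for kind, value in packets:
        if kind == "I":
            recon = value     # intraframe resets the decoder state
        else:
            recon += value    # differential frame builds on prior state
        out.append(recon)
    return out


samples = [10, 12, 14, 16, 18, 20, 22, 24]
clean = build_stream(samples, intra_period=4)

# Hypothetical transmission error: the second packet's difference 2 becomes 9.
corrupted = list(clean)
corrupted[1] = ("D", 9)

errors = [abs(a - b) for a, b in zip(decode_stream(corrupted), samples)]
print(errors)  # [0, 7, 7, 7, 0, 0, 0, 0]
```

The error introduced in the second frame persists through every following differential frame and disappears only when the next intraframe arrives.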
Even in relatively noise-free environments, the previously known methods of compressing video frames require the periodic transmission of intraframes to correct artifacts introduced in the regenerated frames. Many of the methods use rectangular or square blocks to compress frame data and the boundaries of such blocks may generate artifacts, especially in the DCT method. Artifacts produced by the block boundaries are used by the compressor to erroneously represent data and they propagate through subsequently regenerated frames at the receiver decompressor if self-referential intraframes are not periodically transmitted. Thus, there are a number of reasons that current compressor/decompressor systems are dependent on the frequent transmission of intraframes to provide error tolerance in the transmission of video frames. As noted above, the use of intraframes impacts the efficiency and processing times for such systems to the extent that they may not be able to consistently support low bit rate video transmissions.
Other known compressor designs include two or more of the above discussed methods. For example, a compressor may use the motion compensation and DCT methods together. This may be done by comparing a current frame to a previous frame to generate a set of motion vectors that represent the current frame in terms of the previous frame. Because the motion vectors do not adjust the pixel values but merely describe movement of blocks in the previous frame which represent the current frame, a second comparison is performed. In this comparison, the pixel values of the previous frame are subtracted from the pixel values of the current frame and the differential pixel values are compressed using a DCT or other compression method. The compressed differential frame is sometimes called a residual frame. While this compressor design improves the motion compensation only method, it requires that the current and previous frames be processed twice to generate the frame representation. Additionally, there are two components of the representation, each of which must be processed differently to regenerate the frame. Thus, such compressors are computationally expensive and the generation of two components may require a number of bits which may impact the ability of the compressor to limit the number of bits required for a low bit rate transmission.
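The two-component representation described above, a motion vector plus a residual of pixel differences, can be sketched in one dimension. This is an assumed simplification: a real compressor would work on two-dimensional blocks and would compress the residual (for example with a DCT) instead of sending it as is, and the row data and function names are made up.

```python
def predict(prev, shift):
    """Motion-compensated prediction: prev moved right by shift, edges clamped."""
    n = len(prev)
    return [prev[min(max(i - shift, 0), n - 1)] for i in range(n)]


def best_shift(prev, cur, search_range):
    """First pass: pick the shift whose prediction is closest to cur."""
    def cost(shift):
        return sum(abs(c - p) for c, p in zip(cur, predict(prev, shift)))
    return min(range(-search_range, search_range + 1), key=cost)


prev_row = [10, 10, 50, 90, 50, 10, 10, 10]
cur_row = [10, 10, 10, 52, 91, 49, 10, 10]   # moved right by 1, values changed

# First pass over the frames: motion vector.
shift = best_shift(prev_row, cur_row, 2)
prediction = predict(prev_row, shift)

# Second pass over the frames: differential pixel values.
residual = [c - p for c, p in zip(cur_row, prediction)]

# Decompressor side: both components are needed to regenerate the row.
rebuilt = [p + r for p, r in zip(prediction, residual)]

print(shift)     # 1
print(residual)  # [0, 0, 0, 2, 1, -1, 0, 0]
```

The residual values are small, so they are cheap to code, but the frame data is traversed twice and the decompressor must handle two kinds of component.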
Another limitation of previously used methods is the manner in which frames are segmented to determine coefficients or motion vectors. Usually, the frame data is divided into rectangular or square blocks. This shape may not conform to an object in an image or to its movement from one frame to another. Yet the resources to support different shapes and sizes for compression have heretofore been too great for compression/decompression schemes intended for low bandwidth communication conduits. In fact, many compressors/decompressors cannot process frame sizes which are not an integral multiple of the fixed block sizes used by the compressor/decompressor or those for which the compressor/decompressor may be configured. Even those compressors/decompressors which attempt to process frames which cannot be divided into an integral number of blocks supported by the compressor/decompressor use methods which may introduce artifacts into regenerated frames. For example, some methods pad the incomplete block in the current frame with zero values. In the differential methods, the use of these non-existing values requires special processing or they may generate erroneous values.
Special case processing includes the grouping of the pixels into a residual block for self-referential compression or a shift in the processing of the frame to include the pixels on a second pass. Thus, special processing requires additional computer resources and processing time while erroneous values corrupt the integrity of the frames being compressed and regenerated. In either event, known compressors/decompressors do not adequately support frame sizes and shapes which do not conform to an integral multiple of the block size used by the compressor/decompressor. For similar reasons, previously known compression/decompression methods do not support segmented video. Segmented video is the separation of data which can be integrated into a single frame. For example, a camera generating frames containing the antics of actors and another stream of frames containing background scenes may be integrated at a receiving site. However, there needs to be a way to keep the two frame streams separate since pixel data about the actors would corrupt the data integrity of the background scene for decompression of the next current frame. Current methods do not support such selective processing within a frame, especially if the separation of the objects in the frame is irregular and does not conform to the block shapes and sizes supported by the compressor/decompressor.
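The zero-padding approach mentioned earlier, for frames whose dimensions are not an integral multiple of the block size, can be sketched as follows. The function names, the 6×5 frame and the block size of 4 are illustrative assumptions chosen to keep the example small.

```python
def padded_size(length, block):
    """Round length up to the next integral multiple of block."""
    return -(-length // block) * block


def pad_frame(frame, block, fill=0):
    """Extend a frame with fill values so that it divides evenly into blocks."""
    height, width = len(frame), len(frame[0])
    new_height = padded_size(height, block)
    new_width = padded_size(width, block)
    padded = [row + [fill] * (new_width - width) for row in frame]
    padded += [[fill] * new_width for _ in range(new_height - height)]
    return padded


# A 6-wide, 5-high frame of mid-gray pixels with 4x4 blocks.
frame = [[128] * 6 for _ in range(5)]
padded = pad_frame(frame, 4)

real_pixels = len(frame) * len(frame[0])
fabricated = len(padded) * len(padded[0]) - real_pixels
print(len(padded[0]), len(padded), fabricated)  # 8 8 34
```

In this small case more than half of the padded frame consists of values that never existed in the image, and each boundary block now contains an artificial edge between 128 and 0 that a differential method has to either special-case or encode.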
Previously known frame compressor methods result in different amounts of data to represent different frames. These differences mean that some frame representations require fewer bits for transmission than others. The number of bits for transmitting a representation is usually compared to a maximum in an effort to keep the number of bits transmitted over a standard unit of time constant. However, this method fails to take advantage of the greater accuracy which could be obtained for frames which require fewer bits to represent. Instead, the accuracy of any frame representation tends to be approximately the same since the method for determining the coefficients or vectors which represent a frame does not change regardless of the number of bits generated by the compressor. While some methods for generating frame representations do vary the number of bits used for a representation, these methods do not consistently support low bit rate communication with quality image fidelity.
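The idea of trading unused bits for accuracy can be illustrated with a simple search over quantizer step sizes against a bit budget. This is a sketch of the general idea only, not a method described in this document; the cost model, the list of step sizes, the coefficient values and the function names are all assumptions.

```python
def quantize(coeffs, step):
    """Coarser steps give smaller values (truncated toward zero)."""
    return [int(c / step) for c in coeffs]


def estimated_bits(quantized):
    """Hypothetical cost model: one bit plus the magnitude bits per value."""
    return sum(1 + abs(q).bit_length() for q in quantized)


def pick_step(coeffs, bit_budget, steps=(1, 2, 4, 8, 16, 32)):
    """Return the finest step whose representation fits the bit budget."""
    for step in steps:
        quantized = quantize(coeffs, step)
        if estimated_bits(quantized) <= bit_budget:
            return step, quantized
    return steps[-1], quantize(coeffs, steps[-1])


coeffs = [120, -45, 18, 7, -3, 1, 0, 0]

tight_step, tight_q = pick_step(coeffs, bit_budget=24)
loose_step, loose_q = pick_step(coeffs, bit_budget=40)
print(tight_step, estimated_bits(tight_q))  # 4 21
print(loose_step, estimated_bits(loose_q))  # 1 32
```

When the budget allows it, the finer step is selected and the coefficients are represented more accurately, rather than every frame being coded at the same accuracy regardless of how many bits it needs.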
What is needed is a method for representing frame data using a differential scheme without requiring any transmission of a self-referential frame.
What is needed is a method for representing frame data which reduces blocking artifacts in regenerated frames.
What is needed is a method for representing frames having sizes or shapes which are not integral multiples of the block shapes and sizes supported by the compressor/decompressor.
What is needed is a method for supporting segmented video.
What is needed is a method for representing frame data which uses different sized and shaped blocks to determine the coefficients and vectors for representing the frames.
What is needed is a method which does not require the use of residual blocks to process poor correspondence current frame blocks.
What is needed is a compressor/decompressor which need not support the compression/decompression of both self-referential and differential blocks.
What is needed is a method for representing frame data which adjusts the determination of the coefficients and vectors in response to a measurement of the number of bits required to represent the determined coefficients and vectors.