The present invention relates to video compression, and in particular to video compression with multiple unit processing.
Digital video is the format commonly used with personal computers, digital-video cameras, and other electronic systems. Since a huge amount of memory or storage space is required to fully store all 30 or more frames per second of video, the images are usually compressed. Often sequential images in the video sequence differ only slightly. Using a compression technique such as MPEG encoding, the difference from a previous (or following) image in the sequence can be detected and encoded, rather than the entire picture.
MPEG is a video signal compression standard, established by the Moving Picture Experts Group (“MPEG”) of the International Organization for Standardization. MPEG is a multistage algorithm that integrates a number of well-known data compression techniques into a single system. These include motion-compensated predictive coding, discrete cosine transform (“DCT”), adaptive quantization, and variable length coding (“VLC”). The main objective of MPEG is to remove redundancy that normally exists in the spatial domain (within a frame of video) as well as in the temporal domain (frame-to-frame), while allowing inter-frame compression and interleaved audio.
There are two basic forms of video signals: an interlaced scan signal and a non-interlaced scan signal. Interlaced scanning is a technique employed in television systems in which every television frame consists of two fields, referred to as an odd-field and an even-field. Each field scans the entire picture from side to side and top to bottom. However, the horizontal scan lines of one (e.g., odd) field are positioned half way between the horizontal scan lines of the other (e.g., even) field. Interlaced scan signals are typically used in broadcast television (“TV”) and high definition television (“HDTV”). Non-interlaced scan signals are typically used in computer systems and, when compressed, have data rates up to 1.8 Mb/sec for combined video and audio. The Moving Picture Experts Group has established an MPEG-1 protocol intended for use in compressing/decompressing non-interlaced video signals, and an MPEG-2 protocol intended for use in compressing/decompressing interlaced TV and HDTV signals.
Before a conventional video signal may be compressed in accordance with either MPEG protocol it must first be digitized. The digitization process produces digital video data which specifies the intensity and color of the video image at specific locations in the video image that are referred to as pixels. Each pixel is associated with a coordinate positioned among an array of coordinates arranged in vertical columns and horizontal rows. Each pixel's coordinate is defined by an intersection of a vertical column with a horizontal row. In converting each frame of video into a frame of digital video data, scan lines of the two interlaced fields making up a frame of un-digitized video are interdigitated in a single matrix of digital data. Interdigitization of the digital video data causes pixels of a scan line from an odd-field to have odd row coordinates in the frame of digital video data. Similarly, interdigitization of the digital video data causes pixels of a scan line from an even-field to have even row coordinates in the frame of digital video data.
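The interdigitation of the two fields described above can be illustrated with a minimal Python sketch. It assumes rows are numbered starting from 1 (so that odd-field lines fall on odd rows) and that both fields carry the same number of scan lines. The function name weave_fields and the sample pixel values are hypothetical and chosen only for illustration.

```python
def weave_fields(odd_field, even_field):
    """Interleave two fields into a single frame matrix.

    Assumption: rows are numbered from 1, so odd-field lines land on
    rows 1, 3, 5, ... and even-field lines on rows 2, 4, 6, ...
    """
    if len(odd_field) != len(even_field):
        raise ValueError("fields must have the same number of scan lines")
    frame = []
    for odd_line, even_line in zip(odd_field, even_field):
        frame.append(list(odd_line))   # odd row of the frame
        frame.append(list(even_line))  # even row of the frame
    return frame


# Two tiny fields of two scan lines each (hypothetical pixel values).
odd_field = [[10, 11], [30, 31]]
even_field = [[20, 21], [40, 41]]
frame = weave_fields(odd_field, even_field)
```

With 0-based list indexing, frame[0::2] recovers the odd-field lines and frame[1::2] recovers the even-field lines.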
MPEG-1 and MPEG-2 each divide a video input signal, generally a successive occurrence of frames, into sequences or groups of frames (“GOF”), also referred to as a group of pictures (“GOP”). The frames in respective GOFs are encoded into a specific format. Respective frames of encoded data are divided into slices representing, for example, sixteen image lines. Each slice is divided into macroblocks each of which represents, for example, a 16×16 matrix of pixels. Each macroblock is divided into six blocks including four blocks relating to luminance data and two blocks relating to chrominance data. The MPEG-2 protocol encodes luminance and chrominance data separately and then combines the encoded video data into a compressed video stream. The luminance blocks relate to respective 8×8 matrices of pixels. Each chrominance block includes an 8×8 matrix of data relating to the entire 16×16 matrix of pixels, represented by the macroblock. After the video data is encoded it is then compressed, buffered, modulated and finally transmitted to a decoder in accordance with the MPEG protocol. The MPEG protocol typically includes a plurality of layers each with respective header information. Nominally each header includes a start code, data related to the respective layer and provisions for adding header information.
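The division of a macroblock into six blocks can be sketched as follows. This is an illustrative Python sketch only: the function names are hypothetical, and the 2×2 averaging used to reduce each 16×16 chrominance plane to an 8×8 block is an assumption, since the text above does not specify how the chrominance data is derived.

```python
def split_luma(macroblock):
    """Split a 16x16 luminance matrix into four 8x8 blocks, ordered
    top-left, top-right, bottom-left, bottom-right."""
    blocks = []
    for top in (0, 8):
        for left in (0, 8):
            blocks.append([row[left:left + 8] for row in macroblock[top:top + 8]])
    return blocks


def subsample_chroma(plane):
    """Reduce a 16x16 chrominance plane to one 8x8 block covering the
    whole macroblock. Assumption: each output value is the integer
    average of a 2x2 group of input values."""
    return [
        [
            (plane[2 * r][2 * c] + plane[2 * r][2 * c + 1]
             + plane[2 * r + 1][2 * c] + plane[2 * r + 1][2 * c + 1]) // 4
            for c in range(8)
        ]
        for r in range(8)
    ]


def macroblock_blocks(y, cb, cr):
    """Return the six 8x8 blocks: four luminance, then two chrominance."""
    return split_luma(y) + [subsample_chroma(cb), subsample_chroma(cr)]


# Hypothetical sample data: a luminance ramp and flat chrominance planes.
y = [[r * 16 + c for c in range(16)] for r in range(16)]
cb = [[128] * 16 for _ in range(16)]
cr = [[64] * 16 for _ in range(16)]
blocks = macroblock_blocks(y, cb, cr)
```

Here blocks holds six 8×8 matrices; the first four cover the four quadrants of the luminance data, and the last two each summarize the full 16×16 area.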
There are generally three different encoding formats which may be applied to video data. Intra-frame coding produces an “I” block, designating a block of data where the encoding relies solely on information within the video frame in which the block is located. Inter-frame coding may produce either a “P” block or a “B” block. A “P” block designates a block of data where the encoding relies on a prediction based upon blocks of information found in a prior video frame. A “B” block is a block of data where the encoding relies on a prediction based upon blocks of data from surrounding video frames, i.e., a prior I or P frame and/or a subsequent I or P frame of video data.
One means used to eliminate frame-to-frame redundancy is to estimate the displacement of moving objects in the video images, and encode motion vectors representing such motion from frame to frame. The accuracy of such motion estimation affects the coding performance and the quality of the output video. Motion estimation performed on a pixel-by-pixel basis has the potential for providing the highest quality video output, but comes at a high cost in terms of computational resources. Motion estimation can be performed on a block-by-block basis to provide satisfactory video quality with a significantly reduced requirement for computational performance.
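Block-by-block motion estimation can be illustrated with a minimal full-search block-matching sketch in Python. The matching criterion (sum of absolute differences), the block size, the search range, and all function names are assumptions chosen for illustration; the text above does not prescribe any particular search method. The returned vector is the (row, column) offset from the block's position in the current frame to its best match in the reference frame.

```python
def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between a block of the current frame
    at (cy, cx) and a block of the reference frame at (ry, rx)."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def estimate_motion(cur, ref, top, left, size=4, search=2):
    """Exhaustively test every offset within +/- search pixels and
    return the offset (dy, dx) with the lowest SAD, plus that SAD."""
    height, width = len(ref), len(ref[0])
    best_vector = (0, 0)
    best_cost = sad(cur, ref, top, left, top, left, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue
            cost = sad(cur, ref, top, left, ry, rx, size)
            if cost < best_cost:
                best_vector, best_cost = (dy, dx), cost
    return best_vector, best_cost


# Hypothetical 8x8 frames: the current frame is the reference frame
# with its content moved one pixel to the right.
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[ref[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]
vector, cost = estimate_motion(cur, ref, top=2, left=2)
```

In this example the block at row 2, column 2 of the current frame matches the reference frame exactly one column to the left, so the search yields the vector (0, -1) with a cost of 0. A pixel-by-pixel estimate would repeat a comparable search for every pixel rather than once per block, which is the computational cost the paragraph above refers to.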
These techniques are used for reducing the data required to store video signals, or for transmitting video signals over communication links having a smaller bandwidth than is required to transmit uncompressed video. Examples of such communication links include local area networks, wide area networks, and circuit-switched telephone networks, such as integrated services digital network (ISDN) lines or standard telephone lines.
Video signal processing and video signal compression are variously described in Video Demystified: A Handbook for the Digital Engineer, Second Ed., by K. Jack, High Text Interactive, Inc., San Diego, Calif., U.S.A., 1996; Image and Video Compression Standards: Algorithms and Architectures, Second Edition, by V. Bhaskaran et al., Kluwer Academic Publishers, Norwell, Mass., U.S.A., 1997; Algorithms, Complexity Analysis and VLSI Architectures for MPEG-4 Motion Estimation, by P. Kuhn, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1999; as well as in U.S. Pat. Nos. 6,421,466 B1; 6,363,117; 6,014,181; 5,731,850; and 5,510,857; and U.S. patent application Publication Nos. 2002/0176502 A1; and 2002/0131502 A1, all of which are incorporated in this description by reference.