The present invention relates to a video decompression processor, and more particularly to a video syntax parser for parsing out the fields of interest from a compressed video bitstream to various subsystems of the decompression processor.
Digital transmission of television signals can deliver video and audio services of much higher quality than analog techniques. Digital transmission schemes are particularly advantageous for signals that are broadcast via a cable television network or by satellite to cable television affiliates and/or directly to home satellite television receivers. It is expected that digital television transmitter and receiver systems will replace existing analog systems just as digital compact discs have replaced analog phonograph records in the audio industry.
A substantial amount of digital data must be transmitted in any digital television system. In a digital television system, a subscriber receives the digital data stream via a receiver/descrambler that provides video, audio and data to the subscriber. In order to most efficiently use the available radio frequency spectrum, it is advantageous to compress the digital television signals to minimize the amount of data that must be transmitted.
The video portion of a television signal comprises a sequence of video "frames" that together provide a moving picture. In digital television systems, each line of a video frame is defined by a sequence of digital data bits referred to as "pixels." A large amount of data is required to define each video frame of a television signal. For example, 7.4 megabits of data is required to provide one video frame at NTSC (National Television System Committee) resolution. This assumes a 640 pixel by 480 line display is used with eight bits of intensity value for each of the primary colors red, green and blue. At PAL (phase alternating line) resolution, 9.7 megabits of data is required to provide one video frame. In this instance, a 704 pixel by 576 line display is used with eight bits of intensity value for each of the primary colors red, green and blue. In order to manage this amount of information, the data must be compressed.
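The frame sizes quoted above follow directly from the stated display dimensions and color depth. The short Python sketch below reproduces the arithmetic; the function name `raw_frame_bits` is chosen here for illustration and does not appear in the source.

```python
def raw_frame_bits(width, height, bits_per_component=8, components=3):
    """Return the number of bits in one uncompressed frame.

    Assumes every pixel carries `components` color values (red, green,
    blue) of `bits_per_component` bits each, as stated in the text.
    """
    return width * height * bits_per_component * components


ntsc_bits = raw_frame_bits(640, 480)   # 640 pixels by 480 lines
pal_bits = raw_frame_bits(704, 576)    # 704 pixels by 576 lines

print(f"NTSC frame: {ntsc_bits / 1e6:.1f} megabits")
print(f"PAL frame:  {pal_bits / 1e6:.1f} megabits")
```

Rounded to one decimal place, the results match the 7.4 and 9.7 megabit figures given above.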
Video compression techniques enable the efficient transmission of digital video signals over conventional communication channels. Such techniques use compression algorithms that take advantage of the correlation among adjacent pixels in order to derive a more efficient representation of the important information in a video signal. The most powerful compression systems not only take advantage of spatial correlation, but can also utilize similarities among adjacent frames to further compact the data. In such systems, differential encoding is usually used to transmit only the difference between an actual frame and a prediction of the actual frame. The prediction is based on information derived from a previous frame of the same video sequence.
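As a minimal illustration of differential encoding, the Python sketch below treats the previous line of pixels as the prediction and transmits only the difference. The sample values and function names are assumptions made for this example; practical systems use more elaborate predictions.

```python
def encode_difference(actual, prediction):
    """Return the per-pixel difference between actual and predicted values."""
    return [a - p for a, p in zip(actual, prediction)]


def reconstruct(prediction, difference):
    """Rebuild the actual values from the prediction and the difference."""
    return [p + d for p, d in zip(prediction, difference)]


# Hypothetical pixel values: only one pixel changes between frames.
previous_line = [100, 102, 104, 106, 108]
current_line = [100, 102, 105, 106, 108]

difference = encode_difference(current_line, previous_line)
restored = reconstruct(previous_line, difference)
```

Because most of the difference values are zero, they can be represented far more compactly than the original pixel values.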
Examples of video compression systems using motion compensation can be found in Krause, et al. U.S. Pat. Nos. 5,057,916; 5,068,724; 5,091,782; 5,093,720; and 5,235,419. Generally, such motion compensation systems take advantage of a block-matching motion estimation algorithm. In this case, a motion vector is determined for each block in a current frame of an image by identifying a block in a previous frame which most closely resembles the particular current block. The entire current frame can then be reconstructed at a decoder by sending the difference between the corresponding block pairs, together with the motion vectors that are required to identify the corresponding pairs. Often, the amount of transmitted data is further reduced by compressing both the displaced block differences and the motion vector signals. Block-matching motion estimation algorithms are particularly effective when combined with block-based spatial compression techniques such as the discrete cosine transform (DCT). Additional compression can be achieved using variable length coding to assign shorter "code words" to events that are more likely to occur and longer code words to less likely events. At the receiver, the variable length code words are decoded by, e.g., a Huffman decoder. An example of a Huffman decoder implementation can be found in U.S. Pat. No. 5,233,348 to Pollmann, et al., incorporated herein by reference.
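The block-matching step described above can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD) between a block in the current frame and candidate blocks in the previous frame. This Python sketch is illustrative only: the SAD criterion, the search range, the tiny 8 by 8 frames, and all function names are assumptions, not details taken from the cited patents.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block in cur and one in ref."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def best_motion_vector(cur, ref, cx, cy, size, search):
    """Search ref for the block that best matches cur's block at (cx, cy).

    Returns ((mx, my), cost), where (mx, my) is the displacement from the
    current block position to the best-matching block in ref.
    """
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(cur, ref, cx, cy, cx, cy, size)
    for my in range(-search, search + 1):
        for mx in range(-search, search + 1):
            rx, ry = cx + mx, cy + my
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if cost < best_cost:
                best, best_cost = (mx, my), cost
    return best, best_cost


def block_difference(cur, ref, cx, cy, mv, size):
    """Displaced block difference that would be sent with the motion vector."""
    mx, my = mv
    return [[cur[cy + dy][cx + dx] - ref[cy + my + dy][cx + mx + dx]
             for dx in range(size)] for dy in range(size)]


def make_frame(width, height, px, py):
    """Build a black frame containing a 2x2 patch at column px, row py."""
    frame = [[0] * width for _ in range(height)]
    patch = [[10, 20], [30, 40]]
    for dy in range(2):
        for dx in range(2):
            frame[py + dy][px + dx] = patch[dy][dx]
    return frame


previous = make_frame(8, 8, 2, 2)
current = make_frame(8, 8, 3, 3)   # the patch moved one pixel right and down

mv, cost = best_motion_vector(current, previous, cx=3, cy=3, size=2, search=2)
diff = block_difference(current, previous, 3, 3, mv, 2)
```

In this toy case the search finds an exact match, so the displaced block difference is all zeros and only the motion vector carries information.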
Each of a succession of digital video frames that form a video program can be categorized as an intra frame (I-frame), predicted frame (P-frame), or bidirectional frame (B-frame). The prediction is based upon the temporal correlation between successive frames: over short periods of time, portions of successive frames often do not differ from one another. The encoding and decoding methods differ for each type of picture. The simplest methods are those used for I-frames, followed by those for P-frames and then B-frames.
I-frames completely describe a single frame without reference to any other frame. For improved error concealment, motion vectors can be included with an I-frame. An error in an I-frame has the potential for greater impact on the displayed video since both P-frames and B-frames are predicted from an I-frame.
P-frames are predicted based on previous I or P frames. The reference is from an earlier I or P frame to a future P-frame and is therefore called "forward prediction." B-frames are predicted from the closest earlier I or P frame and the closest later I or P frame. The reference to a future picture (i.e., one that has not yet been displayed) is called "backward prediction." There are cases where backward prediction is very useful in increasing the compression rate. For example, in a scene in which a door opens, the current picture may predict what is behind the door based upon a future picture in which the door is already open.
B-frames yield the most compression but also incorporate the most error. To eliminate error propagation, B-frames may never be predicted from other B-frames. P-frames yield less error and less compression. I-frames yield the least compression, but are able to provide random access entry points into a video sequence.
One standard that has been adopted for encoding digital video signals is the Moving Picture Experts Group (MPEG) standard, and more particularly the MPEG-2 standard. This standard does not specify any particular distribution that I-frames, P-frames and B-frames must take within a sequence. Instead, the standard allows different distributions to provide different degrees of compression and random accessibility. One common distribution is to have I-frames about every half second and two B-frames between successive I or P frames. To decode P-frames, the previous I or P frame must be available. Similarly, to decode B-frames, the previous and future P or I frames must be available. Consequently, the video frames are encoded in dependency order, such that all pictures used for prediction are coded before the pictures predicted therefrom. Further details of the MPEG-2 standard (and the alternative DigiCipher® II standard) and its implementation in a video decompression processor can be found in document MC68VDP/D, a preliminary data sheet entitled "MPEG-2/DCII Video Decompression Processor," © Motorola Microprocessor and Memory Technologies Group, 1994, incorporated herein by reference.
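The dependency ordering described above can be illustrated with a short Python sketch that converts a display-order sequence into a coding order in which each B-frame follows both of its reference frames. The frame labels and the function name `coding_order` are assumptions made for this example and are not part of any standard syntax.

```python
def coding_order(display_order):
    """Reorder frames so every reference frame precedes its dependents.

    Frames are labeled by type and display index, e.g. "I0" or "B1".
    B-frames are held back until the next I- or P-frame (their future
    reference) has been emitted.
    """
    coded = []
    pending_b = []
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)
        else:
            coded.append(frame)        # reference frame goes out first
            coded.extend(pending_b)    # then the B-frames that depend on it
            pending_b = []
    # Trailing B-frames have no later reference in this toy sequence;
    # they are appended only so that no frame is dropped.
    coded.extend(pending_b)
    return coded


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
coded = coding_order(display)
print(coded)
```

For the display sequence shown, with two B-frames between successive reference frames, the resulting coding order is I0, P3, B1, B2, P6, B4, B5.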
In order to implement video compression in practical systems, a video decompression processor is required for each digital television receiver. The development of very large scale integration (VLSI) integrated circuit chips is currently underway to implement such video decompression processors. In consumer products such as television sets, it is imperative that the cost of the system components be kept as low as possible.
One important subsystem of a video decompression processor is known as the video syntax parser. This subsystem is responsible for parsing out the fields of interest from the compressed video bitstream, which may be, for example, in the DigiCipher II or MPEG-2 syntax. Typically, the parser will receive its data from an external random access memory. Different subsystems of the video decompression processor will require different fields of data from the incoming bitstream. For example, a motion vector decoder will require motion vectors carried in the bitstream. A variable length code word decoder such as a Huffman decoder within the decompression processor will require the code words for decoding into transform coefficients. Other subsystems of the video decompression processor will also require various information carried in the bitstream. All of this information must be parsed from the bitstream and forwarded on to the appropriate subsystem for further processing. The parser is a speed-critical subsystem. It must obtain and parse the required data from the incoming bitstream on an efficient and orderly basis. Complexity of the parser must be reduced to the extent possible in order to keep its cost at a minimum.
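The role of a syntax parser can be sketched in Python as a bit reader that extracts fields from a byte buffer and routes each field to the subsystem that needs it. The field layout below (a 2-bit picture type, two 4-bit motion vector components and a 6-bit code word) is a hypothetical toy syntax invented for this illustration; it is not the MPEG-2 or DigiCipher II syntax, and the subsystem names are likewise assumptions.

```python
class BitReader:
    """Reads big-endian bit fields from a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, nbits):
        """Return the next nbits bits as an unsigned integer."""
        value = 0
        for _ in range(nbits):
            if self.pos >= len(self.data) * 8:
                raise EOFError("bitstream exhausted")
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


# Hypothetical toy syntax: (field name, width in bits, destination subsystem).
TOY_SYNTAX = [
    ("picture_type", 2, "controller"),
    ("mv_x", 4, "motion_vector_decoder"),
    ("mv_y", 4, "motion_vector_decoder"),
    ("code_word", 6, "vlc_decoder"),
]


def parse_and_route(data, syntax=TOY_SYNTAX):
    """Parse each field in order and group the values by destination."""
    reader = BitReader(data)
    routed = {}
    for name, width, destination in syntax:
        routed.setdefault(destination, {})[name] = reader.read(width)
    return routed


# Bits 01 0011 1100 101010 packed into two bytes.
routed = parse_and_route(bytes([0x4F, 0x2A]))
print(routed)
```

A hardware parser would perform the same kind of field extraction and dispatch in dedicated logic rather than software; the table-driven form here is only meant to show the separation between the bitstream layout and the subsystems that consume each field.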
The present invention provides a video syntax parser that meets the aforementioned criteria.