The present invention relates to the compression of digital video signals, and more particularly to a method and apparatus for processing digitized video signals for transmission in a compressed form.
Television signals are conventionally transmitted in analog form according to various standards adopted by particular countries. For example, the United States has adopted the standards of the National Television System Committee ("NTSC"). Most European countries have adopted either PAL (Phase Alternating Line) or SECAM (Sequential Color with Memory) standards.
Digital transmission of television signals can deliver video and audio services of much higher quality than analog techniques. Digital transmission schemes are particularly advantageous for signals that are broadcast by satellite to cable television affiliates and/or directly to home satellite television receivers. It is expected that digital television transmitter and receiver systems will replace existing analog systems just as digital compact discs have largely replaced analog phonograph records in the audio industry.
A substantial amount of digital data must be transmitted in any digital television system. This is particularly true where high definition television ("HDTV") is provided. In a digital television system, a subscriber receives the digital data stream via a receiver/descrambler that provides video, audio, and data to the subscriber. In order to most efficiently use the available radio frequency spectrum, it is advantageous to compress the digital television signals to minimize the amount of data that must be transmitted.
The video portion of a television signal comprises a sequence of video "frames" that together provide a moving picture. In digital television systems, each line of a video frame is defined by a sequence of digital data referred to as "pixels." A large amount of data is required to define each video frame of a television signal. For example, 7.4 megabits of data is required to provide one video frame at NTSC resolution. This assumes a 640 pixel by 480 line display is used with 8 bits of intensity value for each of the primary colors red, green and blue. High definition television requires substantially more data to provide each video frame. In order to manage this amount of data, particularly for HDTV applications, the data must be compressed.
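The 7.4 megabit figure follows directly from the display parameters stated above. A minimal sketch of the arithmetic, in which the function and constant names are illustrative only:

```python
# Raw size of one uncompressed frame, using the figures quoted in the text.
WIDTH_PIXELS = 640      # pixels per line
HEIGHT_LINES = 480      # lines per frame
BITS_PER_COLOR = 8      # intensity bits for each primary color
PRIMARY_COLORS = 3      # red, green and blue


def frame_bits(width, height, bits_per_color, colors):
    """Number of bits needed to store one frame without compression."""
    return width * height * bits_per_color * colors


bits = frame_bits(WIDTH_PIXELS, HEIGHT_LINES, BITS_PER_COLOR, PRIMARY_COLORS)
print(bits)         # 7372800
print(bits / 1e6)   # 7.3728, i.e. about 7.4 megabits
```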
Video compression techniques enable the efficient transmission of digital video signals over conventional communication channels. Such techniques use compression algorithms that take advantage of the correlation among adjacent pixels in order to derive a more efficient representation of the important information in a video signal.
One of the most effective and frequently used classes of algorithms for video compression is referred to as "transform coders." In such systems, blocks of video are linearly and successively transformed into a new domain with properties significantly different from the image intensity domain. The blocks may be nonoverlapping, as in the case of the discrete cosine transform (DCT), or overlapping as in the case of the lapped orthogonal transform (LOT). A system using the DCT is described in Chen and Pratt, "Scene Adaptive Coder," IEEE Transactions on Communications, Vol. COM-32, No. 3, March, 1984. A system using the LOT is described in Malvar and Staelin, "The LOT: Transform Coding Without Blocking Effects," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 37, No. 3, April, 1989.
Video transforms are used to reduce the correlation that exists among samples of image intensity (pixels). Thus, these transforms concentrate the energy into a relatively small number of transform coefficients. Most common transforms have properties that easily permit the quantization of coefficients based on a model of the human visual system. For example, the DCT produces coefficients with amplitudes that are representative of the energy in a particular band of the frequency spectrum. Therefore, it is possible to utilize the fact that the human viewer is more critical of errors in the low frequency regions of an image than in the high frequency or detailed areas. In general, the high frequency coefficients are quantized more coarsely than the low frequency coefficients.
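The energy compaction and frequency-dependent quantization described above can be illustrated with a small sketch. It assumes an 8 by 8 block, an orthonormal DCT-II, and a hypothetical step-size rule in which the quantizer step grows with the frequency index; the sample block, the names, and the step values are illustrative and are not taken from any of the cited systems.

```python
import math

N = 8  # assumed block size


def dct_2d(block):
    """Orthonormal two-dimensional DCT-II of an N x N block (list of lists)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, base_step=4.0, slope=2.0):
    """Hypothetical rule: the step size grows with u + v, so higher
    frequencies are quantized more coarsely than lower ones."""
    return [[round(coeffs[u][v] / (base_step + slope * (u + v)))
             for v in range(N)] for u in range(N)]


# A smooth (highly correlated) block: intensity rises gently in both directions.
block = [[100 + 2 * x + 3 * y for y in range(N)] for x in range(N)]
coeffs = dct_2d(block)
quantized = quantize(coeffs)

zeros = sum(row.count(0) for row in quantized)
print("DC coefficient:", round(coeffs[0][0], 3))
print("zero coefficients after quantization:", zeros, "of", N * N)
```

Because the transform in this sketch is orthonormal, the total energy of the block is unchanged; it is simply redistributed so that a few low frequency coefficients carry almost all of it.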
The output of the DCT is a matrix of coefficients which represent energy in the two-dimensional frequency domain. Most of the energy is concentrated at the upper left corner of the matrix, which is the low frequency region. If the coefficients are scanned in a zigzag manner, starting in the upper left corner, the resultant sequence will contain long strings of zeros, especially toward the end of the sequence. One of the major objectives of the DCT compression algorithm is to create zeros and to bunch them together for efficient coding.
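A zigzag scan of this kind can be sketched as follows. The diagonal ordering shown is the commonly used one; the 4 by 4 block of quantized coefficients is made-up sample data, chosen small so the result is easy to check by hand.

```python
def zigzag_order(n):
    """Return the (row, col) positions of an n x n matrix in zigzag order,
    starting at the upper left corner."""
    order = []
    for d in range(2 * n - 1):  # d = row + col identifies one anti-diagonal
        rows = range(max(0, d - n + 1), min(d, n - 1) + 1)
        if d % 2 == 0:
            rows = reversed(rows)  # even diagonals run from lower left to upper right
        for r in rows:
            order.append((r, d - r))
    return order


def zigzag_scan(matrix):
    """Serialize a square matrix of coefficients in zigzag order."""
    return [matrix[r][c] for r, c in zigzag_order(len(matrix))]


# Hypothetical quantized coefficients with energy in the upper left corner.
sample = [
    [12, 5, 0, 0],
    [-3, 1, 0, 0],
    [2, 0, 0, 0],
    [0, 0, 0, 0],
]
print(zigzag_scan(sample))
# [12, 5, -3, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

In this sample all five nonzero coefficients come first and the eleven zeros are bunched together at the end of the sequence.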
Coarse quantization of the high frequency coefficients and the reduced number of nonzero coefficients greatly improve the compressibility of an image. Simple statistical coding techniques can then be used to efficiently represent the remaining information. This usually involves the use of variable length code words to convey the amplitude of the coefficients that are retained. The smaller amplitudes, which occur most frequently, are assigned short code words. The less probable large amplitudes are assigned long code words. Huffman coding and arithmetic coding are two frequently used methods of statistical coding. Huffman coding is used in the system of Chen and Pratt referred to above. Arithmetic coding is described in Langdon, "An Introduction to Arithmetic Coding," IBM Journal of Research and Development, Vol. 28, No. 2, March, 1984.
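The assignment of short code words to frequent amplitudes can be sketched with a textbook Huffman construction. This is a generic illustration and not the specific code table of Chen and Pratt; the amplitude statistics are made-up sample data.

```python
import heapq
from collections import Counter


def huffman_code(freqs):
    """Map each symbol to a bit string; frequent symbols get shorter strings."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    # Each heap entry: (total count, unique tie-breaker, {symbol: code so far}).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c0, _, t0 = heapq.heappop(heap)  # two least probable subtrees
        c1, _, t1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in t0.items()}
        merged.update({s: "1" + b for s, b in t1.items()})
        heapq.heappush(heap, (c0 + c1, tie, merged))
        tie += 1
    return heap[0][2]


# Hypothetical amplitude statistics: small amplitudes dominate.
amplitudes = [1] * 8 + [-1] * 4 + [2] * 2 + [5]
codes = huffman_code(Counter(amplitudes))
total_bits = sum(len(codes[a]) for a in amplitudes)

for symbol in (1, -1, 2, 5):
    print(symbol, len(codes[symbol]), "bits")
print("total:", total_bits, "bits, versus", 2 * len(amplitudes), "with 2-bit fixed codes")
```

For this sample the most frequent amplitude receives a 1-bit code word and the rarest amplitudes receive 3-bit code words, for 25 bits in total instead of the 30 bits that a fixed 2-bit code would need.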
In order to reconstruct a video signal from a stream of transmitted coefficients, it is necessary to know the location or address of each coefficient. Runlength coding is often used for this purpose. One form of runlength coding relies on a two-dimensional variable length coding scheme for sequences of quantized transform coefficients. In a given sequence, the value of a nonzero coefficient (amplitude) is defined as one dimension and the number of zeros preceding the nonzero coefficient (runlength) is defined as another dimension. The combination of amplitude and runlength is defined as an "event." In such a scheme, after a subset of an image frame has been transformed into a block of transform coefficients, only the nonzero coefficients are transmitted. Their addresses can be determined at the receiver by sending runlength codes. A single runlength code denotes the number of preceding zero amplitude coefficients since the last nonzero coefficient in the scan. As noted above, the coefficients within a block are usually serialized using a zigzag scan order. Huffman or arithmetic coding can again be used to represent the runlength codes.
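The formation of (runlength, amplitude) events from a serialized block can be sketched as follows. The sketch assumes that zeros after the last nonzero coefficient are implied by a known block length or an end-of-block marker, which the text above does not specify; the function names and the sample sequence are illustrative.

```python
def runlength_events(sequence):
    """Convert scanned coefficients into (runlength, amplitude) events.
    Trailing zeros produce no event (assumed to be implied by block length)."""
    events = []
    run = 0
    for value in sequence:
        if value == 0:
            run += 1                      # count zeros since the last nonzero value
        else:
            events.append((run, value))
            run = 0
    return events


def decode_events(events, length):
    """Rebuild the coefficient sequence, recovering each address from the runs."""
    out = []
    for run, amplitude in events:
        out.extend([0] * run)
        out.append(amplitude)
    out.extend([0] * (length - len(out)))  # restore the trailing zeros
    return out


scanned = [12, 5, 0, 0, -3, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
events = runlength_events(scanned)
print(events)  # [(0, 12), (0, 5), (2, -3), (3, 1)]
```

Only four events are needed to describe the sixteen coefficients in this sample, and the receiver recovers the position of each nonzero coefficient by accumulating the runlengths.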
The runlength coding method suffers from various deficiencies. For example, the efficiency of the runlength coding method depends on the order in which the coefficients are scanned. In addition, the statistics of the runlength probability distribution vary depending on the location within the scan. This results in either additional complexity or reduced efficiency when assigning variable length code words to represent the runlength.
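The dependence on scan order can be seen in a small sketch that serializes the same block in two different orders and compares the resulting runlengths. The block is made-up sample data, and the row-by-row (raster) order is included only for comparison.

```python
def zigzag_positions(n):
    """Positions of an n x n block in zigzag order from the upper left corner."""
    order = []
    for d in range(2 * n - 1):
        rows = range(max(0, d - n + 1), min(d, n - 1) + 1)
        if d % 2 == 0:
            rows = reversed(rows)
        order.extend((r, d - r) for r in rows)
    return order


def raster_positions(n):
    """Positions of an n x n block scanned row by row."""
    return [(r, c) for r in range(n) for c in range(n)]


def zero_runs(block, positions):
    """Runlength preceding each nonzero coefficient for a given scan order."""
    runs, run = [], 0
    for r, c in positions:
        if block[r][c] == 0:
            run += 1
        else:
            runs.append(run)
            run = 0
    return runs


sample = [
    [12, 5, 0, 0],
    [-3, 1, 0, 0],
    [2, 0, 0, 0],
    [0, 0, 0, 0],
]
print(zero_runs(sample, zigzag_positions(4)))  # [0, 0, 0, 0, 0]
print(zero_runs(sample, raster_positions(4)))  # [0, 0, 2, 0, 2]
```

The same five coefficients produce different runlength values under the two scans, so a variable length code table tuned for one scan order would not be well matched to the other.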
It would be advantageous to provide a method and apparatus for encoding video transform coefficient address information that overcomes the problems inherent in the runlength coding method. Such a method and apparatus should be straightforward to implement, and allow the mass production of reliable and cost efficient consumer decoders. The present invention provides a method and apparatus for identifying the locations of transmitted transform coefficients within a block, enjoying the aforementioned advantages.