1. Field of the Invention
The present invention relates to a method and apparatus for decoding High Definition (HD) television signals and generating low resolution versions of the HD signals; and more particularly to a three-layer scaleable decoder and method of decoding.
2. Description of the Related Art
Digital video signal processing is an area of science and engineering that has developed rapidly over the past decade. The maturity of the Moving Picture Expert Group (MPEG) video coding standard represents a very important achievement for the video industry and provides strong support for digital transmission of video signals. With advancements in digital compression and other techniques such as digital modulation and packetization, as well as VLSI technology, the fundamentals of television have been reinvented for the digital age.
The first U.S. digital television transmission standard, developed for broadcast of high and low definition television by a Grand Alliance of companies, has been accepted by the Federal Communications Commission (FCC). High definition digital television broadcasts are typically referred to as HDTV, while low definition digital television broadcasts are generally referred to as SDTV. These terms will be used throughout this application, but are not tied to a particular format or standard. Instead, these terms are used to cover the high and low definition digital television of any coding standard (e.g., for VTRs and television).
In 1994 SDTV broadcasts became a reality when the first digital television services, broadcast via satellite, went on the air. The Digital Satellite Service (DSS) units developed by Thomson Consumer Electronics and others have been distributed to more than 1 million homes. The highly sophisticated methods of transmitting and receiving digital television not only produce higher-quality television broadcasts, but also create new services, such as movies on demand, interactive programming, and multimedia applications, as well as telephone and computer services through the television.
Soon, HDTV will become a reality and join SDTV. Accordingly, advanced television (ATV) broadcasts that include co-existent broadcasts of HDTV and SDTV can be expected in the near future. The problem, however, is that HDTV signals cannot be decoded by current SDTV decoders or NTSC decoders. (NTSC is the current analog broadcast standard in the U.S.)
The notion of format conversion therefore has become increasingly popular as a way of enabling existing display devices, such as NTSC television and computer monitors, to receive transmitted HD signals by implementing down-conversion technology into existing decoder systems.
The conventional decoding system for obtaining a low-resolution image sequence from an HD transmission, however, suffers from significant drawbacks. Specifically, the conventional format conversion method fully decodes the received HD bitstream, and then down-converts the decoded bitstream by pre-filtering and sub-sampling. Although this conventional technique achieves a high quality low resolution version of the original HD transmission, the cost of implementing this technique is high due to the large memory required to store full-resolution anchor frames during MPEG decoding.
As an alternative, a down-converting technique has been proposed which addresses the memory requirements associated with full-resolution MPEG decoding by first down-converting HD signals to a lower resolution. Here, incoming blocks are subject to down-conversion within the decoding loop so that the down-converted pictures, rather than full-resolution pictures, are stored into the memory as the anchor pictures used for MPEG decoding. The obvious drawback of this alternative is that image reconstruction, which in MPEG video decoding requires prediction from stored anchor pictures, is performed using low resolution pictures. Therefore, the reconstructed images are degraded because an imperfect anchor image is used during motion-compensated prediction (described below). Because this degraded reconstructed image is used to reconstruct subsequent pictures, decoder prediction will "drift" away from the prediction result of the encoder.
To clarify the operation of down-conversion, MPEG encoding/decoding is first discussed. For MPEG video encoding of an HDTV transmission, image blocks of 8×8 pixels in the spatial domain are converted into 8×8 DCT (discrete cosine transform) blocks of coefficients in the DCT or frequency domain. Specifically, in most coding formats such as MPEG, the HDTV signal is divided into a luminance component (Y) and two chroma components (U) and (V). Macro blocks of 8×8 DCT blocks of DCT coefficients are formed.
Besides variable length encoding, MPEG provides for intra- and inter-coding. Intra-coding is where a field or frame of the HDTV signal, referred to as a picture, is encoded based on the pixels therein. Several well known techniques exist for intra-coding. An intra-coded picture is typically referred to as an I-picture.
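As an illustration of the spatial-to-frequency conversion described above, the following minimal Python sketch computes a 2-D DCT of one 8×8 block. The function name `dct_2d` and the orthonormal scaling are assumptions made for this example only; the text does not prescribe a particular DCT implementation.

```python
import math

N = 8  # block size used by MPEG


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


# A flat (constant) block of pixels, chosen only as a toy input.
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
```

For a flat block such as this one, all of the energy falls into the single DC coefficient `coeffs[0][0]`, and the remaining coefficients are zero up to floating-point error.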
Inter-coding, sometimes referred to as predictive encoding, is where a picture is encoded based on a reference picture, referred to as an anchor picture. In inter-coding, each macro block (i.e., related luminance and chroma blocks) of the picture being encoded is compared with the macro blocks of the anchor picture to find the macro block of the anchor picture providing the greatest correlation therewith. The vector between the two macro blocks is then determined as the motion vector. The inter-coded HDTV signal for the macro block being encoded will then include the motion vector and the differences between the macro block being encoded and the corresponding macro block of the anchor picture providing the greatest correlation.
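The search for the best-matching macro block can be sketched as follows. This is a simplified illustration, not the encoder's actual method: it uses the sum of absolute differences (SAD) as a stand-in for the "greatest correlation" criterion, and the names `sad`, `find_motion_vector`, and `residual`, the 2×2 block size, and the toy pictures are all hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of cur and a displaced block of ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def find_motion_vector(cur, ref, bx, by, size, search):
    """Full search over +/- search pixels; returns (dx, dy, cost) of the best match."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the anchor picture.
            if by + dy < 0 or bx + dx < 0:
                continue
            if by + dy + size > height or bx + dx + size > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


def residual(cur, ref, bx, by, dx, dy, size):
    """Differences between the current block and the matched anchor block."""
    return [
        [cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(size)]
        for y in range(size)
    ]


# Toy 8x8 anchor picture, and a current picture whose content at (x, y)
# is the anchor content at (x + 2, y + 1).
anchor = [[y * 8 + x for x in range(8)] for y in range(8)]
current = [
    [anchor[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0 for x in range(8)]
    for y in range(8)
]

mv = find_motion_vector(current, anchor, bx=2, by=2, size=2, search=2)
diff = residual(current, anchor, 2, 2, mv[0], mv[1], 2)
```

The pair `(mv[0], mv[1])` plays the role of the motion vector, and `diff` plays the role of the differences that accompany it in the inter-coded signal.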
For example, a series of pictures may have the display order I1B1B2P1B3B4P2B5B6P3B7B8I2 . . . . The transmitted HDTV signal, however, will have the pictures arranged in the order of encoding as follows: I1P1B1B2P2B3B4P3B5B6I2B7B8. P-pictures are encoded using the previous I-picture or P-picture as the anchor picture. In the above example, P-pictures P1, P2, and P3 were encoded using I-picture I1, P-picture P1, and P-picture P2, respectively, as the anchor picture.
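The relationship between the display order and the order of encoding in the above example can be reproduced with a short sketch. The function name `display_to_coding_order` is hypothetical, and the rule it applies (each anchor picture is sent ahead of the B-pictures that precede it in display order) is inferred from the example sequences rather than stated as a general requirement.

```python
def display_to_coding_order(display):
    """Move each anchor picture (I or P) ahead of the B-pictures displayed before it."""
    coded, pending_b = [], []
    for pic in display:
        if pic.startswith("B"):
            pending_b.append(pic)      # hold B-pictures until their later anchor is sent
        else:
            coded.append(pic)          # send the anchor first
            coded.extend(pending_b)    # then the B-pictures that depend on it
            pending_b = []
    coded.extend(pending_b)            # trailing B-pictures, if any
    return coded


display = "I1 B1 B2 P1 B3 B4 P2 B5 B6 P3 B7 B8 I2".split()
coded = display_to_coding_order(display)
```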
The B-pictures may be forward predicted, backward predicted, or bi-directionally predicted. For instance, if B-picture B1 was encoded using I-picture I1 as the anchor picture, then B-picture B1 is forward predicted. Alternatively, if B-picture B1 was encoded using P-picture P1 as the anchor picture, then B-picture B1 is back or backward predicted. If B-picture B1 was encoded using both I-picture I1 and P-picture P1 (typically an average thereof) as anchor pictures, then B-picture B1 is bi-directionally predicted.
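The three prediction directions can be summarized in a small sketch. The function name `predict_b_block`, the representation of a block as a flat list of sample values, and the round-half-up integer averaging are assumptions made for illustration; the text states only that the bi-directional case is typically an average.

```python
def predict_b_block(mode, forward=None, backward=None):
    """Form the prediction for a B-picture block from one or two anchor blocks."""
    if mode == "forward":
        return list(forward)               # e.g. taken from I-picture I1
    if mode == "backward":
        return list(backward)              # e.g. taken from P-picture P1
    if mode == "bidirectional":
        # Assumed rounding: integer average, halves rounded up.
        return [(f + b + 1) // 2 for f, b in zip(forward, backward)]
    raise ValueError("unknown prediction mode: " + mode)


from_i1 = [10, 20]   # toy samples from the earlier anchor
from_p1 = [20, 21]   # toy samples from the later anchor
both = predict_b_block("bidirectional", from_i1, from_p1)
```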
The headers in the HDTV signal indicate whether pictures are I, B, or P-pictures and the direction of encoding. These headers also indicate the group of picture (GOP) size N and the distance between anchor pictures M. The GOP size indicates the distance between I-pictures, which in the above example would be N=12. Since I-pictures and P-pictures are anchor pictures, the distance between anchor pictures in the above example would be M=3. Based on the information provided in the headers, the HDTV signal can be properly decoded.
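The values N=12 and M=3 can be checked against the display order of the above example. In an actual decoder these values are read from the headers; the sketch below, with the hypothetical function `gop_parameters`, merely derives them from the sequence of picture types for illustration.

```python
def gop_parameters(display_types):
    """Return (N, M): distance between I-pictures and distance between anchor pictures."""
    i_positions = [k for k, t in enumerate(display_types) if t == "I"]
    anchor_positions = [k for k, t in enumerate(display_types) if t in ("I", "P")]
    n = i_positions[1] - i_positions[0]
    m = anchor_positions[1] - anchor_positions[0]
    return n, m


# Picture types of I1 B1 B2 P1 B3 B4 P2 B5 B6 P3 B7 B8 I2 in display order.
n, m = gop_parameters("IBBPBBPBBPBBI")
```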
Therefore, if inter-coding was used to encode an incoming frame, an inverse DCT operation performed at the decoding end outputs only the difference (residual) between the present picture and a previous picture. Producing a complete picture requires additional structure, including a device for performing motion-compensated prediction ("motion compensation"), which produces predicted values from stored anchor pictures that are subsequently added to the residual.
FIG. 15 illustrates a conventional apparatus for decoding and down-converting an incoming HD bitstream. A variable length decoder (VLD) and dequantizer (IQ) 10 receives an incoming HD transmission, performs variable length decoding on the MPEG encoded video signals, and dequantizes the resulting DCT coefficients to produce arrays of dequantized DCT coefficients. The resulting DCT coefficient blocks are then converted to the spatial domain by an inverse discrete cosine transformer (IDCT) 14. A picture store 22 stores the two previous anchor pictures (e.g., I or P-pictures).
A motion compensated prediction unit 20 will receive at least one anchor picture from the picture store 22 and output the macroblocks of the anchor picture pointed to by the motion vector. An adder 18 receives the resulting macroblocks, and also receives the output of the IDCT 14. Consequently, when a B or P-picture is being down-converted, a complete picture is obtained by adding the output of the IDCT 14, which represents residual data, to the values output by the motion compensated prediction unit 20. When an I-picture is output from the IDCT 14, there is no need to add anchor picture information thereto. Consequently, the motion compensator 20 will not send output to the adder 18, and the output of the adder 18 will be the output of the IDCT 14.
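The role of the adder 18 can be sketched as follows. The function name `reconstruct_block`, the flat-list block representation, and the clipping to an 8-bit sample range are assumptions for illustration and are not taken from FIG. 15.

```python
def clip(value, low=0, high=255):
    """Limit a sample to the assumed 8-bit range."""
    return max(low, min(high, value))


def reconstruct_block(idct_out, prediction=None):
    """Model of the adder: IDCT output plus motion-compensated prediction, if any."""
    if prediction is None:
        # I-picture: nothing is added, so the IDCT output passes through.
        return [clip(v) for v in idct_out]
    # P- or B-picture: residual plus the predicted values from the anchor picture.
    return [clip(r + p) for r, p in zip(idct_out, prediction)]


i_block = reconstruct_block([7, 8])
p_block = reconstruct_block([5, -3, 200], [100, 2, 100])
```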
The output of the adder 18 is then received by a down-converter 12, which pre-filters and sub-samples the full resolution pictures output by the adder 18 to achieve a low resolution version of the decoded HDTV transmission. Next, after the decoded pictures are down-converted, they are sent to a reformatter 24. Since the transmission, and consequently the reception order, of the pictures is not in the proper display order, the reformatter 24 reformats the order of the pictures into the proper display order.
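Pre-filtering followed by sub-sampling can be illustrated with a deliberately simple sketch. The 2×2 averaging filter, the decimation factor of 2, and the function name `down_convert` are assumptions chosen for brevity; the filter actually used by the down-converter 12 is not specified in this passage.

```python
def down_convert(picture, factor=2):
    """Average each factor x factor neighborhood (pre-filter) and keep one sample (sub-sample)."""
    height, width = len(picture), len(picture[0])
    area = factor * factor
    out = []
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            total = sum(
                picture[y + j][x + i] for j in range(factor) for i in range(factor)
            )
            row.append((total + area // 2) // area)  # rounded average
        out.append(row)
    return out


full = [
    [0, 4, 8, 12],
    [4, 8, 12, 16],
    [1, 1, 3, 3],
    [1, 1, 3, 3],
]
low = down_convert(full)
```

Here a 4×4 toy picture is reduced to a 2×2 picture, i.e., one quarter of the original number of samples.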
To better understand the operation of the apparatus illustrated in FIG. 15, assume that an HDTV signal such as that discussed above is received. Therefore, I-picture I1 will be converted to the spatial domain by the IDCT 14 and output via the adder 18 without any information having been added thereto. Since an I-picture is an anchor picture, the picture store 22 will store the output of the adder 18. After down-conversion by the down-converter 12, the reformatter 24 will then determine what output should be sent as the SDTV signal. The reformatter operates according to the following rules: (1) if the picture received is the first anchor picture received, then no output will be sent; (2) if the picture received is an anchor picture but not the first anchor picture received, then the previously received anchor picture will be output; and (3) if the picture received is a B-picture, then the B-picture will be immediately output.
Therefore, upon receipt of I-picture I1, the reformatter 24 will not send any output. The next picture received will be P-picture P1. The adder 18 will then receive the output of the IDCT 14 and macroblocks from the I-picture I1 pointed to by the motion vectors. Consequently, the adder 18 will generate a complete picture. Since this complete picture is an anchor picture, the picture store 22 will then store the complete picture P1. According to the rules discussed above, the reformatter 24 will then output the I-picture I1 (i.e., the previous anchor picture).
The next two pictures received are B-pictures B1 and B2. Complete pictures will be formed from these B-pictures in the same manner discussed above with respect to P-picture P1, except that, depending on the direction of encoding, either the I-picture I1 and/or the P-picture P1 will be used as the anchor picture. Since the adder 18 outputs a B-picture, the reformatter 24 will immediately output the B-picture. Consequently, the output from the reformatter 24 will be I1B1B2.
Next, the P-picture P2 is received and processed in the same manner as P-picture P1. When the adder 18 outputs the complete P-picture P2, the picture store 22 will replace the I-picture I1 with the P-picture P2. The reformatter 24, according to the rules discussed above, will then output the P-picture P1. In this manner, the reformatter 24 will output the pictures in the proper display order.
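The three reformatter rules can be traced over the whole example sequence with the following sketch. The function name `reformat` and the representation of pictures as labels are hypothetical; only the ordering behavior of the reformatter 24 is modeled, not the picture data.

```python
def reformat(received):
    """Apply the reformatter rules; returns (pictures output, anchor still held)."""
    output = []
    held_anchor = None
    for pic in received:
        if pic.startswith("B"):
            output.append(pic)               # rule (3): B-pictures are output immediately
        else:
            if held_anchor is not None:
                output.append(held_anchor)   # rule (2): output the previous anchor
            held_anchor = pic                # rule (1): the first anchor produces no output
    return output, held_anchor


received = "I1 P1 B1 B2 P2 B3 B4 P3 B5 B6 I2 B7 B8".split()
shown, waiting = reformat(received)
```

With the reception order above, `shown` follows the display order I1B1B2P1B3B4P2B5B6P3B7B8, while the last anchor received, I2, is still held in `waiting` until the next anchor picture arrives.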
As mentioned above, although the conventional system for decoding and down-converting incoming HDTV signals achieves a quality low resolution result, this system cannot be implemented unless the decoder is provided with sufficient memory to store two full-resolution anchor pictures. Such memory capacity renders the cost of the conventional decoder quite high. The alternative proposed decoder and down-conversion system on the other hand, in which low-resolution anchor pictures are stored for MPEG decoding, does not ensure that optimal low-resolution motion compensation is achieved.
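To give a sense of scale for the memory at issue, the following sketch estimates the storage needed for two anchor pictures. The 1920×1088 coded picture size, 4:2:0 chroma sampling, 8 bits per sample, and the function name `anchor_store_bytes` are all assumptions for illustration; none of these figures is given in the text above.

```python
def anchor_store_bytes(width, height, anchors=2, h_div=1, v_div=1):
    """Bytes needed to hold the anchor pictures, assuming 8-bit 4:2:0 samples."""
    luma = (width // h_div) * (height // v_div)
    chroma = luma // 2  # two chroma planes, each a quarter of the luma size
    return anchors * (luma + chroma)


# Full-resolution anchors (assumed HD coded size).
full_res = anchor_store_bytes(1920, 1088)

# Anchors down-converted by 2 horizontally and vertically before storage.
reduced = anchor_store_bytes(1920, 1088, h_div=2, v_div=2)
```

Under these assumptions the full-resolution store requires roughly 6 megabytes, and storing anchors down-converted by two in each dimension requires one quarter of that amount.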
An object of the present invention is to eliminate the disadvantages and problems experienced by the conventional decoding and down-conversion techniques discussed above.
A further object of the present invention is to provide a method and apparatus for decoding an HDTV transmission which, depending on the memory capacity of the decoder, operates in one of three different modes (Full Memory, Half Memory, and Quarter Memory) while minimizing the circuit complexity required by the decoder to operate in each of these modes.
Another object of the present invention is to provide a method and apparatus for achieving a low resolution image sequence from an HD bitstream in which the filtering process utilized to perform motion compensated prediction with low-resolution anchor frames is optimized.
These and other objects are achieved by an apparatus for decoding a digital signal, comprising: composite picture forming means for forming a composite picture from a first digital video signal and a second digital video signal, said first digital video signal including inter-coded picture data; down-converting means for receiving a third digital video signal, for outputting said third digital video signal to said composite picture forming means as said first digital signal in a first mode, for down-converting said third digital video signal into a fourth digital video signal in a second mode, and outputting said fourth digital video signal to said composite picture forming means in said second mode; a memory for storing anchor pictures output from said composite picture forming means; and motion compensation means for generating said second digital signal based on said stored anchor pictures.
These and other objects are also achieved by a method for decoding a digital signal, comprising: forming a composite picture from a first digital video signal and a second digital video signal using a composite picture forming means, said first digital video signal including inter-coded picture data; receiving a third digital video signal; outputting said third digital video signal to said composite picture forming means as said first digital signal in a first mode; down-converting said third digital video signal into a fourth digital video signal in a second mode; outputting said fourth digital video signal to said composite picture forming means in said second mode; storing anchor pictures output from said composite picture forming means; and generating said second digital signal based on said stored anchor pictures.
Other objects, features, and characteristics of the present invention; methods, operation, and functions of the related elements of the structure; combination of parts; and economies of manufacture will become apparent from the following detailed description of the preferred embodiments and accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures.