Television signals are conventionally transmitted in an analog form in accordance with the National Television Systems Committee ("NTSC") standard adopted in the United States. Television signals transmitted in a digital form deliver video and audio services of a much higher quality than analog transmissions. In an uncompressed form, digital television signals require the transmission of a much greater amount of data than analog systems. This is particularly true of high definition television ("HDTV") transmissions. Unfortunately, digital broadcast transmissions, including HDTV transmissions, are required to be broadcast in the same 6 MHz bandwidth provided under the NTSC standard for analog transmission.
As shown in FIG. 1, the video portion of current broadcast television signals contains a sequence of video "frames" that together provide a moving picture. A video frame is defined by pixels (picture elements) containing luminance (brightness) components and chrominance (color) components. The pixels are arranged horizontally and vertically in lines and columns to produce picture displays, such as, for example, 720 pixels × 480 lines.
In a digital television transmission, a digital value is used to represent the intensity of each of the primary pixel colors red, green and blue. Accordingly, a digital representation of a single 720 pixel × 480 line video frame display requires at least 8.3 megabits of data. High definition digital television, incorporating higher density pixel resolutions, requires even more data per frame.
Because 30 video frames are displayed each second in a television signal, the video picture data in a digital transmission must be compressed both to be manageable and to be broadcast within the 6 MHz broadcast bandwidth. To that end, digital video data may be transmitted under, for example, data compression standards known as MPEG-1 (Moving Picture Experts Group) or MPEG-2. MPEG-2, of significance to the present invention, is the data compression standard described in the document "International Standards Organization--Moving Picture Experts Group, Recommendation ISO/IEC 13818-2: 1995(E)" (hereinafter "the 1995 ISO-MPEG International Standard"). Unless indicated otherwise, any reference made herein to MPEG-2 data bit streams refers to data bit streams that comply with MPEG-2 standards as defined in the November, 1995 ISO-MPEG International Standard.
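The figures above can be checked with a short calculation. The following is only a sketch: the sample depth of 8 bits per color is an assumption (the text does not state it), chosen because it reproduces the 8.3 megabit figure, and the function names are illustrative.

```python
# Raw (uncompressed) data volume for a 720 pixel x 480 line RGB frame.
# BITS_PER_COLOR = 8 is an assumption; it is not stated in the text.
WIDTH_PIXELS = 720
HEIGHT_LINES = 480
COLORS_PER_PIXEL = 3       # red, green and blue
BITS_PER_COLOR = 8         # assumed sample depth
FRAMES_PER_SECOND = 30


def bits_per_frame(width, height, colors=COLORS_PER_PIXEL, depth=BITS_PER_COLOR):
    """Number of bits needed to store one uncompressed frame."""
    return width * height * colors * depth


def raw_bit_rate(width, height, fps=FRAMES_PER_SECOND):
    """Number of bits per second needed to send uncompressed frames."""
    return bits_per_frame(width, height) * fps


frame_bits = bits_per_frame(WIDTH_PIXELS, HEIGHT_LINES)
rate = raw_bit_rate(WIDTH_PIXELS, HEIGHT_LINES)
print(f"{frame_bits / 1e6:.1f} megabits per frame")
print(f"{rate / 1e6:.1f} megabits per second uncompressed")
```

Under these assumptions one frame occupies 8,294,400 bits, and 30 such frames per second amount to roughly 249 megabits per second before any compression is applied.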
MPEG-2 video compression techniques enable the transmission of digital television signals within the 6 MHz range used by conventional television transmissions. MPEG-2 video compression uses compression algorithms to take advantage of temporal and spatial redundancies among pixels in order to efficiently represent the important information in a video signal. The MPEG-2 standard stipulates that for every four luminance components, only two chrominance components need be represented because the human eye is much more sensitive to luminance components than to color components.
In digital television transmission, video data signals may be sampled, encoded and compressed in accordance with the MPEG-2 standard to produce a digital video data bitstream which may be modulated for transmission. An exemplary type of modulation is quadrature amplitude modulation (QAM), such as 64-QAM (64 quantizing levels). Another type of modulation is vestigial side band (VSB).
The video input data may be in an interlaced format or a progressive format. Each frame in interlaced video consists of two fields of the picture which are separated by one field period. Each frame in progressive video consists of the entire picture.
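The relationship between an interlaced frame and its two fields may be illustrated as follows. This is a minimal sketch that assumes a frame is held as a list of scan lines with an even line count; the names "top field" and "bottom field" and both function names are illustrative, and the sketch does not model which field is captured first in time.

```python
def split_into_fields(frame_lines):
    """Split a frame (a list of scan lines) into its two interlaced fields.

    The top field takes lines 0, 2, 4, ... and the bottom field takes
    lines 1, 3, 5, ... of the frame.
    """
    top_field = frame_lines[0::2]
    bottom_field = frame_lines[1::2]
    return top_field, bottom_field


def weave_fields(top_field, bottom_field):
    """Interleave two fields back into a single full frame."""
    frame = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame.append(top_line)
        frame.append(bottom_line)
    return frame


frame = [f"line{i}" for i in range(6)]
top, bottom = split_into_fields(frame)
print(top)     # even-numbered lines
print(bottom)  # odd-numbered lines
```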
Under the MPEG-2 standard, once video pixel data of a frame is sampled and digitized, it is encoded using techniques known as intraframe and interframe encoding. In general, intraframe encoding involves encoding a video frame from a single source frame to provide sufficient encoded spatial data for reconstruction of the video image from only the intraframe encoded data. An intraframe (designated I) encoded frame uses spatial compression without reference to any other frame. Conversely, interframe encoding involves the generation of encoded frame data from temporal differences between information in a current source frame and information in a frame predicted from prior or subsequent transmitted frames. There are two types of interframe encoded frames: predictive coded frames (designated P) and bidirectionally predictive coded frames (designated B). The P frames are predicted from a previous "anchor" frame (either an I or a P frame) and the B frames are predicted by interpolating from two bracketing anchor frames (either I or P frames).
Under the MPEG-2 standard, a video frame may be divided into various subcomponents so that the picture represented by the frame can be processed as a plurality of smaller portions. These subcomponents are classified as blocks, macroblocks, and slices. A macroblock is made of a 16 pixel by 16 line section of luminance components and two spatially corresponding 8 pixel by 8 line sections (known as "blocks"), one for each chrominance component. In a macroblock, every luminance component of every pixel in the horizontal direction and every pixel in the vertical direction is represented. Only every other chrominance component (both vertically and horizontally) is similarly represented. As mentioned previously, MPEG-2 requires only one chrominance pair for every four luminance components for the same total area.
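The macroblock layout described above may be sketched as follows. The sketch assumes that each color plane is a list of rows and that chrominance is reduced by simply keeping every other sample; a practical encoder would normally filter before decimating. The function names and the dictionary layout are illustrative only.

```python
def subsample_chroma(plane):
    """Keep every other sample horizontally and vertically (16x16 -> 8x8)."""
    return [row[0::2] for row in plane[0::2]]


def build_macroblock(luma, cb, cr):
    """Assemble one macroblock: full-resolution luminance, subsampled chrominance."""
    return {
        "Y": luma,                     # 16 pixels x 16 lines
        "Cb": subsample_chroma(cb),    # 8 pixels x 8 lines
        "Cr": subsample_chroma(cr),    # 8 pixels x 8 lines
    }


def sample_count(plane):
    """Total number of samples in a plane."""
    return sum(len(row) for row in plane)


# Illustrative 16x16 input planes filled with placeholder values.
luma = [[128] * 16 for _ in range(16)]
cb = [[64] * 16 for _ in range(16)]
cr = [[192] * 16 for _ in range(16)]

macroblock = build_macroblock(luma, cb, cr)
luma_samples = sample_count(macroblock["Y"])
chroma_pairs = sample_count(macroblock["Cb"])
print(luma_samples, chroma_pairs)
```

In this sketch the macroblock holds 256 luminance samples and 64 samples of each chrominance component, that is, one chrominance pair for every four luminance components.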
For each macroblock, the encoder chooses one of the intraframe or interframe coding modes. The coding mode chosen depends on the picture type, the effectiveness of motion compensation in the particular region of the picture, and the nature of the signal within the block. For intraframe encoding, only spatial redundancies occurring in the same frame are exploited. For interframe encoding, the encoder estimates the motion vectors for each 16 × 16 macroblock in the video frame. Motion vectors give the displacement of each macroblock of pixels from frame to frame, exploiting the temporal redundancy found in the frames. For example, P frames are predicted from a single prior frame. B frames are predicted from one or both of a prior frame and a subsequent frame. A typical coding scheme contains a mixture of I-, P-, and B-pictures. An illustrative encoder might generate an I-picture every half a second, to give reasonably fast random access, with two B-pictures inserted between each pair of I- or P-pictures.
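The illustrative picture mixture described above can be written out as a picture-type pattern. The following sketch assumes 30 frames per second and lists the pictures in display order rather than transmission order; the function name and its parameters are hypothetical.

```python
def gop_pattern(frames_per_second=30, i_period_seconds=0.5, b_between_anchors=2):
    """Return the picture types, in display order, from one I-picture up to the next."""
    gop_length = int(frames_per_second * i_period_seconds)
    pattern = []
    for index in range(gop_length):
        if index == 0:
            pattern.append("I")                      # start of the group
        elif index % (b_between_anchors + 1) == 0:
            pattern.append("P")                      # anchor picture
        else:
            pattern.append("B")                      # between two anchors
    return "".join(pattern)


pattern = gop_pattern()
print(pattern)
```

With these assumed values the pattern is 15 pictures long, so a decoder joining the stream waits at most about half a second for the next I-picture.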
After the encoder performs the motion-compensated prediction of the macroblock contents for P and B frames, the encoder then produces an error signal by subtracting the prediction from the actual data in the current macroblock. The error signal is separated into 8 × 8 blocks (four luminance blocks and two chrominance blocks), and a DCT (discrete cosine transform) is performed on each 8 × 8 block.
The DCT operation converts the 8 × 8 block of pixel values into spatial frequency coefficients. The resulting DCT coefficients are then quantized to achieve compression thereof. Statistical encoding of the spatial coefficients takes advantage of the non-uniform distribution of DCT coefficients. Run length coding takes advantage of the long strings of coefficients with a magnitude of zero to achieve coding gain. In addition, variable length encoding of the coefficients assigns shorter codewords to frequent events and longer codewords to less frequent events, thereby achieving further video compression.
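The transform, quantization and run length steps described above may be sketched as follows. This is a simplified illustration and not the procedure of the MPEG-2 standard: the single uniform quantizer step of 16, the zigzag scan order, the (run, level) pair format and the "EOB" end marker are assumptions made for this example, and the variable length coding stage is omitted.

```python
import math

N = 8  # block size: 8 pixels x 8 lines


def dct_2d(block):
    """Orthonormal two-dimensional DCT (type II) of an 8x8 block."""
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = 0.25 * c(u) * c(v) * total
    return coeffs


def quantize(coeffs, step=16):
    """Uniform quantization (assumed step size); small coefficients become zero."""
    return [[int(round(value / step)) for value in row] for row in coeffs]


def zigzag_order():
    """Scan positions from low to high spatial frequency along anti-diagonals."""
    positions = [(u, v) for u in range(N) for v in range(N)]
    return sorted(positions,
                  key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))


def run_length_encode(levels):
    """Encode a scanned list as (zero_run, level) pairs followed by an end marker."""
    pairs, run = [], 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the end-of-block marker
    return pairs


# A flat block has energy only in the lowest-frequency (DC) coefficient.
flat_block = [[100] * N for _ in range(N)]
quantized = quantize(dct_2d(flat_block))
scanned = [quantized[u][v] for u, v in zigzag_order()]
encoded = run_length_encode(scanned)
print(encoded)
```

For the flat block in this example, the 64 input values collapse to a single nonzero quantized coefficient followed by the end marker, which is the kind of coding gain that the run length step is intended to provide.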
To accomplish transmission, the video information of a video frame is transmitted as a sequence of macroblock units. The macroblocks are transmitted in a video bitstream, with the beginning of one macroblock following the end of the previous one. The data representing a horizontal series of encoded macroblocks may be grouped together with a data header into what is referred to as "slices". The data representative of several video frames may also be grouped together with a data header. This is typically referred to as a "Group of Pictures" (also referred to as "GOP").
In general, the video bitstream can be thought of as a hierarchy of data structures containing one or more subordinate structures. For instance, an MPEG-2 structure referred to as "picture_data" contains one or more slices, each of which contains one or more macroblocks. The highest data structure of the coded video bitstream is the video sequence. A video sequence begins with a sequence header which may be followed by a group of pictures header and then by one or more coded frames.
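The hierarchy described above may be modeled, in outline, with nested container types. The class and field names below are illustrative and are not the syntax element names of the MPEG-2 standard; the use of one slice per row of macroblocks is likewise an assumption made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Macroblock:
    address: int                     # position of the macroblock in the picture


@dataclass
class Slice:
    macroblocks: List[Macroblock] = field(default_factory=list)


@dataclass
class Picture:
    coding_type: str                 # "I", "P" or "B"
    slices: List[Slice] = field(default_factory=list)


@dataclass
class GroupOfPictures:
    pictures: List[Picture] = field(default_factory=list)


@dataclass
class VideoSequence:
    groups: List[GroupOfPictures] = field(default_factory=list)


def build_picture(coding_type, width=720, height=480):
    """Build a picture with one slice per row of 16x16 macroblocks (assumed layout)."""
    mb_cols, mb_rows = width // 16, height // 16
    slices = []
    for row in range(mb_rows):
        first = row * mb_cols
        slices.append(Slice([Macroblock(first + col) for col in range(mb_cols)]))
    return Picture(coding_type, slices)


def count_macroblocks(sequence):
    """Walk the hierarchy from the sequence down to the macroblocks."""
    return sum(len(s.macroblocks)
               for g in sequence.groups
               for p in g.pictures
               for s in p.slices)


sequence = VideoSequence([GroupOfPictures([build_picture(t) for t in "IBBP"])])
print(count_macroblocks(sequence))
```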
During the transmission of digital video signals, data other than the encoded video data must be transmitted within the video bitstream. Control or processing data, error correcting data, coded picture data, synchronization sequences and other information necessary to receive and process the digital video information must be communicated. The additional data is essential for reconstruction of the video sequence, and to permit the video data to survive the transmission errors that may occur, especially if the data is transmitted by terrestrial broadcasting. Compressed data are, in general, highly vulnerable to bit errors that result from noise in the transmission channel.
After the transmitted bitstream reaches a receiver, the control data is separated from the video data, which is decoded for display. The decoding process is essentially the reverse of the encoding process described above, that is, the variable length encoded data is decoded to yield the quantized data. This data is then dequantized to yield two-dimensional DCT coefficients. An inverse DCT ("IDCT") transform operation is performed on the data, and the data is rearranged and synthesized to obtain the picture data. When assembled, a complete picture frame is obtained. The output of the decoding process is a series of fields or frames that are normally the input of a display device for displaying the decoded bit stream.
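The final stages of the decoding path described above (dequantization, the inverse DCT, and addition of the prediction) may be sketched as follows. As with any simplified illustration, several details are assumed: a single uniform quantizer step of 16, an orthonormal transform scaling, and clamping of the result to the range 0 to 255. Variable length decoding and motion compensation are not shown, and the function names are hypothetical.

```python
import math

N = 8  # block size: 8 pixels x 8 lines


def dequantize(levels, step=16):
    """Scale quantized levels back to approximate DCT coefficients (assumed step)."""
    return [[level * step for level in row] for row in levels]


def idct_2d(coeffs):
    """Inverse of the orthonormal two-dimensional 8x8 DCT."""
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    block = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (c(u) * c(v) * coeffs[u][v]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            block[y][x] = 0.25 * total
    return block


def reconstruct(prediction, levels, step=16):
    """Add the decoded error signal to the prediction and clamp to 0..255."""
    residual = idct_2d(dequantize(levels, step))
    return [[max(0, min(255, int(round(p + r))))
             for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(prediction, residual)]


# A block whose only nonzero level is the lowest-frequency (DC) coefficient.
levels = [[0] * N for _ in range(N)]
levels[0][0] = 50
prediction = [[20] * N for _ in range(N)]   # use all zeros for an intracoded block
decoded = reconstruct(prediction, levels)
print(decoded[0])
```

In this example the single DC level decodes to a flat error signal of 100, which is added to the flat prediction of 20.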
For display, MPEG-2 allows a picture acquisition mode known as progressive refresh. Progressive refresh refers to any strategy other than providing complete I-frames. During progressive refresh, one or more macroblocks in each frame are represented with intraframe coding. Successive frames include different macroblocks represented with intracoding and macroblocks based on predictions from the earlier intracoded macroblocks. The intracoded macroblocks in each frame are chosen such that a complete reference picture can be constructed by a decoder after some number of frames.
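One possible progressive refresh schedule may be simulated to show how the number of frames needed for a complete reference picture follows from the schedule. The sketch below is hypothetical: it assumes that the encoder intracodes whole columns of macroblocks, sweeping from left to right, and that predicted macroblocks refer only to regions that have already been refreshed. Neither assumption is taken from the text, which permits any choice of intracoded macroblocks.

```python
def frames_until_complete(mb_cols, mb_rows, intra_columns_per_frame=1):
    """Count the frames needed before every macroblock has been intracoded once."""
    if intra_columns_per_frame < 1:
        raise ValueError("at least one column must be refreshed per frame")

    # refreshed[row][col] is True once that macroblock has a valid reference.
    refreshed = [[False] * mb_cols for _ in range(mb_rows)]
    frames = 0
    next_col = 0
    while not all(all(row) for row in refreshed):
        last_col = min(next_col + intra_columns_per_frame, mb_cols)
        for col in range(next_col, last_col):
            for row in range(mb_rows):
                refreshed[row][col] = True
        next_col = last_col
        frames += 1
    return frames


# A 720 pixel x 480 line picture holds 45 x 30 macroblocks of 16 x 16 pixels.
one_column = frames_until_complete(45, 30)
three_columns = frames_until_complete(45, 30, intra_columns_per_frame=3)
print(one_column, three_columns)
```

Under this assumed schedule the count is fixed by the encoder's choice of how many columns to refresh per frame, which is information the encoder has and a decoder joining the stream does not.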
Current video decoders, when initially presented with a coded video bitstream which uses progressive refresh, behave in one of the following two ways:
In a first technique, decoding starts from the first received frame, with the video picture being reconstructed as it arrives. Initially only a stored picture or synthetic pattern can be displayed. After a sufficient number of frames have been received so that the decoder is likely to have a complete reference picture, it displays the video pictures. However, one problem with this technique is that, in a typical MPEG-2 system, the decoder does not know a priori how many frames it will take to recover a complete reference picture. Another problem is that there is an added time delay from the selection of a new program, or the recovery of a data loss dropout, to when the viewer begins to see the program video.
A second technique allows the viewer to see video pictures as soon as possible. However, the problem with this second technique is that the viewer will see random picture data in areas that have been coded predictively but for which an accurate reference prediction has not yet been received.
Thus, there is a need for an encoder/decoder transmission system that can accurately signal the decoder as to the number of frames which need to be decoded before a complete reference picture is obtained. In addition, there is a need for a video encoder/decoder transmission system that can determine the regions of a picture frame which have been decoded for use as prediction references, so that concealment techniques can be accurately employed in the uninitialized data regions of the picture frame to provide for a more pleasing picture.
Accordingly, it is an object of the present invention to provide a video encoder/decoder transmission system that can signal a video decoder as to the number of picture frames which need to be decoded during a progressive refresh operation before a complete reference picture can be decoded.
It is a further object of the present invention to provide a video encoder/decoder transmission system that can inhibit the display of picture frames until a reference picture is established, to avoid the display of frames containing unsuitable amounts of uninitialized video data.
It is another object of the present invention to provide a video encoder/decoder transmission system that can accurately record the regions of a picture which have been decoded into useful prediction references, so that the remaining data regions can be masked through concealment techniques.
It is yet another object of the present invention to provide a video encoder/decoder transmission system that provides for a faster presentation of more visually pleasing video pictures at the initial display of a new program or after a data loss dropout.
The foregoing objects and advantages of the invention are illustrative of those which can be achieved by the present invention and are not intended to be exhaustive or limiting of the possible advantages which can be realized. Thus, these and other objects and advantages of the invention will be apparent from the description herein or can be learned from practicing the invention, both as embodied herein or as modified in view of any variations which may be apparent to those skilled in the art. Accordingly, the present invention resides in the novel methods, arrangements, combinations and improvements herein shown and described.