FIG. 1 is a video image 6, which is a video frame that includes a first region 7 and a second region 8. Although described as a video frame for example purposes, the image 6 may also be a video field. Furthermore, although the regions are shown as two rectangles in a top-bottom arrangement, the number, shape, and respective locations of these regions are arbitrary.
Typically, one views a sequence of video frames 6 in their respective entireties. But one may sometimes wish to view another image, i.e., an overlay image, in one of the regions 7 and 8. For example, one may wish to view an electronic program guide (EPG) in the region 8 while he/she is watching a program in the region 7 (and also in the region 8 if the EPG is transparent). Or, one may wish to view an internet order menu in the region 8 while he/she is viewing merchandise for sale in the region 7 (and also in the region 8 if the menu is transparent). Thus, the overlay image is typically a partial frame that is the same size as, or smaller than, the frame region that it overlays, although an overlay frame can also overlay an entire video frame. For simplicity, both partial and full overlay frames are referred to as “overlay frames”.
FIG. 2 is a block diagram of a conventional television receiver system 10, which includes a set-top box 11 such as a cable TV (CATV) or satellite TV box, a remote control 12, and a digital video display 13. Generally, the box 11 allows one to view overlay images, such as those that compose an EPG, in the respective regions 8 of a sequence of video frames 6 (FIG. 1). The box 11 includes a processing circuit 14, which receives an encoded, multiplexed broadcast video signal on an input terminal 15, receives command signals from the remote control 12 on a command terminal 16, and generates a video display signal on an output terminal 17. The broadcast video signal includes one or more broadcast channels and one or more overlay frames such as the frames that compose an EPG, and is encoded according to a compression standard such as the Moving Picture Experts Group (MPEG) standard (discussed below). In response to channel-select and overlay commands from the remote control 12, the circuit 14 blends the video frames from the selected channel with the appropriate overlay frame or frames and generates the display signal as a sequence of these blended video frames. The display 13 receives the display signal from the terminal 17 and decodes and displays the sequence of blended video frames.
More specifically, the processing circuit 14 includes a command decoder 18, which decodes the commands from the remote control 12 and generates corresponding control signals, such as an overlay signal, that control other portions of the processing circuit 14. A channel selector 20 receives the broadcast signal from the terminal 15 and, in response to a channel-select signal from the command decoder 18, demultiplexes the selected channel from the broadcast signal. In response to an overlay signal from the decoder 18, the selector 20 also demultiplexes the selected overlay frames from the broadcast signal. For example, the selector 20 may demultiplex the EPG that corresponds to the selected channel. A video decoder 22 decodes the video frames of the selected channel into pixel-domain frames, i.e., frames of pixel luminance and chrominance values. In response to the overlay signal, the video decoder 22 also decodes the selected overlay frames into the pixel domain, and an overlay/video combiner 24 blends the decoded video frames with the decoded overlay frames. Conversely, if the command decoder 18 does not generate an overlay signal, then the selector 20 does not demultiplex the overlay frames, and thus the combiner 24 merely passes the decoded video frames from the decoder 22 through unchanged. In one embodiment, the output terminal of the combiner 24 is coupled directly to the output terminal 17. But because it is sometimes undesirable to couple decoded video frames (blended or unblended) directly to the display 13, in another embodiment the circuit 14 includes an optional re-encoder 26, which re-encodes the decoded video frames from the combiner 24 before providing them to the display 13. Although shown as including a number of separate circuit blocks, the processing circuit 14 may include one or more processors that perform the functions of the above-described circuit blocks 18, 20, 22, 24, and 26.
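The blending step performed by the combiner 24 can be sketched as follows. This is a minimal illustration, not the circuit's actual method: it assumes decoded frames are 2-D lists of 8-bit luminance values, that the overlay is a rectangle placed at a given position (for example, region 8 of FIG. 1), and that transparency is modeled by a single alpha weight. The function name `blend_overlay` and its parameters are hypothetical, and a complete combiner would treat the CB and CR values the same way.

```python
def blend_overlay(frame, overlay, top, left, alpha=1.0):
    """Blend a rectangular overlay into a copy of frame.

    Hypothetical sketch: frame and overlay are lists of rows of 8-bit
    luminance values; alpha=1.0 gives an opaque overlay, and smaller values
    let the underlying program show through (a "transparent" EPG).
    """
    out = [row[:] for row in frame]  # leave the decoded frame untouched
    for r, overlay_row in enumerate(overlay):
        for c, ov in enumerate(overlay_row):
            base = out[top + r][left + c]
            out[top + r][left + c] = round(alpha * ov + (1.0 - alpha) * base)
    return out

# A 4x4 frame whose bottom half is covered by a 50% transparent overlay.
frame = [[100] * 4 for _ in range(4)]
overlay = [[200] * 4 for _ in range(2)]
blended = blend_overlay(frame, overlay, top=2, left=0, alpha=0.5)
print(blended[2])  # [150, 150, 150, 150]
```

Because the blend operates on pixel values, both the channel frames and the overlay frames must be fully decoded first, which is why the box 11 carries the decoder 22.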
Still referring to FIG. 2, in operation during a period when a viewer does not want to view an overlay frame, he selects a channel with the remote control 12, which generates a corresponding control signal. The command terminal 16, which is typically an infrared detector, receives the control signal and couples it to the command decoder 18. In response to the control signal, the command decoder 18 generates the channel-select signal, which causes the channel selector 20 to recover the encoded video signal of the selected channel by demultiplexing the broadcast signal. The video decoder 22 decodes the recovered video signal into frames of pixel values, and the combiner 24 passes these frames to the optional re-encoder 26, which re-encodes the frames and provides a re-encoded video signal to the display 13. If, however, the re-encoder 26 is omitted, then the combiner 24 passes the decoded frames directly to the display 13.
In operation during a period when the viewer wants to view an overlay frame, he selects a channel as described above and also selects an overlay frame or a series of overlay frames, such as an EPG, with the remote control 12. The command decoder 18 generates the channel-select signal and an overlay signal, which together cause the channel selector 20 to recover both the encoded video signal of the selected channel and the encoded video signal containing the overlay frame or frames. The overlay signal causes the video decoder 22 to decode the recovered channel and overlay video signals from the channel selector 20 into respective sequences of frames, and causes the combiner 24 to blend the overlay frames with the channel frames to generate blended frames. The optional re-encoder 26 re-encodes these blended frames and provides them to the display 13, which decodes the re-encoded blended frames. If, however, the re-encoder 26 is omitted, then the combiner 24 provides the blended frames directly to the display 13.
Unfortunately, the set-top box 11 cannot utilize the decoding ability of the display 13, and thus includes its own redundant decoding circuitry, which often adds significant size and cost to the box 11. Typically, the display 13 includes channel-select and full decoding circuitry respectively similar to the channel selector 20 and the decoder 22 of the box 11. Thus, the display 13 typically can directly receive the encoded, multiplexed broadcast video signal, recover the encoded video signal of the selected channel, and decode and display the video frames of the recovered video signal. But the display 13 typically cannot blend overlay frames with the video frames. Therefore, to allow such blending, the box 11 includes the same decoding capability (the decoder 22) as the display 13. The viewer, however, typically requests the display of overlay frames for only a small portion of the time that he/she spends watching a program. Therefore, because the blending abilities of the box 11 are needed only a small part of the time, the decoding abilities of the box 11 are redundant to those of the display 13 most of the time. That is, the viewer pays for two full decoders when one decoder will do the job the vast majority of the time! Furthermore, where it is desired to provide the display 13 with an encoded video signal, the processing circuit 14 also includes the re-encoder 26, which adds even more size and expense to the box 11!
To help the reader more easily understand the concepts discussed below in the description of the invention, the following is a basic overview of conventional video-compression techniques.
To electronically transmit a relatively high-resolution image over a relatively low-bandwidth channel, or to electronically store such an image in a relatively small memory space, it is often necessary to compress the digital data that represents the image. Such image compression typically involves reducing the number of data bits necessary to represent an image. For example, High-Definition-Television (HDTV) video images are compressed to allow their transmission over existing television channels. Without compression, HDTV video images would require transmission channels having bandwidths much greater than the bandwidths of existing television channels. Furthermore, to reduce data traffic and transmission time to acceptable levels, an image may be compressed before being sent over the internet. Or, to increase the image-storage capacity of a CD-ROM or server, an image may be compressed before being stored thereon.
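A quick calculation shows the scale of the problem for HDTV. The sketch below assumes a 1920×1080 picture at 30 frames per second, 8 bits per sample, and the 4:2:0 chroma format described below; it compares the resulting raw bit rate with roughly 19.39 Mbit/s, the approximate payload of an ATSC digital broadcast channel, used here only as a reference point. The function name is hypothetical.

```python
def raw_bit_rate(width, height, fps, bits_per_sample=8, chroma_fraction=0.5):
    """Uncompressed video bit rate in bits per second.

    chroma_fraction=0.5 models 4:2:0 sampling: two chroma planes, each a
    quarter the size of the luminance plane.
    """
    samples_per_frame = width * height * (1 + chroma_fraction)
    return samples_per_frame * bits_per_sample * fps

rate = raw_bit_rate(1920, 1080, 30)
atsc_payload = 19.39e6  # approximate ATSC channel payload, bits per second
ratio = rate / atsc_payload
print(f"{rate / 1e6:.1f} Mbit/s raw, about {ratio:.0f}:1 compression needed")
```

Under these assumptions the raw signal runs at roughly 746 Mbit/s, so the encoder must shrink it by well over an order of magnitude before it fits in one channel.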
Referring to FIGS. 3–6, the basics of the popular block-based Moving Picture Experts Group (MPEG) compression standards, which include MPEG-1 and MPEG-2, are discussed. For purposes of illustration, the discussion is based on using an MPEG 4:2:0 format to compress video images represented in a Y, CB, CR color space. However, the discussed concepts also apply to other MPEG formats, to images that are represented in other color spaces, and to other block-based compression standards such as the Joint Photographic Experts Group (JPEG) standard, which is often used to compress still images. Furthermore, although many details of the MPEG standards and the Y, CB, CR color space are omitted for brevity, these details are well-known and are disclosed in a large number of available references.
Still referring to FIGS. 3–6, the MPEG standards are often used to compress temporal sequences of images (video frames for purposes of this discussion) such as found in a television broadcast. Each video frame is divided into subregions called macro blocks, which each include a block of pixels. FIG. 3A is a 16-pixel-by-16-pixel macro block 30 having 256 pixels 32 (not drawn to scale). In the MPEG standards, a macro block is always 16×16 pixels, although other compression standards may use macro blocks having other dimensions. In the original video frame, i.e., the frame before compression, each pixel 32 has a respective luminance value Y and a respective pair of color-, i.e., chroma-, difference values CB and CR.
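Because every MPEG macro block is 16×16 pixels, the number of macro blocks per frame follows directly from the frame size. The helper below is a hypothetical illustration; it assumes that a frame whose width or height is not a multiple of 16 is padded up to the next full macro block, as MPEG-2 does when it codes 1080-line pictures as 1088 lines.

```python
def macroblock_grid(width, height, mb=16):
    """Return (macro blocks per row, macro blocks per column).

    Partial edge blocks are counted as full macro blocks (padding assumed).
    """
    cols = -(-width // mb)   # ceiling division
    rows = -(-height // mb)
    return cols, rows

print(macroblock_grid(720, 480))    # (45, 30): 1350 macro blocks per frame
print(macroblock_grid(1920, 1080))  # (120, 68): 1080 lines are coded as 1088
```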
Referring to FIGS. 3A–3D, before compression of the frame, the digital luminance (Y) and chroma-difference (CB and CR) values that will be used for compression, i.e., the pre-compression values, are generated from the original Y, CB, and CR values of the original frame. In the MPEG 4:2:0 format, the pre-compression Y values are the same as the original Y values. Thus, each pixel 32 merely retains its original luminance value Y. But to reduce the amount of data to be compressed, the MPEG 4:2:0 format allows only one pre-compression CB value and one pre-compression CR value for each group 34 of four pixels 32. Each of these pre-compression CB and CR values is respectively derived from the original CB and CR values of the four pixels 32 in the respective group 34. For example, a pre-compression CB value may equal the average of the original CB values of the four pixels 32 in the respective group 34. Thus, referring to FIGS. 3B–3D, the pre-compression Y, CB, and CR values generated for the macro block 30 are arranged as one 16×16 matrix 36 of pre-compression Y values (equal to the original Y value for each pixel 32), one 8×8 matrix 38 of pre-compression CB values (equal to one derived CB value for each group 34 of four pixels 32), and one 8×8 matrix 40 of pre-compression CR values (equal to one derived CR value for each group 34 of four pixels 32). The matrices 36, 38, and 40 are often called “blocks” of values. Furthermore, because it is convenient to perform the compression transforms on 8×8 blocks of pixel values instead of 16×16 blocks, the block 36 of pre-compression Y values is subdivided into four 8×8 blocks 42a–42d, which respectively correspond to the 8×8 blocks A–D of pixels in the macro block 30. Thus, referring to FIGS. 3A–3D, six 8×8 blocks of pre-compression pixel data are generated for each macro block 30: four 8×8 blocks 42a–42d of pre-compression Y values, one 8×8 block 38 of pre-compression CB values, and one 8×8 block 40 of pre-compression CR values.
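The preparation of one macro block's pre-compression data can be sketched as below. This is a hedged illustration rather than a normative procedure: it uses the simple four-pixel averaging mentioned above to derive each CB and CR value (real encoders may use other filters), and it assumes the 8×8 luminance blocks A–D are taken in raster order (top-left, top-right, bottom-left, bottom-right). All function names are hypothetical.

```python
def subsample_420(plane):
    """Average each 2x2 group of chroma values (one value per group of four pixels)."""
    return [
        [(plane[r][c] + plane[r][c + 1] + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
         for c in range(0, len(plane[0]), 2)]
        for r in range(0, len(plane), 2)
    ]

def split_8x8(block16):
    """Split a 16x16 luminance block into four 8x8 blocks in raster order (assumed A-D)."""
    return [
        [row[c0:c0 + 8] for row in block16[r0:r0 + 8]]
        for r0 in (0, 8) for c0 in (0, 8)
    ]

def macroblock_to_blocks(y16, cb16, cr16):
    """Return the six 8x8 pre-compression blocks: four Y blocks, one CB block, one CR block."""
    return split_8x8(y16) + [subsample_420(cb16), subsample_420(cr16)]

y16 = [[r * 16 + c for c in range(16)] for r in range(16)]
cb16 = [[c for c in range(16)] for _ in range(16)]
cr16 = [[128] * 16 for _ in range(16)]
blocks = macroblock_to_blocks(y16, cb16, cr16)
print(len(blocks), blocks[4][0][0])  # 6 0.5
```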
FIG. 4 is a block diagram of an MPEG compressor 50, which is more commonly called an encoder. Generally, the encoder 50 converts the pre-compression data for a frame or sequence of frames into encoded data that represent the same frame or frames with significantly fewer data bits than the pre-compression data. To perform this conversion, the encoder 50 reduces or eliminates redundancies in the pre-compression data and reformats the remaining data using efficient transform and coding techniques.
More specifically, the encoder 50 includes a frame-reorder buffer 52, which receives the pre-compression data for a sequence of one or more frames and reorders the frames in an appropriate sequence for encoding. Thus, the reordered sequence is often different from the sequence in which the frames are generated and will be displayed. The encoder 50 assigns each of the stored frames to a respective group, called a Group Of Pictures (GOP), and labels each frame as either an intra (I) frame or a non-intra (non-I) frame. For example, each GOP may include three I frames and twelve non-I frames for a total of fifteen frames. The encoder 50 always encodes an I frame without reference to another frame, but can and often does encode a non-I frame with reference to one or more of the other frames in the GOP. The encoder 50 does not, however, encode a non-I frame with reference to a frame in a different GOP.
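One common reordering rule can be sketched as follows. The text above only distinguishes I and non-I frames; this example additionally assumes, as in MPEG-1 and MPEG-2, that non-I frames are either forward-predicted (P) frames or bidirectionally predicted (B) frames. Because a B frame may reference the next I or P frame in display order, that reference must be encoded first. The function name is hypothetical.

```python
def display_to_coding_order(frame_types):
    """Return display-order indices rearranged into coding order.

    frame_types is a sequence of "I", "P", or "B" labels in display order.
    Each B frame is held back until the later reference frame it depends on
    has been emitted.
    """
    coding, pending_b = [], []
    for idx, ftype in enumerate(frame_types):
        if ftype == "B":
            pending_b.append(idx)   # wait for the next reference frame
        else:                       # I or P: a reference frame
            coding.append(idx)
            coding.extend(pending_b)
            pending_b.clear()
    coding.extend(pending_b)        # trailing B frames with no later reference in this list
    return coding

print(display_to_coding_order(list("IBBPBBP")))  # [0, 3, 1, 2, 6, 4, 5]
```

In the result, display frame 3 (a P frame) is encoded before display frames 1 and 2; a decoder's frame-reorder buffer later restores the display order.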
During the encoding of an I frame, the 8×8 blocks (FIGS. 3B–3D) of the pre-compression Y, CB, and CR values that represent the I frame pass through a summer 54 to a Discrete Cosine Transformer (DCT) 56, which transforms these blocks of values into respective 8×8 blocks of one DC (zero frequency) coefficient and sixty-three AC (non-zero frequency) coefficients. That is, the summer 54 is not needed when the encoder 50 encodes an I frame, and thus the pre-compression values pass through the summer 54 without being summed with any other values. As discussed below, however, the summer 54 is often needed when the encoder 50 encodes a non-I frame. A quantizer 58 limits each of the coefficients to a respective maximum value, and provides the quantized AC and DC coefficients on respective paths 60 and 62. A prediction encoder 64 predictively encodes the DC coefficients, and a variable-length coder 66 converts the quantized AC coefficients and the quantized and predictively encoded DC coefficients into variable-length codes, such as Huffman codes. These codes form the encoded data that represent the pixel values of the encoded I frame. A transmit buffer 68 then temporarily stores these codes to allow synchronized transmission of the encoded data to a decoder (discussed below in conjunction with FIG. 6). Alternatively, if the encoded data is to be stored instead of transmitted, the coder 66 may provide the variable-length codes directly to a storage medium such as a CD-ROM.
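The transform, quantization, and DC-prediction steps can be sketched in plain Python. The DCT below is the orthonormal 8×8 DCT-II used by JPEG and MPEG; the quantizer, however, is a deliberate simplification (a single uniform step size, applied by dividing and rounding) rather than MPEG's weighting-matrix quantization, and the DC predictor simply codes each DC level as its difference from the previous one. The function names and the step size are hypothetical.

```python
import math

N = 8

def _c(k):
    """DCT normalization factor: 1/sqrt(2) for the zero-frequency index, else 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct_8x8(block):
    """2-D DCT-II of an 8x8 block; coeffs[0][0] is DC, the other 63 entries are AC."""
    return [
        [
            0.25 * _c(u) * _c(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
                for x in range(N) for y in range(N))
            for v in range(N)]
        for u in range(N)]

def quantize(coeffs, step):
    """Simplified uniform quantizer: divide by the step size and round to an integer level."""
    return [[round(c / step) for c in row] for row in coeffs]

def dc_differences(dc_values):
    """Differential (predictive) coding of successive DC levels."""
    prev, out = 0, []
    for dc in dc_values:
        out.append(dc - prev)
        prev = dc
    return out

flat = [[100] * 8 for _ in range(8)]
levels = quantize(dct_8x8(flat), 16)
print(levels[0][0])  # 50
```

For a flat block all of the energy lands in the DC coefficient, so the 63 quantized AC levels are zero and cost very few bits to code.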
If the I frame will be used as a reference (as it often will be) for one or more non-I frames in the GOP, then, for the following reasons, the encoder 50 generates a corresponding reference frame by decoding the encoded I frame with a decoding technique that is similar or identical to the decoding technique used by the decoder (FIG. 6). When decoding non-I frames that are referenced to the I frame, the decoder has no option but to use the decoded I frame as a reference frame. Because MPEG encoding and decoding are lossy—some information is lost due to quantization of the AC and DC transform coefficients—the pixel values of the decoded I frame will often be different than the pre-compression pixel values of the I frame. Therefore, using the pre-compression I frame as a reference frame during encoding may cause additional artifacts in the decoded non-I frame because the reference frame used for decoding (decoded I frame) would be different than the reference frame used for encoding (pre-compression I frame).
Therefore, to generate a reference frame for the encoder that will be similar to or the same as the reference frame for the decoder, the encoder 50 includes a dequantizer 70 and an inverse DCT 72, which are designed to mimic the dequantizer and inverse DCT of the decoder (FIG. 6). The dequantizer 70 dequantizes the quantized DCT coefficients from the quantizer 58, and the inverse DCT 72 transforms these dequantized DCT coefficients into corresponding 8×8 blocks of decoded Y, CB, and CR pixel values, which compose the reference frame. Because of the losses incurred during quantization, however, some or all of these decoded pixel values may be different than their corresponding pre-compression pixel values, and thus the reference frame may be different than its corresponding pre-compression frame as discussed above. The decoded pixel values then pass through a summer 74 (used when generating a reference frame from a non-I frame as discussed below) to a reference-frame buffer 76, which stores the reference frame.
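The encoder's local decoding loop can be illustrated with the inverse steps. This minimal sketch assumes the same orthonormal 8×8 DCT and single uniform step size as a simplified model of the dequantizer 70 and inverse DCT 72; the function names and step size are hypothetical. A flat block of value 103 has a DC coefficient of 8 × 103 = 824; with a step of 16 the quantizer rounds 824 / 16 = 51.5 to level 52, and the block comes back as 104. That is exactly the kind of mismatch that makes the encoder build its reference frame from decoded, not pre-compression, values.

```python
import math

def _c(k):
    """DCT normalization factor: 1/sqrt(2) for the zero-frequency index, else 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dequantize(levels, step):
    """Scale each quantized level back by the step size (mirrors the simplified quantizer)."""
    return [[lv * step for lv in row] for row in levels]

def idct_8x8(coeffs):
    """Inverse 2-D DCT (DCT-III) of an 8x8 coefficient block back to pixel values."""
    return [
        [
            0.25 * sum(
                _c(u) * _c(v) * coeffs[u][v]
                * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
                for u in range(8) for v in range(8))
            for y in range(8)]
        for x in range(8)]

# Quantized levels for a flat block of 103: only the DC level (52) is nonzero.
levels = [[0] * 8 for _ in range(8)]
levels[0][0] = 52
reference_block = idct_8x8(dequantize(levels, 16))
print(round(reference_block[0][0], 6))  # 104.0, not the original 103
```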
During the encoding of a non-I frame, the encoder 50 initially encodes each macro-block of the non-I frame in at least two ways: in the manner discussed above for I frames, and using motion prediction, which is discussed below. The encoder 50 then saves and transmits the resulting code having the fewest bits. This technique ensures that the macro blocks of the non-I frames are encoded using the fewest bits.
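The mode decision described above reduces to picking the shortest candidate encoding. In this sketch the candidate bit strings and the mode labels are hypothetical placeholders for the outputs of the intra and motion-predicted encoding paths.

```python
def choose_encoding(candidates):
    """Pick the candidate encoding with the fewest bits.

    candidates maps a mode label (e.g. "intra" or "motion", both hypothetical)
    to the bit string produced by encoding the macro block that way.
    Ties go to the first mode listed.
    """
    mode = min(candidates, key=lambda m: len(candidates[m]))
    return mode, candidates[mode]

mode, bits = choose_encoding({"intra": "1011001110", "motion": "110010"})
print(mode, len(bits))  # motion 6
```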
With respect to motion prediction, an object in a frame exhibits motion if its relative position changes in the succeeding frames. For example, a horse exhibits relative motion if it gallops across the screen. Or, if the camera follows the horse, then the background exhibits relative motion with respect to the horse. Generally, each of the succeeding frames in which the object appears contains at least some of the same macro blocks of pixels as the preceding frames. But such matching macro blocks in a succeeding frame often occupy respective frame locations that are different than the respective frame locations they occupy in the preceding frames. Alternatively, a macro block that includes a portion of a stationary object (e.g., a tree) or background scene (e.g., the sky) may occupy the same frame location in each of a succession of frames, and thus exhibit “zero motion”. In either case, instead of encoding each frame independently, it takes fewer data bits to tell the decoder “the macro blocks R and Z of frame 1 (non-I frame) are the same as the macro blocks that are in the locations S and T, respectively, of frame 0 (I frame).” This “statement” is encoded as a motion vector. For a relatively fast moving object, the location values of the motion vectors are relatively large. Conversely, for a stationary or relatively slow-moving object or background scene, the location values of the motion vectors are relatively small or equal to zero.
FIG. 5 illustrates the concept of motion vectors with reference to the non-I frame 1 and the I frame 0 discussed above. A motion vector MV_R indicates that a match for the macro block in the location R of frame 1 can be found in the location S of frame 0. MV_R has three components. The first component, here 0, indicates the frame (here frame 0) in which the matching macro block can be found. The next two components, X_R and Y_R, together comprise the two-dimensional location value that indicates where in the frame 0 the matching macro block can be found. Thus, in this example, because the location S of the frame 0 has the same X,Y coordinates as the location R in the frame 1, X_R = Y_R = 0. Conversely, the macro block in the location T matches the macro block in the location Z, which has different X,Y coordinates than the location T. Therefore, X_Z and Y_Z represent the location T with respect to the location Z. For example, suppose that the location T is ten pixels to the left of (negative X direction) and seven pixels down from (negative Y direction) the location Z. Therefore, MV_Z = (0, −10, −7). Although there are many other motion-vector schemes available, they are all based on the same general concept.
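The three-component motion vector of FIG. 5 can be represented as a small record. This sketch assumes the coordinate convention of the example above (X positive to the right, Y positive upward) and uses hypothetical macro-block coordinates; the names `MotionVector` and `motion_vector` are also hypothetical. It reproduces the vector (0, −10, −7) for the location Z.

```python
from typing import NamedTuple

class MotionVector(NamedTuple):
    ref_frame: int  # frame that holds the matching macro block
    dx: int         # horizontal offset, positive to the right
    dy: int         # vertical offset, positive upward (as in the FIG. 5 example)

def motion_vector(ref_frame, current_xy, match_xy):
    """Offset from a macro block's location in the current frame to its match in ref_frame."""
    return MotionVector(ref_frame, match_xy[0] - current_xy[0], match_xy[1] - current_xy[1])

# Location Z in frame 1; its match T in frame 0 is 10 pixels left and 7 pixels down.
z = (48, 32)                  # hypothetical (X, Y) coordinates
t = (z[0] - 10, z[1] - 7)
print(motion_vector(0, z, t))  # MotionVector(ref_frame=0, dx=-10, dy=-7)
```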
Referring again to FIG. 4, motion prediction is now discussed in detail. During the encoding of a non-I frame, a motion predictor 78 compares the pre-compression Y values (the CB and CR values are not used during motion prediction) of the macro blocks in the non-I frame to the decoded Y values of the respective macro blocks in the reference frame and identifies matching macro blocks. For each macro block in the non-I frame for which a match is found in the reference frame, the motion predictor 78 generates a motion vector that identifies the reference frame and the location of the matching macro block within the reference frame. Thus, as discussed below in conjunction with FIG. 6, during decoding of these motion-encoded macro blocks of the non-I frame, the decoder uses the motion vectors to obtain the pixel values of the motion-encoded macro blocks from the matching macro blocks in the reference frame. The prediction encoder 64 predictively encodes the motion vectors, and the coder 66 generates respective codes for the encoded motion vectors and provides these codes to the transmit buffer 68.
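One common way to find matching macro blocks is a full search that minimizes the sum of absolute differences (SAD) between luminance blocks. The text above does not specify a matching criterion or search range, so both are assumptions here, as are the function names. Note that this sketch uses array (row, column) offsets, so a positive row offset points down, unlike the Y convention of FIG. 5.

```python
def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between a size x size block of cur at (cy, cx)
    and a block of ref at (ry, rx)."""
    return sum(
        abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
        for i in range(size) for j in range(size))

def full_search(cur, ref, cy, cx, size=16, search=7):
    """Exhaustively test every offset within +/-search pixels that stays inside ref.

    Returns (row_offset, col_offset, best_sad) for the lowest-SAD match.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            if 0 <= ry <= h - size and 0 <= rx <= w - size:
                cost = sad(cur, ref, cy, cx, ry, rx, size)
                if best is None or cost < best[2]:
                    best = (dy, dx, cost)
    return best

# Hypothetical 12x12 reference frame; the current frame's block at (4, 4)
# is a copy of the reference block one row down and one column left.
ref = [[(r * 31 + c * 17) % 256 for c in range(12)] for r in range(12)]
cur = [row[:] for row in ref]
for i in range(4):
    for j in range(4):
        cur[4 + i][4 + j] = ref[5 + i][3 + j]
print(full_search(cur, ref, 4, 4, size=4, search=2))  # (1, -1, 0)
```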
Furthermore, because a macro block in the non-I frame and a matching macro block in the reference frame are often similar but not identical, the encoder 50 encodes these differences along with the motion vector so that the decoder can account for them. More specifically, the motion predictor 78 provides the decoded Y values of the matching macro block of the reference frame to the summer 54, which effectively subtracts, on a pixel-by-pixel basis, these Y values from the pre-compression Y values of the matching macro block of the non-I frame. These differences, which are called residuals, are arranged in 8×8 blocks and are processed by the DCT 56, the quantizer 58, the coder 66, and the buffer 68 in a manner similar to that discussed above, except that the quantized DC coefficients of the residual blocks are coupled directly to the coder 66 via the line 60, and thus are not predictively encoded by the prediction encoder 64.
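The residual formed by the summer 54 is a plain element-wise subtraction. The sketch uses 2×2 blocks for brevity instead of 8×8, and the pixel values and function name are made up for illustration.

```python
def residual_block(current, reference):
    """Pixel-by-pixel difference: current (pre-compression) minus matching reference (decoded)."""
    return [[c - r for c, r in zip(cur_row, ref_row)]
            for cur_row, ref_row in zip(current, reference)]

current = [[120, 121], [119, 118]]
reference = [[118, 121], [120, 118]]
print(residual_block(current, reference))  # [[2, 0], [-1, 0]]
```

Because a good match leaves mostly small or zero residuals, the subsequent DCT and quantization typically produce few nonzero coefficients.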
Additionally, it is possible to use a non-I frame as a reference frame. When a non-I frame will be used as a reference frame, the quantized residuals from the quantizer 58 are respectively dequantized and inverse transformed by the dequantizer 70 and the inverse DCT 72 so that this non-I reference frame will be the same as the one used by the decoder for the reasons discussed above. The motion predictor 78 provides to the summer 74 the decoded Y values of the I reference frame from which the residuals were generated. The summer 74 adds the respective residuals from the circuit 72 to these decoded Y values of the I reference frame to generate the respective Y values of the non-I reference frame. The reference frame buffer 76 then stores the non-I reference frame along with the I reference frame for use in encoding subsequent non-I frames.
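The summer 74 performs the inverse of the residual step, adding decoded residuals back to the reference pixels. This sketch assumes 8-bit samples and clips each sum to the range 0 to 255, a detail not stated in the text above; the function name and sample values are hypothetical.

```python
def reconstruct_block(reference, decoded_residuals, max_value=255):
    """Add decoded residuals to reference pixels, clipping to the assumed 8-bit range."""
    return [[min(max(r + d, 0), max_value) for r, d in zip(ref_row, res_row)]
            for ref_row, res_row in zip(reference, decoded_residuals)]

print(reconstruct_block([[118, 250], [120, 3]], [[2, 10], [-1, -5]]))  # [[120, 255], [119, 0]]
```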
Still referring to FIG. 4, the encoder 50 also includes a rate controller 80 to ensure that the transmit buffer 68, which typically transmits the encoded frame data at a fixed rate, never overflows or empties, i.e., underflows. If either of these conditions occurs, errors may be introduced into the encoded data stream. For example, if the buffer 68 overflows, data from the coder 66 is lost. Thus, the rate controller 80 uses feedback to adjust the quantization scaling factors used by the quantizer 58 based on the degree of fullness of the transmit buffer 68. The fuller the buffer 68, the larger the controller 80 makes the scale factors, and the fewer data bits the coder 66 generates. Conversely, the emptier the buffer 68, the smaller the controller 80 makes the scale factors, and the more data bits the coder 66 generates. This continuous adjustment ensures that the buffer 68 neither overflows nor underflows.
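The feedback rule can be sketched as a mapping from buffer fullness to a quantizer scale. The linear mapping and the 1 to 31 scale range used here are illustrative assumptions (MPEG-2's quantiser_scale_code spans 1 to 31, but practical rate controllers use more elaborate models); the function name and buffer size are hypothetical.

```python
def quant_scale(buffer_bits, buffer_size, min_scale=1, max_scale=31):
    """Map transmit-buffer fullness to a quantizer scale.

    Fuller buffer -> larger scale -> coarser quantization -> fewer bits generated.
    """
    fullness = min(max(buffer_bits / buffer_size, 0.0), 1.0)
    return round(min_scale + fullness * (max_scale - min_scale))

scales = [quant_scale(bits, 1_000_000) for bits in (0, 500_000, 1_000_000)]
print(scales)  # [1, 16, 31]
```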
FIG. 6 is a block diagram of a conventional MPEG decompressor 82, which is commonly called a decoder and which can decode frames that are encoded by the encoder 50 of FIG. 4.
For I frames and macro blocks of non-I frames that are not motion predicted, a variable-length decoder 84 decodes the variable-length codes received from the encoder 50. A prediction decoder 86 decodes the predictively encoded DC coefficients, and a dequantizer 87, which is similar or identical to the dequantizer 70 of FIG. 4, dequantizes the decoded AC and DC transform coefficients. An inverse DCT 88, which is similar or identical to the inverse DCT 72 of FIG. 4, transforms the dequantized coefficients into pixel values. The decoded pixel values pass through a summer 90—which is used during the decoding of motion-predicted macro blocks of non-I frames as discussed below—into a frame-reorder buffer 92, which stores the decoded frames and arranges them in a proper order for display on a video display unit 94. If a decoded I frame is used as a reference frame, it is also stored in the reference-frame buffer 96.
For motion-predicted macro blocks of non-I frames, the decoder 84, dequantizer 87, and inverse DCT 88 process the residuals as discussed above in conjunction with FIG. 4. The prediction decoder 86 decodes the motion vectors, and a motion interpolator 98 provides to the summer 90 the pixel values from the reference-frame macro blocks that the motion vectors point to. The summer 90 adds these reference pixel values to the residuals to generate the pixel values of the decoded macro blocks, and provides these decoded pixel values to the frame-reorder buffer 92. If a decoded non-I frame is used as a reference frame, it is stored in the reference-frame buffer 96.
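The work of the motion interpolator 98 and the summer 90 for one motion-predicted block can be sketched as follows. The sketch assumes integer-pixel motion vectors (MPEG also allows half-pixel vectors, which require interpolating between reference pixels), array (row, column) offsets in which positive rows point down, and 8-bit clipping; the function name and sample values are hypothetical.

```python
def motion_compensate(ref_frame, top, left, mv_rows, mv_cols, residuals):
    """Fetch the reference block the motion vector points to and add the decoded residuals.

    (top, left) is the block's position in the frame being decoded; the
    prediction is read from (top + mv_rows, left + mv_cols) in ref_frame.
    """
    size = len(residuals)
    out = []
    for i in range(size):
        row = []
        for j in range(size):
            pred = ref_frame[top + mv_rows + i][left + mv_cols + j]
            row.append(min(max(pred + residuals[i][j], 0), 255))  # clip to 8 bits
        out.append(row)
    return out

ref = [[r * 10 + c for c in range(4)] for r in range(4)]
print(motion_compensate(ref, 0, 0, 1, 1, [[1, 1], [1, 1]]))  # [[12, 13], [22, 23]]
```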
Referring to FIGS. 4 and 6, although described as including multiple functional circuit blocks, the encoder 50 and the decoder 82 may be implemented in hardware, software, or a combination of both. For example, the encoder 50 and the decoder 82 are each often implemented by one or more processors that perform the functions of the respective circuit blocks.
More detailed discussions of the MPEG encoder 50 and decoder 82 of FIGS. 4 and 6, respectively, and of the MPEG standard in general are available in many publications including “Video Compression” by Peter D. Symes, McGraw-Hill, 1998, which is incorporated by reference. Furthermore, there are other well-known block-based compression techniques for encoding and decoding images.