Digital multimedia data such as video and music can be transmitted wirelessly to mobile receivers, such as wireless telephones, so that users of the mobile receivers can play the multimedia. Such data often are broadcast.
The multimedia can be formatted in accordance with Moving Picture Experts Group (MPEG) standards such as MPEG-1, MPEG-2 (also used for the DVD format), MPEG-4, and other block-based transform codecs. Essentially, for individual video frames these multimedia standards use compression similar to that of the Joint Photographic Experts Group (JPEG) standard. In JPEG, the image of a single frame is divided into small blocks of pixels (8×8 pixel blocks, which MPEG groups into 16×16 pixel macroblocks) that are encoded using a discrete cosine transform (DCT) to convert the spatial intensity values represented by the pixels into spatial frequency values, arranged within each block roughly from lowest frequency (top left) to highest (lower right). Then, the DCT values are quantized, i.e., the information is reduced by grouping it into coarser steps, e.g., by dividing every value by 10 and rounding to the nearest integer. Because the DCT tends to concentrate the larger values near the top left corner of a block and the smaller values, many of which quantize to zero, near the lower right corner, a special zigzag ordering of the values can be applied that facilitates further compression by run-length coding (essentially, storing a count of the number of, e.g., zero values that appear consecutively instead of storing all of the zero values). The resulting numbers may then be used to look up symbols from a table developed using Huffman coding, which assigns shorter symbols to the most common numbers, an operation commonly referred to as “variable length coding”. Overall, a JPEG-encoded stream represents horizontal lines of a picture, in much the same way as the underlying pixel data is arranged in a matrix of horizontal rows.
It will be appreciated that JPEG compression discards information, i.e., it is lossy. However, owing to characteristics of human perception and the way the above process works, JPEG compression can reduce a picture to about one-fifth of its original size with virtually no discernible difference, and to one-tenth of its original size with only slight degradation.
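As a rough worked example of these ratios, assume a hypothetical 640×480 picture stored uncompressed at 24 bits (3 bytes) per pixel; the figures below are simple arithmetic, not measured results.

```latex
\begin{aligned}
S_{\text{raw}}  &= 640 \times 480 \times 3 = 921{,}600 \text{ bytes} \\
S_{5:1}         &= S_{\text{raw}} / 5  = 184{,}320 \text{ bytes} \\
S_{10:1}        &= S_{\text{raw}} / 10 = 92{,}160 \text{ bytes}
\end{aligned}
```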
Motion pictures add a temporal dimension to the spatial dimension of single pictures. Typical video has about thirty frames, i.e., thirty still pictures, per second of viewing time. MPEG is essentially a compression technique that uses motion estimation to compress a video stream further.
MPEG encoding breaks each picture into blocks called “macroblocks”, and then searches neighboring pictures for similar blocks. If a match is found, instead of storing all of the DCT values for the entire block, the system stores a much smaller vector that describes the movement (or not) of the block between pictures. In this way, efficient compression is achieved.
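The block search described above is often illustrated with exhaustive block matching. The following Python sketch assumes a sum-of-absolute-differences (SAD) cost, a hypothetical ±4-pixel search window, and frames stored as lists of rows of luminance values; real encoders use larger windows, faster search strategies, and sub-pixel precision, and the function names are illustrative only.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))


def full_search(cur, ref, bx, by, size=16, radius=4):
    """Exhaustive block matching; returns ((dx, dy), best_sad)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidate blocks that would extend past the reference frame.
            if not (0 <= bx + dx and bx + dx + size <= width
                    and 0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


# Hypothetical test frames: `cur` shows the content of `ref` shifted
# 2 pixels right and 1 pixel up, so the best match lies at (dx, dy) = (-2, 1).
W = H = 32
ref = [[(31 * x + 17 * y + x * y) % 251 for x in range(W)] for y in range(H)]
cur = [[ref[(y + 1) % H][(x - 2) % W] for x in range(W)] for y in range(H)]
print(full_search(cur, ref, bx=8, by=8, size=8))  # ((-2, 1), 0)
```

A SAD of zero means the block simply moved, so in this sketch only the vector (dx, dy) would need to be stored instead of the block's DCT values.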
With more specificity, MPEG compression in general uses three kinds of video frames. Some frames, referred to as “intraframes” (also referred to as “reference frames”, “I frames”, or “information frames”), in which the entire frame is composed of compressed, quantized DCT values, must be provided periodically (e.g., around two per second). In MPEG compression, however, the remaining frames (e.g., 28) that make up the rest of the video for that second are much smaller frames that refer to the intraframes, in accordance with MPEG compression principles. In MPEG parlance these frames are called “predicted” frames (“P frames”) and “bidirectional” frames (“B frames”), herein collectively referred to as “interframes”.
Predicted frames are those frames that contain motion vector references to the preceding intraframe or to a preceding predicted frame, in accordance with the discussion above. If a block has changed slightly in intensity or color, then the difference between the two frames is also encoded in a predicted frame. Moreover, if something entirely new appears that does not match any previous blocks, then a new block or blocks can be stored in the predicted frame in the same way as in an intraframe. Note that, as used herein, such a new block is not a “predetermined portion” of an intraframe in that it arises only upon the random introduction of a new object of arbitrary size and position in the frame.
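The three cases in the preceding paragraph (an unchanged block, a block with a slight change in intensity or color, and entirely new content) can be sketched as a per-block mode decision. This is a simplified illustration: it assumes the best-matching reference block has already been found (e.g., by a block search), uses a hypothetical SAD threshold (`intra_threshold`) to decide when to fall back to intraframe-style coding, and leaves the difference untransformed, whereas real encoders DCT-code and quantize the difference and typically choose modes by rate-distortion cost. The mode names are illustrative only.

```python
def code_p_block(cur_block, ref_block, intra_threshold=64):
    """Choose how a predicted frame carries one block whose best match
    (motion vector) has already been found. Returns (mode, payload)."""
    residual = [[c - r for c, r in zip(c_row, r_row)]
                for c_row, r_row in zip(cur_block, ref_block)]
    cost = sum(abs(d) for row in residual for d in row)
    if cost == 0:
        return "vector_only", None                   # block merely moved (or stayed put)
    if cost <= intra_threshold:
        return "vector_plus_difference", residual    # slight intensity/color change
    return "intra", [row[:] for row in cur_block]    # new content, coded like an intraframe block


def reconstruct(mode, payload, ref_block):
    """Decoder side: rebuild the block from the reference block and the payload."""
    if mode == "vector_only":
        return [row[:] for row in ref_block]
    if mode == "vector_plus_difference":
        return [[r + d for r, d in zip(r_row, d_row)]
                for r_row, d_row in zip(ref_block, payload)]
    return [row[:] for row in payload]


ref = [[50] * 4 for _ in range(4)]
brighter = [[52] * 4 for _ in range(4)]      # total difference 32, under the threshold
new_object = [[200] * 4 for _ in range(4)]   # far from any reference content
for cur in (ref, brighter, new_object):
    mode, payload = code_p_block(cur, ref)
    print(mode)  # vector_only, then vector_plus_difference, then intra
```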
In contrast, a bidirectional frame is used as follows. The MPEG system searches both forward and backward through the video stream to match blocks (typically one reference frame in each direction). Experience has shown that two bidirectional frames between consecutive intraframes or predicted frames work well, so that a typical group of frames associated with a single intraframe might be: the full intraframe, followed by two bidirectional frames, a predicted frame, two bidirectional frames, another predicted frame, two more bidirectional frames, a predicted frame, two more bidirectional frames, a predicted frame, and finally two more bidirectional frames, at which point a new full intraframe might be placed in the stream to refresh it. In some instances, only intraframes and predicted frames are used, since bidirectional frames are computationally expensive to generate and require more reference video frames to be stored in the decoder memory. The simplest encoders use no interframes at all, only intraframes, dramatically sacrificing compression for simplicity but using the least decoder memory.
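The group structure described above, and the consequence that a bidirectional frame cannot be decoded until the anchor frame after it has arrived, can be sketched as follows. The function names and parameters are hypothetical; the sketch assumes each B frame references only the nearest I or P frame on either side in display order, so the encoder sends each anchor ahead of the B frames that precede it in display order, and the decoder must hold both anchors in memory while decoding those B frames.

```python
def gop_pattern(num_p=4, b_per_gap=2):
    """Display-order frame types for one group: I, then (B..B P) num_p times, then trailing B frames."""
    return "I" + ("B" * b_per_gap + "P") * num_p + "B" * b_per_gap


def coded_order(types):
    """Return display-order indices in transmission order, sending each
    anchor (I or P) before the B frames that precede it in display order.
    Also returns any B frames still waiting for a later anchor."""
    order, pending_b = [], []
    for i, t in enumerate(types):
        if t == "B":
            pending_b.append(i)      # cannot be decoded until the next anchor arrives
        else:
            order.append(i)          # the anchor goes out first...
            order.extend(pending_b)  # ...followed by the B frames it completes
            pending_b = []
    return order, pending_b


gop = gop_pattern()
print(gop, len(gop))               # IBBPBBPBBPBBPBB 15
print(30 // len(gop))              # 2 intraframes per second at 30 frames/s
order, _ = coded_order(gop + "I")  # append the next group's intraframe
print(order)  # [0, 3, 1, 2, 6, 4, 5, 9, 7, 8, 12, 10, 11, 15, 13, 14]
```

With 15 frames per group at 30 frames per second, this pattern yields the two intraframes and 28 interframes per second mentioned earlier.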
The present invention, in contemplating the above principles, recognizes that MPEG compression works very well when a video stream is transmitted over a reliable link (e.g., from a hard disk drive or DVD to a processor, or over a reliable TCP/IP network connection). The present invention has critically recognized, however, that over “lossy” transmission paths such as those that can occur in wireless transmission, the loss of an intraframe more or less destroys the associated interframes and thus severely degrades the quality of service (QOS) provided until the next full intraframe arrives, particularly when a lost intraframe cannot be retransmitted (e.g., during broadcast transmission). This often requires the decoder to freeze the display until another valid intraframe is received. The ideal solution would retain the compression obtained by using P and B frames while providing improved error resilience.
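Why a lost intraframe damages the whole group can be sketched by propagating losses through the reference structure. The sketch assumes the group structure described earlier (each P frame predicts from the previous I or P frame; each B frame from the nearest I or P frame on either side), no retransmission, and no error concealment; the function name is hypothetical.

```python
def corrupted_frames(types, lost):
    """Display-order indices that cannot be decoded correctly when the
    frames in `lost` are dropped and cannot be retransmitted."""
    anchors = [i for i, t in enumerate(types) if t != "B"]
    bad = set(lost)
    # A P frame is damaged if the anchor it predicts from is damaged;
    # an I frame depends on nothing, so it is damaged only if it is itself lost.
    for prev, cur in zip(anchors, anchors[1:]):
        if types[cur] == "P" and prev in bad:
            bad.add(cur)
    # A B frame is damaged if either surrounding anchor is damaged
    # (a B frame with no later anchor in `types` is not flagged here).
    for i, t in enumerate(types):
        if t == "B":
            before = max((a for a in anchors if a < i), default=None)
            after = min((a for a in anchors if a > i), default=None)
            if before in bad or after in bad:
                bad.add(i)
    return sorted(bad)


stream = "IBBPBBPBBPBBPBB" + "IBBP"   # one full group plus the start of the next
print(corrupted_frames(stream, {0}))  # frames 0 through 14: everything until the next I frame
print(corrupted_frames(stream, {1}))  # [1]: a lost B frame damages only itself
```

Losing the single intraframe at index 0 invalidates all 15 frames of its group, whereas losing a B frame costs one frame, which is the asymmetry the paragraph above describes.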