Digital data is typically transmitted from some type of transmitter to some type of receiver. Transmitters typically include an encoder that encodes the data for transmission, and receivers typically include a decoder that decodes the data they receive. There are different types of digital data, such as video data, audio data, audio/video data and the like. When digital data is transmitted, it is typically transmitted over some type of channel.
The predominant video compression and transmission formats are from a family called hybrid block-based motion-compensated transform video coders, examples of which include the video coding standards of the ITU-T VCEG (Video Coding Experts Group) and ISO/IEC MPEG (Moving Picture Experts Group) organizations—including H.261, MPEG-1, H.262/MPEG-2 Video, H.263, MPEG-4 Visual, and the in-process draft standard H.264/AVC. Coding and compression standards have also been specified for many other types of media including still pictures, audio, documents, web pages, and such, and for multiplexing together and synchronizing such signals.
The most widely used video coding standard is H.262/MPEG-2 Video, which we will use as a reference example herein. Generally, an MPEG-2 video stream is composed of three types of frames or pictures. In this document, we use the term “picture”. The three types of MPEG-2 pictures are: intra pictures (I-pictures); predictive pictures (P-pictures); and bi-directionally predictive pictures (B-pictures).
An MPEG-2 video stream or sequence is typically divided into segments called Groups of Pictures (GOPs). Typically, a GOP consists of a set of pictures of ½ second duration when displayed at their intended speed.
FIG. 1 illustrates the beginning of an MPEG-2 video stream consisting of a sequence of pictures ordered and indexed from left to right in the order in which the pictures will be displayed, starting with an I-picture 100 (I0). In this example, the first GOP starts with the first I-picture 100 (I0) and contains subsequent pictures up to and including the last P-picture 160 (P6) that precedes the next I-picture 190 (I9). The second GOP starts with the first B-picture 170 (B7) that precedes the second I-picture 190 (I9). The first GOP in this example sequence includes one I-picture, two P-pictures, and four B-pictures. In general, each GOP includes one or more consecutive pictures and begins either with an I-picture that is not immediately preceded by a B-picture, such as picture 100 (I0), or with the first B-picture, such as 170 (B7), in a sequence of one or more consecutive B-pictures that immediately precedes an I-picture, such as 190 (I9).
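The GOP boundary rule described above can be illustrated with a minimal sketch. The function name split_into_gops and the representation of pictures as labels such as "I0" are assumptions made for illustration only. The example sequence extends FIG. 1 with pictures inferred from the later discussion (labels such as B13, B14, P15 and B17 are assumed, not taken from the figures).

```python
def split_into_gops(display_order):
    """Split a display-ordered list of picture labels into GOPs.

    A GOP starts at an I-picture that is not immediately preceded by a
    B-picture, or at the first B-picture of a run of consecutive
    B-pictures that immediately precedes an I-picture.
    """
    gops = []
    n = len(display_order)
    for i, label in enumerate(display_order):
        kind = label[0]
        prev_is_b = i > 0 and display_order[i - 1][0] == "B"
        starts_gop = False
        if kind == "I":
            starts_gop = not prev_is_b
        elif kind == "B" and not prev_is_b:
            # First B-picture of a run: find the picture that ends the run.
            j = i
            while j < n and display_order[j][0] == "B":
                j += 1
            starts_gop = j < n and display_order[j][0] == "I"
        if starts_gop or not gops:
            gops.append([])
        gops[-1].append(label)
    return gops


# Assumed example sequence in display order (illustrative labels).
example = ("I0 B1 B2 P3 B4 B5 P6 B7 B8 I9 "
           "B10 B11 P12 B13 B14 P15 B16 B17 I18").split()
for gop in split_into_gops(example):
    print(" ".join(gop))
```

In this sketch the first GOP comes out as I0 through P6, the second begins at B7, and the third begins at B16, which matches the grouping described for the example sequence.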
Decoding typically begins by decoding the first I-picture of any GOP, essentially independent of any preceding GOPs—for example, at I-picture 100 (I0) in the first GOP or at I-picture 190 (I9) in the second GOP. There is no specific limit to the number of pictures which may be in a GOP, nor is there a requirement for an equal number of pictures in all GOPs in a video sequence.
MPEG-2 I-pictures and P-pictures are called “anchor” pictures (or “key” pictures). An I-picture can be decoded independently of any other pictures. It does not rely on data from any other picture to construct its image. An MPEG-2 P-picture such as picture 130 (P3) requires data from one previously decompressed anchor picture (i.e., an I-picture or a P-picture) to enable its decompression. While it is dependent, it is only dependent on one anchor picture that has already been decoded.
An MPEG-2 B-picture such as picture 110 (B1) requires data from both a preceding and a succeeding anchor picture (i.e., I-pictures or P-pictures) to be decoded. That is, an MPEG-2 B-picture is bi-directionally dependent.
In FIG. 1, the tail of each arrow indicates a picture on which the picture at the arrowhead depends. For example, B-picture 140 (B4) is dependent upon P-picture 130 (P3) and P-picture 160 (P6).
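The dependency relationships described above can be sketched as follows. This is a simplified illustration that assumes every P-picture references the nearest preceding anchor picture and every B-picture references the nearest preceding and nearest following anchor pictures in display order. The function name reference_pictures and the label format are hypothetical.

```python
def reference_pictures(display_order):
    """Map each picture label to the anchor pictures it depends on.

    Simplifying assumption: a P-picture references the nearest preceding
    anchor (I or P), and a B-picture references the nearest preceding and
    nearest following anchors, all in display order.
    """
    anchor_positions = [i for i, p in enumerate(display_order) if p[0] in "IP"]
    refs = {}
    for i, label in enumerate(display_order):
        kind = label[0]
        before = [display_order[a] for a in anchor_positions if a < i]
        after = [display_order[a] for a in anchor_positions if a > i]
        if kind == "I":
            refs[label] = []                      # independently decodable
        elif kind == "P":
            refs[label] = before[-1:]             # one earlier anchor
        else:
            refs[label] = before[-1:] + after[:1]  # bi-directional
    return refs


deps = reference_pictures("I0 B1 B2 P3 B4 B5 P6 B7 B8 I9".split())
print(deps["B4"])
```

For the FIG. 1 pictures, this sketch reports that B4 depends on P3 and P6, in agreement with the arrows described above.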
Now consider the picture sequence of FIG. 2 which illustrates the display order of the same sequence of individual I-, B- and P-pictures, in which some additional pictures are shown after the last picture shown in FIG. 1 and in which the beginning of a third GOP is shown starting with picture B16. The display order is the order in which the pictures are to be displayed. So, for example, if one were displaying the individual pictures, I0 would be the first picture displayed, followed by B1, B2 and so on. Notice, however, that if one looks at the sequence from the standpoint of those pictures that are predicted (i.e. the B-pictures), in order to decode a B-picture, the decoder must refer to the decoded value of an I-picture or P-picture that follows after it in time. So in this example, in order to decode B2, the decoder will refer to both I0 and P3. That is, the decoder will have to decode both I0 and P3 in order to decode B2.
Accordingly, the encoder typically transmits the pictures in a different order than the display order, so that the decoder can decode them as it receives them. For example, FIG. 3 illustrates the transmission order or decoding order of the FIG. 2 sequence. Because the decoder has to decode I-pictures and P-pictures before the B-pictures that reference them, the I-pictures and P-pictures are sent before the B-pictures that reference them. Here, notice that P3 is sent before B1 and B2, which reference it. Thus, when the decoder receives the sequence in its transmission order, it can first decode I0 and then P3, which references I0. Next, because it has decoded both I0 and P3, the decoder can now decode B1 and B2. Once the decoder has decoded a sufficient number of pictures, it will rearrange the pictures into display order for displaying. In the example, the decoder can accomplish the rearrangement by having its display process for I-pictures and P-pictures lag one anchor picture behind its decoding process (allowing the display of I0 after the decoding of P3, the display of B1 immediately after decoding it, the display of B2 immediately after decoding it, the display of P3 after the decoding of P6, the display of B4 immediately after decoding it, the display of B5 immediately after decoding it, the display of P6 after the decoding of I9, the display of B7 immediately after decoding it, etc.).
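The reordering described above can be sketched in a few lines. This is an illustrative model only: the function names to_transmission_order and to_display_order are hypothetical, and pictures are represented by labels rather than coded data. The encoder side sends each anchor picture ahead of the B-pictures that precede it in display order. The decoder side emits B-pictures immediately and holds each anchor picture until the next anchor picture arrives.

```python
def to_transmission_order(display_order):
    """Move each anchor picture ahead of the B-pictures that precede it."""
    out, pending_b = [], []
    for label in display_order:
        if label[0] == "B":
            pending_b.append(label)   # wait for the following anchor
        else:
            out.append(label)         # anchor goes first
            out.extend(pending_b)     # then the B-pictures that needed it
            pending_b = []
    out.extend(pending_b)             # trailing B-pictures, if any
    return out


def to_display_order(transmission_order):
    """Undo the reordering with a one-anchor display lag."""
    out, held_anchor = [], None
    for label in transmission_order:
        if label[0] == "B":
            out.append(label)         # displayed right after decoding
        else:
            if held_anchor is not None:
                out.append(held_anchor)   # previous anchor is now due
            held_anchor = label
    if held_anchor is not None:
        out.append(held_anchor)       # flush the last anchor
    return out


display = "I0 B1 B2 P3 B4 B5 P6 B7 B8 I9".split()
sent = to_transmission_order(display)
print(" ".join(sent))
print(" ".join(to_display_order(sent)))
```

Holding a single anchor picture is sufficient in this model because each run of B-pictures is bounded by exactly two anchor pictures.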
Now consider the situation in which the FIG. 3 sequence is randomly accessed. For example, assume that a user is watching, on a digital television, a program that has been encoded as described above and suddenly changes the channel to another encoded program. If the decoder attempts to access the sequence at a B-picture, the decoder will be unable to decode the B-picture because it does not have the information that the B-picture refers to, e.g. the previous-in-time I-picture or P-picture. Similarly, if the decoder randomly accesses the sequence at a P-picture, it will be unable to decode it because it will not have the previous-in-time P- or I-picture to which it refers. Thus, in the encoding scheme described above, the decoder will not be able to properly decode pictures within the sequence until it detects an I-picture. So typically what the decoder will do is scan forward in the sequence until it locates an I-picture. Once it locates an I-picture, it can start decoding and decode the I-picture properly. Then, after detecting and decoding one additional subsequent anchor picture, it can start displaying good quality pictures from that point forward, as all pictures that follow that point in decoding order will be decodable and as it has fulfilled its degree of lag necessary to rearrange the pictures out of decoding order into display order.
For example, assume that in FIG. 3, the decoder randomly accesses the picture sequence at picture P3. At this point, because P3 depends on I0 and the decoder does not have I0, it will not be able to decode P3. The decoder then waits until it gets to the I9 picture, which it knows it can decode. Following the I9 picture, the decoder receives the B7 and B8 pictures, which depend on P6 and I9. Although the decoder does have I9, it does not have the P6 picture. That is, although the decoder received the P6 picture, this picture depends on P3 which, in turn, depends on I0. Since the decoder was not able to decode P3, it will not be able to decode P6, B7 and B8.
The next picture the decoder receives is P12, which depends only on I9. Accordingly, the decoder can decode P12. Once the decoder decodes P12, it will then receive, and can decode, pictures B10 and B11. Accordingly, from this point onward in the transmission order, all of the pictures can be suitably decoded and displayed.
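The random access walk-through above can be reproduced with a small simulation. It is a sketch under a simplifying assumption: in transmission order, a P-picture references the most recently transmitted anchor picture and a B-picture references the two most recently transmitted anchor pictures. The function name decodable_after_random_access is hypothetical.

```python
def decodable_after_random_access(transmission_order, start):
    """Return the pictures a decoder can decode when it joins at `start`.

    Assumes a P-picture references the most recently transmitted anchor
    and a B-picture references the two most recently transmitted anchors.
    """
    needed = {"I": 0, "P": 1, "B": 2}   # number of reference pictures
    anchors_sent = []                   # every anchor in the stream so far
    decoded = set()                     # pictures this decoder has decoded
    decodable = []
    for index, label in enumerate(transmission_order):
        kind = label[0]
        count = needed[kind]
        refs = anchors_sent[-count:] if count else []
        if kind in "IP":
            anchors_sent.append(label)
        if index < start:
            continue                    # sent before the decoder joined
        if len(refs) == count and all(r in decoded for r in refs):
            decoded.add(label)
            decodable.append(label)
    return decodable


stream = "I0 P3 B1 B2 P6 B4 B5 I9 B7 B8 P12 B10 B11".split()
print(decodable_after_random_access(stream, stream.index("P3")))
```

Joining at P3, the sketch marks P3, B1, B2, P6, B4, B5, B7 and B8 as undecodable and recovers at I9, P12, B10 and B11, in agreement with the walk-through above.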
Accordingly, randomly accessing an encoded sequence, such as one encoded as described above, does not always provide instantly decodable and displayable pictures.
One of the mechanisms that MPEG-2 provides to assist in random accesses is known as a “closed GOP flag”. Specifically, on individual I-pictures, a closed GOP flag can be provided by the encoder that indicates whether any subsequent pictures in the transmission order (after the I-picture) refer to a picture previous to the I-picture. That is, if the closed GOP flag is “true”, then the GOP is closed and any B-pictures that follow the I-picture in decoding order (if there are any B-pictures) do not refer to any pictures previous to the I-picture in decoding order. Thus, a closed GOP means that the GOP is self-contained and can be decoded by the decoder without reference to any preceding GOP. Even though one could, in the B-pictures, refer to a previous-in-order picture, the closed GOP flag indicates that this was not actually done when the sequence was encoded. If, on the other hand, the closed GOP flag is false, this means that the GOP is not closed and tells the decoder that the B-pictures may depend on pictures and data that the decoder does not have. Accordingly, the decoder knows that it cannot rely on decoding the next following B-pictures (any B-pictures that follow in decoding order prior to the next anchor picture). In this situation, the decoder would decode the I-picture, skip over the next B-pictures, and then decode the P-picture. After this point, the decoder has recovered and can start displaying video and decoding any subsequent B-pictures. In the example sequence shown in FIG. 3, the GOP that starts with picture I0 is closed, while the GOPs that start with pictures I9 and I18 may or may not be closed (depending on whether the B-pictures that immediately follow each of those I-pictures in decoding order use prediction from the picture previous to the I-picture).
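The decoder behavior described above for the closed GOP flag can be sketched as follows. This is an illustrative model, not a bitstream parser: the function name pictures_to_decode, the closed_gop parameter and the picture labels are hypothetical, and the input is assumed to be the pictures from the entry I-picture onward in transmission order.

```python
def pictures_to_decode(pictures, closed_gop):
    """Select the pictures to decode after entering at an I-picture.

    `pictures` is in transmission order and starts with the I-picture.
    If the GOP is not closed, the B-pictures that follow the I-picture
    before the next anchor picture are skipped, because they may refer
    to a picture the decoder never received.
    """
    if not pictures or pictures[0][0] != "I":
        raise ValueError("expected transmission order starting at an I-picture")
    kept = [pictures[0]]
    skipping_leading_b = not closed_gop
    for label in pictures[1:]:
        if label[0] == "B" and skipping_leading_b:
            continue                   # may reference a missing picture
        skipping_leading_b = False     # next anchor reached: recovered
        kept.append(label)
    return kept


entry = "I9 B7 B8 P12 B10 B11".split()
print(pictures_to_decode(entry, closed_gop=False))
print(pictures_to_decode(entry, closed_gop=True))
```

With the flag false, the sketch drops B7 and B8 and resumes at P12. With the flag true, every picture from I9 onward is kept.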
As video coding and decoding standards evolve and grow more complex, continuing challenges are posed to provide and enhance functionalities, such as random accessibility, without degrading the user's experience. Accordingly, this invention arose out of concerns associated with providing improved methods and systems for encoding and decoding digital data.