Traditionally, network computing environments have relied on point-to-point communication between two computing devices. However, some communication technologies employ multi-point communication, in which a group of computing devices simultaneously receives a common transmission.
Unicast and Multicast
There are at least two common approaches to simultaneously transmitting the same content to multiple computing devices over a network computing environment: unicast and multicast.
Unicast may be understood to be a communication that takes place over a network between a single sender and a single receiver. With unicast, a computing device generates and sends a separate set of data packets, each carrying the same content, to each receiving computer. As the receiving group grows, unicast becomes increasingly inefficient because it simultaneously transmits copies of the same data packets to more and more computing devices. The bandwidth that unicast requires grows in direct proportion to the size of the receiving group, because the same information is carried multiple times, even on shared links.
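To make the bandwidth argument concrete, the following minimal Python sketch computes the load on a single link that is shared by every receiver. It is a simplified model rather than a description of any particular network: the function name `shared_link_load_bps` and the 4 Mbit/s stream rate in the usage lines are hypothetical, and protocol overhead is ignored.

```python
def shared_link_load_bps(stream_bps: float, receivers: int, multicast: bool) -> float:
    """Bits per second carried on a link that all `receivers` sit behind.

    Simplified model: unicast sends one copy of the stream per receiver,
    while multicast sends a single copy regardless of the group size.
    """
    if receivers < 0:
        raise ValueError("receivers must be non-negative")
    if receivers == 0:
        return 0.0  # nobody is listening, so nothing is sent
    copies = 1 if multicast else receivers
    return stream_bps * copies


if __name__ == "__main__":
    rate = 4_000_000  # hypothetical 4 Mbit/s media stream
    for n in (1, 10, 100, 1000):
        u = shared_link_load_bps(rate, n, multicast=False)
        m = shared_link_load_bps(rate, n, multicast=True)
        print(f"{n:>5} receivers: unicast {u / 1e6:8.0f} Mbit/s, multicast {m / 1e6:4.0f} Mbit/s")
```

Under this model, unicast load is linear in the number of receivers, while multicast load stays constant.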
Multicast may be understood to be a communication that transmits a single message to a select group of multiple recipients. In contrast to broadcasting, multicasting typically refers to sending a single message to a select group on a network rather than to everyone connected to the network.
With multicast, a computing device generates only one copy and sends it to the select group that chooses to receive it. This technique addresses packets to a group of receivers rather than to a single receiver. It typically depends on the network infrastructure to forward the packets to only the sub-networks and the receivers that need to receive them.
A common implementation of multicasting is Internet Protocol (IP) Multicast. It is a bandwidth-conserving technology that reduces traffic by simultaneously delivering a single stream (e.g., a media stream) of information to thousands of recipients. Typical applications that take advantage of multicast include videoconferencing, corporate communications, distance learning, and distribution of software, stock quotes, and news.
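On a receiving host, an application usually joins an IP Multicast group through the operating system's socket interface. When it does, the host announces its membership to the local network (via IGMP for IPv4), and the network begins forwarding the group's packets to that host. The following minimal Python sketch uses only the standard `socket`, `struct`, and `ipaddress` modules. The helper names are hypothetical, and so are the group address 239.1.1.1 and port 5004 in the usage note.

```python
import ipaddress
import socket
import struct


def make_membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack an IPv4 ip_mreq structure: the group address, then the local interface."""
    addr = ipaddress.ip_address(group)
    if addr.version != 4 or not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_multicast_receiver(group: str, port: int) -> socket.socket:
    """Open a UDP socket bound to `port` that has joined the multicast `group`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group is what asks the network to deliver the stream here.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_membership_request(group))
    return sock
```

A receiver might then call `sock = open_multicast_receiver("239.1.1.1", 5004)` followed by `data, sender = sock.recvfrom(2048)` to read datagrams. The sender transmits each datagram once, no matter how many hosts have joined.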
Media Streams
With the advent of digital media streaming technology (such as those using IP multicast), users are able to see and hear digital media, more or less, as the data is being received from a media server.
Herein, a “media stream” is a multimedia object (containing audio and/or visual content, such as a video) that is compressed and encoded in accordance with mechanisms generally available now or in the future for doing so. Furthermore, such a media stream is intended to be decoded and rendered in accordance with generally available mechanisms for doing so.
Without loss of generality, the same techniques can be applied to any media stream with a similar structure that reduces temporal, spatial, or perceptual redundancy. For example, many audio compression formats, such as AC3, have keyframes followed by modification data that is used to regenerate an approximation of the original uncompressed stream.
Multimedia Distribution Format Standards
Due to the amount of data required to accurately represent such multimedia content, it is typically delivered to the computing device in an encoded, compressed form. To reproduce the original content for presentation, the multimedia content is typically decompressed and decoded before it is presented.
A number of multimedia standards have been developed that define the format and meaning of encoded multimedia content for purposes of distribution. Organizations such as the Moving Picture Experts Group (MPEG), under the auspices of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), and the Video Coding Experts Group (VCEG), under the auspices of the International Telecommunication Union (ITU), have developed a number of multimedia coding standards (e.g., MPEG-1, MPEG-2, MPEG-4, H.261, H.263, and the like).
There are many different standardized video-stream data formats. For example: MPEG, H.263, MPEG-1, MPEG-2, MPEG-4 Visual, H.264/AVC, and DV formats. Likewise, there are many different standardized audio-stream data formats. For example: MPEG audio, AC3 audio, DTS audio, and MLP audio.
MPEG-2/H.262
The predominant digital video compression and transmission formats are from a family called block-based motion-compensated hybrid video coders, as typified by the ISO/IEC MPEG-x (Moving Picture Experts Group) and ITU-T VCEG H.26x (Video Coding Experts Group) standards. This family of standards is used for coding audio-visual information (e.g., movies, video, music, and such) in a digital compressed format.
For the convenience of explanation, the MPEG-2 video stream (also known as an H.262 video stream) is generally discussed and described herein, as it has a structure that is typical of conventional video coding approaches. However, those who are skilled in the art understand and appreciate that other such digital media compression and transmission formats exist and may be used.
An example representation of an MPEG-2 format is shown in FIG. 1. Each video sequence is composed of a series of Groups of Pictures (or “GOPs”), such as GOP 105. A GOP is composed of a sequence of pictures or frames. The GOP data is compressed as a sequence of I-, P-, and B-frames, where:
An I-frame (i.e., intra-frame) is an independent starting image, compressed in a manner similar to a JPEG image. An I-frame or “key frame” (such as I-frame 110t) is encoded as a single image, with no reference to any past or future frames. It is sometimes called a random access point (RAP).
A P-frame (i.e., forward predicted frame) is computed by moving rectangles (called macroblocks) from the previous I- or P-frame and then, if so indicated by the encoder, applying a ‘correction’ called a residual. Subsequent P-frames (such as P-frame 120t) are encoded relative to the past reference frame (such as a previous I- or P-frame).
Zero or more B-frames (i.e., bi-directionally predicted frames, such as frames 130 and 132) are formed by a combination of rectangles from the adjacent I- or P-frames, followed, if so indicated by the encoder, by a correction residual.
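The dependencies among the three frame types can be sketched in a few lines of Python. The function below is a hypothetical illustration rather than part of any codec. It takes a GOP's frame types in presentation (display) order as a string such as "IBBPBBP" and lists, for each frame, the display indices of the frames it is predicted from. It assumes the GOP begins with an I-frame and uses the index `len(pattern)` to stand for the I-frame that opens the following GOP.

```python
def reference_frames(pattern: str) -> list[list[int]]:
    """For each frame (display order), return the indices of its reference frames."""
    if not pattern or pattern[0] != "I":
        raise ValueError("a GOP must begin with an I-frame")
    anchors = [i for i, kind in enumerate(pattern) if kind in "IP"]  # I- and P-frames
    refs: list[list[int]] = []
    for i, kind in enumerate(pattern):
        if kind == "I":
            refs.append([])  # intra-coded: no references at all
        elif kind == "P":
            prev = max(a for a in anchors if a < i)
            refs.append([prev])  # forward prediction from the previous I- or P-frame
        elif kind == "B":
            prev = max(a for a in anchors if a < i)
            nxt = min((a for a in anchors if a > i), default=len(pattern))
            refs.append([prev, nxt])  # bi-directional: previous and next I- or P-frame
        else:
            raise ValueError(f"unknown frame type {kind!r}")
    return refs
```

For "IBBPBBP", the I-frame has no references, the P-frame at index 3 refers to index 0, and the B-frames at indices 4 and 5 refer to both 3 and 6.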
The GOP structure is intended to assist random access into the stream. A GOP is typically an independently decodable unit that may be of any size as long as it begins with an I-frame.
Transmission and Presentation Timelines
FIG. 1 illustrates two manifestations of the same MPEG-2 video stream. The first is the transmission timeline 100t and the other is the presentation timeline 100p. This is an example of transmission and presentation timelines of a typical media stream and their relationship to each other.
The transmission timeline 100t illustrates a media stream from the perspective of its transmission by a media-stream encoder and transmitter. Alternatively, it may be viewed from the perspective of the receiver of the transmission of the media stream.
As shown in FIG. 1, the I-frame (e.g., 110t) is typically temporally longer than the other frames in the transmission timeline. Because it does not use data from any other frame, it must contain all of the data necessary to produce one complete image for presentation. Consequently, an I-frame typically includes more data than any of the other frames, and so it typically requires more time to transmit (and, of course, to receive) than the other frame types.
FIG. 1 also shows P-frames (such as 120t) and B-frames (such as 130t and 132t) of the transmission timeline 100t. In the transmission timeline, P-frames are temporally shorter than I-frames because they include less data than I-frames, but temporally longer than B-frames because they typically include more data than B-frames. Because a B-frame can draw on data from two reference frames (one before it and one after it in presentation order), it typically needs less data of its own to decode its image than a P-frame, which relies on only one reference frame.
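Over a constant-rate channel, the time a frame occupies the transmission timeline is simply its size divided by the channel rate. The Python sketch below (function name hypothetical) schedules frames back to back under that assumption. The frame sizes and the 4 Mbit/s rate in the test are invented for illustration and are not taken from FIG. 1.

```python
from typing import List, Tuple


def transmission_schedule(frames: List[Tuple[str, int]], channel_bps: float,
                          start: float = 0.0) -> List[Tuple[str, float, float]]:
    """Return (frame type, start time, end time) for frames sent back to back.

    Assumes a constant-rate channel, so a frame's transmission time is
    proportional to its size in bits.
    """
    schedule = []
    t = start
    for kind, bits in frames:
        duration = bits / channel_bps  # larger frames occupy the channel longer
        schedule.append((kind, t, t + duration))
        t += duration
    return schedule
```

With hypothetical sizes of 400,000 bits for an I-frame, 150,000 bits for a P-frame, and 50,000 bits for each B-frame, a 4 Mbit/s channel needs 100 ms, 37.5 ms, and 12.5 ms respectively. This reproduces the ordering of durations described above.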
FIG. 1 also illustrates the presentation timeline 100p of the media stream from the perspective of its presentation by the media decoder and presenter. In contrast to their transmission durations, the presentation durations of the frames are the same regardless of frame type. In other words, frames are displayed at a fixed frequency.
The incoming frames of the media stream are decoded, buffered, and then presented at a fixed frequency (e.g., 24 frames per second (fps)) to produce a relatively smooth motion picture presentation to the user. When MPEG-2 is used to convey NTSC video, the field rate is fixed, and each MPEG-2 picture may produce 1, 2, or 3 fields: field pictures always produce 1 field, and frame pictures may produce 2 or 3 fields. Thus, the presentation rate of frame pictures may not be fixed, but it is still not dictated by the rate at which the frame pictures are transmitted.
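The following simplified Python sketch shows how a fixed field rate can still yield unequal intervals between frame pictures. It models the MPEG-2 `repeat_first_field` flag as a boolean that makes a frame picture produce 3 fields instead of 2. Other picture-coding flags and their constraints are ignored, and the function names are hypothetical. Alternating the flag is the familiar 3:2 pattern that spreads 24 film frames over 60 NTSC fields.

```python
from fractions import Fraction

NTSC_FIELD_RATE = Fraction(60000, 1001)  # about 59.94 fields per second


def fields_for_picture(picture_structure: str, repeat_first_field: bool = False) -> int:
    """Number of display fields one decoded picture produces (simplified)."""
    if picture_structure == "field":
        return 1  # a field picture always produces exactly one field
    if picture_structure == "frame":
        return 3 if repeat_first_field else 2
    raise ValueError(f"unknown picture structure {picture_structure!r}")


def presentation_times(pictures, field_rate=NTSC_FIELD_RATE):
    """Start time (seconds) of each picture when fields are shown at a fixed rate."""
    times = []
    fields_elapsed = 0
    for structure, rff in pictures:
        times.append(fields_elapsed / field_rate)
        fields_elapsed += fields_for_picture(structure, rff)
    return times
```

With the flag alternating on and off, successive frame pictures start 3 and then 2 field periods apart. The field clock never changes, but the spacing between frame pictures does. Neither spacing depends on how long each picture took to transmit.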
FIG. 1 also illustrates a typical decoded GOP 105 of MPEG in its presentation timeline. This GOP example includes an I-frame 110p, six P-frames (e.g., 120p), and 14 B-frames (e.g., 130p and 132p). Typically, each GOP includes a series of consecutively presented decoded frames that begin with an I-frame (such as frame 110p).
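A regular GOP like the one in FIG. 1 can be described by two numbers: the number of frames in the GOP and the spacing between successive I- or P-frames. The Python sketch below uses the hypothetical names `n` and `m` for these numbers, following common encoder usage, and generates the display-order frame types. With 21 frames and an anchor spacing of 3, it reproduces the counts of the example: one I-frame, six P-frames, and 14 B-frames.

```python
def gop_pattern(n: int, m: int) -> str:
    """Display-order frame types for a regular GOP.

    n: number of frames in the GOP; m: distance between successive I/P frames.
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")  # every GOP starts with its random access point
        elif i % m == 0:
            types.append("P")  # anchor frame every m positions
        else:
            types.append("B")  # bi-directionally predicted frames between anchors
    return "".join(types)
```

`gop_pattern(21, 3)` yields "IBBPBBPBBPBBPBBPBBPBB". The last two B-frames of this pattern would be predicted partly from the I-frame that opens the next GOP.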
GOP Presentation Delay
FIG. 1 shows that the I-frame 110t of an example GOP is first received beginning at point T1 in time; however, it is not first presented until point T2. The time gap between the two points is called herein the “GOP presentation delay” and is labeled 170 in FIG. 1. It represents the delay from when the receiver first begins receiving the first frame of a GOP (which is typically the I-frame) until the device first presents the first frame of the GOP.
Media-Stream Presentation Start-up Delay
To tune to a channel in a media-streaming environment (such as a multicast environment), a receiver requests a target channel. It receives the target media stream and then waits for an access point into the stream. A channel change cannot complete until an access point is received. From the user's perspective, this can lead to lengthy channel-change times.
FIG. 2 illustrates an example of a media-stream presentation start-up delay at 280. The start-up delay is the effective delay experienced by a user. It includes a delay between when a particular media stream is requested and when the first frame of a GOP from the particular media stream is actually presented. As shown in FIG. 2, the start-up delay 280 includes the GOP presentation delay 270 (discussed above).
The example of FIG. 2 proceeds as follows. A GOP, starting with I-frame 210t, is being transmitted, as shown in the transmission timeline 200t. The receiver seeks to tune into this media stream at request point R. This selection is illustrated as a user selecting a media-stream channel using a remote control 260.
Again, this example is for illustrative purposes only. Request point R could fall at any moment after the beginning of a GOP (i.e., after the beginning of its I-frame 210t).
The receiver must wait for a random access point (or RAP) in order to access the media stream. An I-frame is a typical RAP, and in this example each GOP has exactly one RAP: its I-frame. So, the receiver must wait for the next I-frame (at the beginning of the next GOP) before it can access the media-stream transmission, as shown by transmission timeline 200t.
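Finding the RAP that the receiver must wait for amounts to a search over the start times of upcoming I-frames. The minimal Python sketch below uses the standard `bisect` module. It assumes the receiver somehow knows a sorted list of RAP start times, which is a simplification made for illustration: a real receiver discovers each RAP only as it arrives. The function name is hypothetical.

```python
import bisect
from typing import Optional, Sequence


def next_random_access_point(rap_times: Sequence[float], request_time: float) -> Optional[float]:
    """Start time of the first RAP (I-frame) at or after `request_time`.

    `rap_times` must be sorted in ascending order; returns None if no
    further RAP is available.
    """
    i = bisect.bisect_left(rap_times, request_time)
    return rap_times[i] if i < len(rap_times) else None
```

With hypothetical GOPs starting every 0.5 s (`[0.0, 0.5, 1.0, 1.5]`), a request at 0.2 s must wait until the RAP at 0.5 s, even though a GOP is already being transmitted at the moment of the request.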
Once the receiver has an I-frame in its buffer, it may refer back to it when decoding the dependent P- and B-frames. Consequently, a conventional system must wait for a RAP before it can start buffering useful frames.
In FIG. 2, the receiver starts buffering the next GOP at point M1 with I-frame 250t. Thus, the first frame that may eventually be presented to the user is I-frame 250t, because it is the first RAP in the stream after the point at which the receiver joined the stream. Because of the GOP presentation delay (discussed above), the receiver actually starts presenting the GOP (with I-frame 250p of presentation timeline 200p) at point M2, which is also the presentation start-up point S of the start-up delay 280.
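The conventional behavior of discarding everything before the first RAP can be expressed with the standard `itertools.dropwhile`. In the Python sketch below, a frame is modeled as a hypothetical `(frame type, payload)` tuple. The sketch also assumes, as the text does, that a GOP is independently decodable, so every frame from the first I-frame onward can be decoded.

```python
from itertools import dropwhile
from typing import Iterable, Iterator, Tuple

Frame = Tuple[str, bytes]  # hypothetical representation: ("I" | "P" | "B", payload)


def frames_from_first_rap(frames: Iterable[Frame]) -> Iterator[Frame]:
    """Skip frames received before the first I-frame.

    Those frames cannot be decoded, because the reference frames they
    depend on were transmitted before the receiver joined the stream.
    """
    return dropwhile(lambda frame: frame[0] != "I", frames)
```

Any P- or B-frames that arrive between the request point and the next I-frame are dropped. Dropping them is precisely why that interval shows up as dead time for the user.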
As demonstrated by the screens 262-266, the start-up delay is the effective delay experienced by a user. The user selects a media-stream channel at request point R (using, for example, a remote control 260) and sees a blank screen, as shown by screen 262. Of course, other information may be presented here (such as electronic programming information), but, since it is not yet the desired media-stream content, the screen is effectively blank.
Screen 264 shows that the screen remains blank even while the next GOP is being received. Screen 266 shows that the first image, from frame 250p, is finally presented to the user.
The average length of this start-up delay is directly affected by the average GOP length. Some media-stream providers employ relatively long average GOP lengths. In those instances, this delay is even more acute because, when changing channels, the user is waiting longer for the next GOP to come around.
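The dependence on GOP length can be made quantitative under simple assumptions. Suppose request point R is equally likely to fall anywhere within the GOP currently being transmitted. Suppose also that each GOP spans the same time in the transmission timeline as its presentation duration, which is its frame count divided by the frame rate. Then the average wait for the next RAP is half a GOP duration, and the average start-up delay is that wait plus the GOP presentation delay. These are modeling assumptions, not measurements. The Python sketch below (function names and the 0.3 s presentation delay are hypothetical) computes the closed form and checks it with a seeded simulation.

```python
import random


def expected_startup_delay(gop_frames: int, frame_rate: float,
                           gop_presentation_delay: float) -> float:
    """Average start-up delay: half a GOP of waiting plus the GOP presentation delay."""
    gop_duration = gop_frames / frame_rate
    return gop_duration / 2 + gop_presentation_delay


def simulated_startup_delay(gop_frames: int, frame_rate: float,
                            gop_presentation_delay: float,
                            trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo check of the closed form under the same assumptions."""
    rng = random.Random(seed)
    gop_duration = gop_frames / frame_rate
    total = 0.0
    for _ in range(trials):
        request = rng.uniform(0.0, gop_duration)  # R lands somewhere in the current GOP
        wait_for_rap = gop_duration - request     # next I-frame starts at the GOP boundary
        total += wait_for_rap + gop_presentation_delay
    return total / trials
```

For a 21-frame GOP at 24 fps with a 0.3 s presentation delay, the model gives 0.4375 + 0.3 = 0.7375 s. Doubling the GOP to 42 frames raises the figure to 1.175 s, which illustrates why long GOPs make channel changes feel sluggish.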
In short, this start-up delay is very annoying to, and provokes impatience in, typical users.