In the past few decades, advances in the related fields of video compression and video transmission systems have led to the widespread availability of digital video programs transmitted over a variety of communication systems and networks. Most recently, new technologies have been developed that allow audio and video programs to be transmitted as multicast digital bitstreams of multiplexed video and audio signals delivered to users or client subscribers over packet-switched Internet Protocol (IP) networks.
IP multicasting is defined as the transmission of an IP datagram (i.e., a data packet formatted according to the Internet Protocol) to a “host group”, which is a set of hosts identified by a single IP destination address. A multicast datagram is asymmetrically delivered to all members of its destination host group. The Internet Group Management Protocol (IGMP), which is defined in RFC1112 (RFC792, by contrast, defines the Internet Control Message Protocol, ICMP), is used between IP hosts and their immediate neighbor multicast agents to support the creation of transient groups, the addition and deletion of members of a group, and the periodic confirmation of group membership. Multicast data streams are typically sent using the User Datagram Protocol (UDP), which is implemented in the transport layer and provides a connectionless datagram service for the delivery of packets.
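By way of illustration only, the following minimal Python sketch shows how a receiving host might join a host group over UDP using the standard socket interface. The function names are hypothetical and are not part of the source. Joining the group causes the host's IP stack to signal membership to its neighboring multicast router.

```python
import ipaddress
import socket


def is_multicast_group(address: str) -> bool:
    """Return True if the address lies in the IPv4 multicast range 224.0.0.0/4."""
    return ipaddress.IPv4Address(address).is_multicast


def build_membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack the 8-byte membership request: group address, then local interface."""
    if not is_multicast_group(group):
        raise ValueError(f"{group} is not an IPv4 multicast address")
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_multicast_receiver(group: str, port: int) -> socket.socket:
    """Create a UDP socket bound to the port and join the given host group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Asking the kernel to join the group triggers the IGMP membership signaling.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_membership_request(group))
    return sock
```

A receiver would then call, for example, open_multicast_receiver("239.1.2.3", 5004) and read datagrams with recvfrom(); the group address and port shown here are arbitrary example values.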
By way of further background, U.S. Pat. No. 6,771,644 teaches a system and method for supporting audio/video program insertion in real-time IP multicasts. Scheduling and control traffic is carried by a new protocol in which smooth transitions are achieved by manipulating the Real-time Transport Protocol (RTP) header in the packets and the associated RTP Control Protocol (RTCP) stream. RTP is a known protocol for transmitting real-time data such as audio or video streams. While it does not guarantee real-time delivery of data, RTP does provide mechanisms for sending and receiving applications to support data streaming. RTCP, on the other hand, relies on the periodic transmission of control packets from the endpoints to the originator of the data (media) stream using the same distribution mechanism (but not necessarily the same path) as the data packets. A method and system for providing media services in voice over IP (VoIP) telephony, in which audio is transmitted in packet streams such as RTP/RTCP packets, is disclosed in U.S. Pat. No. 6,947,417. U.S. Pat. No. 6,044,081 teaches a communications system and multimedia system that allows private network signaling to be routed over a packet network.
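For reference, the fixed portion of the RTP header is 12 bytes long and carries the payload type, marker bit, sequence number, timestamp and synchronization source (SSRC) identifier. The Python sketch below packs and parses that fixed header under simplifying assumptions (no padding, no header extension, no contributing sources). The class and function names are hypothetical, and the sketch does not represent the header manipulation taught by any of the cited patents.

```python
import struct
from typing import NamedTuple

RTP_VERSION = 2
# Two flag bytes, 16-bit sequence number, 32-bit timestamp, 32-bit SSRC,
# all in network byte order: 12 bytes in total.
RTP_FIXED_HEADER = struct.Struct("!BBHII")


class RtpHeader(NamedTuple):
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    marker: bool = False


def pack_rtp_header(header: RtpHeader) -> bytes:
    """Serialize the fixed header (assumes no padding, extension or CSRCs)."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(header.marker) << 7) | (header.payload_type & 0x7F)
    return RTP_FIXED_HEADER.pack(byte0, byte1,
                                 header.sequence_number & 0xFFFF,
                                 header.timestamp & 0xFFFFFFFF,
                                 header.ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data: bytes) -> RtpHeader:
    """Parse the first 12 bytes of a packet into an RtpHeader."""
    byte0, byte1, seq, timestamp, ssrc = RTP_FIXED_HEADER.unpack_from(data)
    if byte0 >> 6 != RTP_VERSION:
        raise ValueError("not an RTP version 2 packet")
    return RtpHeader(byte1 & 0x7F, seq, timestamp, ssrc, bool(byte1 >> 7))
```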
In video streaming applications, MPEG video streams comprise different types of frames that do not include all of the data to be displayed at any given time. For instance, Intra-frames, or I-frames, are the only type of frame that is not coded with reference to any other frame; P-frames are coded predictively from a previous I-frame or P-frame; B-frames are coded predictively from I-frames and P-frames. In order to be properly decoded, a B-frame associated with one group of pictures (“GOP”) may need to reference the I-frame of a next GOP. Due to their comprehensive nature, I-frames are generally much larger (e.g., 5 Kbytes or more) than either P-frames or B-frames. (It should be understood that a GOP is an optional structure of an elementary stream. Also, in the context of the present application, the term “I-frame” is intended to broadly refer to an Intra-frame and its equivalents, e.g., an IDR frame in the case of H.264.)
For audio/video solutions, RTP streams are typically created between the media server and the endpoints. In the case of video, RTP packets that carry I-frames are often larger than the maximum transmission unit (MTU) size that the intermediate nodes in the network can process. When the RTP packet size exceeds the MTU of a node, the packet is usually fragmented into smaller packets. Fragmentation of packets, however, adds latency that slows down video data transmissions and other time-sensitive applications. Therefore, fragmentation of RTP packets should be avoided as much as possible.
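The relationship between packet size and MTU can be made concrete with a short calculation. The Python sketch below estimates how many IPv4 fragments result when an RTP packet is carried over UDP, assuming a 20-byte IPv4 header without options, an 8-byte UDP header and a 12-byte fixed RTP header. The function names and the example sizes are illustrative assumptions rather than values taken from any particular system.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes
RTP_HEADER = 12   # bytes, fixed header only (no CSRCs or extensions)


def max_unfragmented_payload(mtu: int) -> int:
    """Largest media payload that fits in a single IP packet for this MTU."""
    return mtu - IPV4_HEADER - UDP_HEADER - RTP_HEADER


def ipv4_fragment_count(media_bytes: int, mtu: int) -> int:
    """Number of IP packets needed to carry one RTP packet across this MTU."""
    ip_payload = UDP_HEADER + RTP_HEADER + media_bytes
    if IPV4_HEADER + ip_payload <= mtu:
        return 1  # fits as-is, no fragmentation
    # Every fragment except the last carries a multiple of 8 payload bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)
```

Under these assumptions, a 5000-byte I-frame sent as a single RTP packet over a 1500-byte MTU is split into four fragments, whereas any media payload of 1460 bytes or less travels unfragmented.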
In the case where the header of an oversized packet has the “Don't Fragment” (DF) bit set, the packet is not fragmented but is simply dropped. In accordance with RFC1191, an ICMP notification that the packet has been dropped may be sent back to the source node. Although another attempt may be made to re-send the dropped packet, the RFC1191 mechanism provides no assurance that the resent packet will successfully traverse the network without being fragmented. For instance, a large video packet containing an I-frame may be repeatedly re-sent by the source with the DF bit set, but never arrive at the destination. This is a serious problem for video streaming traffic, which usually must be delivered without delay or interruption. Furthermore, marking video packets with the DF bit will result in loss of packets for certain specifications, such as the H.26x codec standard, which lack a resend mechanism.
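The behavior described above can be illustrated with a simplified model. The Python sketch below walks a packet across a list of per-hop MTUs and reports whether it is delivered, fragmented or dropped. The names are hypothetical, and the model deliberately ignores ICMP feedback and per-fragment header overhead; it is a sketch of the failure mode, not of any real router implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Outcome:
    delivered: bool
    fragmented: bool
    dropped_at_hop: Optional[int] = None


def traverse(packet_size: int, hop_mtus: List[int], dont_fragment: bool) -> Outcome:
    """Simulate one packet crossing a path described by its per-hop MTUs."""
    fragmented = False
    for hop, mtu in enumerate(hop_mtus):
        if packet_size <= mtu:
            continue  # fits, forwarded unchanged
        if dont_fragment:
            # Oversized packet with DF set: the node discards it.
            return Outcome(False, False, hop)
        # Simplification: after fragmentation the largest piece equals the MTU.
        fragmented = True
        packet_size = mtu
    return Outcome(True, fragmented)
```

In this model, re-sending the same oversized packet with the DF bit still set yields the same drop on every attempt, which corresponds to the repeated-resend scenario described above.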
In many cases, fragmentation occurs when data packets are passed from one networking layer to another. By way of example, FIG. 1 shows the well-known Open Systems Interconnection (OSI) model for implementing protocols in seven layers. As can be seen, the OSI model includes an application layer that supports application and end-user processes; a presentation layer that works to transform data into the form that the application can accept; a session layer that establishes, manages and terminates connections between applications; a transport layer that provides transparent transfer of data between end systems; a network layer that creates logical paths called virtual circuits for transmitting data from node to node; a data link layer responsible for encoding and decoding data packets into bits; and the physical layer that conveys the bit stream through the network at the electrical and mechanical level. Control is passed from one layer to the next, starting at the top (application) layer at the sending computer or node, and proceeding to the bottom (physical) layer, over the channel to the receiving computer or node, and back up the layer hierarchy.
What is needed, then, is a solution to the problem of fragmentation of large, time-sensitive data packets, such as audio and video data packets.