In the early years of the Internet, its primary use was the reliable transmission of data with minimal or no delay constraints. Transmission Control Protocol (TCP), of the TCP/Internet Protocol (IP) protocol suite, was designed for this type of delay-independent data traffic. TCP typically works well in this context, where the reliability of packet delivery is much more important than any packet delays. To achieve this reliability, TCP sets up a connection at both ends and attaches a header to each packet that contains the source and destination ports, the sequence number of the packet, and other such administrative information. The destination typically receives a number of TCP packets before sending an acknowledgment to the source. If the acknowledgment does not arrive, the source will generally presume the packets were lost and retransmit the “lost” packets. While this process ensures reliable delivery, packets may be delayed, which, in multimedia streams, may cause noticeable and unacceptable degradation of quality in the multimedia playback.
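The acknowledge-and-retransmit cycle described above may be illustrated with a toy model. This is only a sketch under stated assumptions, not an implementation of TCP: the function name `deliver_reliably`, the fixed window size, and the `lost_acks` set are hypothetical, and time is counted in abstract send/wait rounds rather than in seconds.

```python
def deliver_reliably(num_packets, window, lost_acks):
    """Toy model: send up to `window` packets, then wait for one acknowledgment.

    `lost_acks` is a finite set of 0-based round numbers whose acknowledgment
    never reaches the source, so the same batch is sent again.
    Returns (rounds_elapsed, packets_sent).
    """
    next_seq = 0        # sequence number of the first unacknowledged packet
    round_no = 0        # number of send/wait rounds completed so far
    packets_sent = 0    # counts retransmissions as well as first transmissions
    while next_seq < num_packets:
        batch = min(window, num_packets - next_seq)
        packets_sent += batch
        if round_no not in lost_acks:
            next_seq += batch   # acknowledgment arrived, move on
        # otherwise the source presumes loss and resends the same batch
        round_no += 1
    return round_no, packets_sent
```

In this model every missing acknowledgment costs one extra round and one extra batch of packets, which is the kind of delay that may degrade playback of a multimedia stream.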
An alternative transmission protocol in the TCP/IP protocol suite is User Datagram Protocol (UDP). Unlike TCP, UDP is connectionless and unreliable, meaning that it does not establish a connection at both ends and does not include a mechanism for resending lost packets. Instead, UDP packets are sent out with a packet header that typically includes only the source and destination ports along with a 16-bit segment length and a 16-bit checksum for minimal error detection. Because UDP does not include the additional administrative information, it generally makes no delivery guarantees, offers no flow control, and performs only minimal error detection. As such, UDP has useful timing characteristics for real-time audio or video transmission, where the delivery of every packet is not as important as the timely delivery of packets. UDP was generally used as the early transport protocol for real-time multimedia applications because it typically offers these beneficial characteristics for delay-sensitive data delivery. However, by itself, UDP usually does not provide any general-purpose tools that may be useful for real-time applications.
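The four header fields named above may be laid out as an 8-byte structure. The following is a minimal sketch using Python's standard `struct` module; the helper names `build_udp_datagram` and `parse_udp_datagram` are hypothetical, and the checksum is left at zero (which, over IPv4, signals that no checksum was computed) rather than calculated.

```python
import struct

# Network byte order: source port, destination port, length, checksum (16 bits each).
UDP_HEADER = struct.Struct("!HHHH")

def build_udp_datagram(src_port, dst_port, payload, checksum=0):
    """Prefix `payload` with a UDP header. The checksum is NOT computed here."""
    length = UDP_HEADER.size + len(payload)   # length covers header plus data
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload

def parse_udp_datagram(datagram):
    """Split a datagram into its header fields and payload."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,
        "checksum": checksum,
        "payload": datagram[UDP_HEADER.size:length],
    }
```

Nothing in this header carries a sequence number or a timestamp, which is why a receiver cannot, from UDP alone, detect loss or reconstruct timing.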
In response to the limitations of UDP, Real-time Transport Protocol (RTP) was developed to operate as a thin layer on top of UDP to create a generalized, multipurpose real-time transport protocol. An RTP fixed header may generally include: a 7-bit payload type field for identifying the format of the RTP payload; a 16-bit sequence number which is incremented by one for each subsequent RTP data packet transmitted; a 32-bit timestamp that corresponds to the time at which the first octet of data in the RTP data packet was sampled at the source; a 32-bit synchronization source identifier, which is a randomly generated value that uniquely identifies the source within a particular real-time session; as well as other administrative information. With this information, RTP provides support for applications with real-time properties, including timing reconstruction, loss detection, security, and content identification, without the reliability-induced delays associated with TCP or the lack of timing information associated with UDP.
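The fixed header fields listed above may be packed into 12 bytes. The sketch below assumes the layout published in RFC 3550 (version 2, no padding, no extension, no contributing sources); the version and marker bits come from that specification rather than from the text above, and the helper names are hypothetical.

```python
import struct

# Two flag bytes, 16-bit sequence number, 32-bit timestamp, 32-bit SSRC.
RTP_FIXED_HEADER = struct.Struct("!BBHII")

def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Build a 12-byte RTP fixed header (assumed: V=2, P=0, X=0, CC=0)."""
    byte0 = 2 << 6                                       # version 2 in the top two bits
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)   # marker bit + 7-bit payload type
    return RTP_FIXED_HEADER.pack(
        byte0,
        byte1,
        sequence & 0xFFFF,        # sequence number wraps at 16 bits
        timestamp & 0xFFFFFFFF,   # timestamp wraps at 32 bits
        ssrc & 0xFFFFFFFF,
    )

def unpack_rtp_header(data):
    """Decode the fields of an RTP fixed header from the start of `data`."""
    byte0, byte1, sequence, timestamp, ssrc = RTP_FIXED_HEADER.unpack_from(data)
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

A receiver may use a gap in `sequence` to detect loss and the difference between successive `timestamp` values to reconstruct playback timing.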
Real-Time Control Protocol (RTCP) works in conjunction with RTP to provide control support to the application for maintaining the RTP session. RTCP generally performs four functions: (1) providing information to the application regarding the quality of transmission, such as number of packets sent, number of packets lost, interarrival jitter, and the like; (2) identifying the RTP source through a transport-level identifier, called a canonical name (CNAME), to keep track of the participants in any particular RTP session; (3) controlling the RTCP transmission interval to prevent control traffic from overwhelming network resources; and (4) conveying minimal session control information to all session participants. The RTCP packets are typically transmitted periodically by each participant in an RTP session to all other participants. RTCP thereby provides performance and diagnostic information that may be used by the application.
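As one example of the quality information mentioned in function (1), interarrival jitter may be estimated with the running average defined in RFC 3550, in which each new sample moves the estimate by one sixteenth of the difference. The sketch below assumes the RTP timestamps and arrival times are already expressed in the same units; the function name `interarrival_jitter` is hypothetical.

```python
def interarrival_jitter(packets):
    """Estimate jitter from (rtp_timestamp, arrival_time) pairs.

    Both values in each pair are assumed to use the same clock units.
    Applies J = J + (|D| - J) / 16 for each consecutive pair of packets,
    where D is the change in transit time between them.
    """
    jitter = 0.0
    prev_transit = None
    for rtp_timestamp, arrival_time in packets:
        transit = arrival_time - rtp_timestamp
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter
```

When every packet experiences the same transit time the estimate stays at zero, and an isolated late packet raises it only gradually, so a single outlier does not dominate the reported figure.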
One of the major problems associated with streaming multimedia information arises in attempting to stream the media through firewalls, Network Address Translation (NAT) devices, and the like. The major purpose of a firewall is to prevent unauthorized and/or hostile access to a computer system or network. As such, firewalls are generally configured with strict rules specifying particular static ports through which desired and/or authorized data traffic can pass, while blocking undesirable data. Most IP-based media protocols use RTP for transporting the media streams. RTP is built over UDP, which generally has no fixed ports associated with it. Thus, there is no guarantee that a port associated with the incoming RTP/UDP stream will be allowed through the firewall. Moreover, each media stream typically has multiple channels, each of which generally requires its own opening through the firewall. This means that for the media stream to traverse the firewall, the firewall will have to provide many UDP openings for each call session, which defeats the purpose of the firewall in the first place.
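The mismatch between static firewall rules and dynamically chosen media ports may be sketched as follows. Everything here is hypothetical and for illustration only: the rule set, the port numbers, and the assumption that a session carries audio and video streams with one RTP and one RTCP channel each.

```python
# Hypothetical static rule set: only these (protocol, port) pairs may pass.
STATIC_RULES = {("tcp", 80), ("tcp", 443)}

def blocked_channels(channels, rules=STATIC_RULES):
    """Return the channels whose (protocol, port) is not permitted by `rules`."""
    return [ch for ch in channels if (ch["protocol"], ch["port"]) not in rules]

# Hypothetical session: two media streams, two UDP channels per stream,
# with ports negotiated at call setup rather than fixed in advance.
session = [
    {"name": "audio RTP", "protocol": "udp", "port": 49170},
    {"name": "audio RTCP", "protocol": "udp", "port": 49171},
    {"name": "video RTP", "protocol": "udp", "port": 51372},
    {"name": "video RTCP", "protocol": "udp", "port": 51373},
]

# Each blocked channel would need its own additional opening in the firewall.
openings_required = len(blocked_channels(session))
```

Under these assumptions a single session would require four additional UDP openings, and because the ports may differ from one session to the next, the rule set cannot simply be fixed ahead of time.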
NAT devices are used to translate an IP address used within one network to a different IP address known within another network. NAT devices typically maintain a map of addresses within an “inside” network. Any communications directed to users within the inside network usually pass first through the NAT device for translation to the inside address. Thus, users within the inside network may see out, but outside users can typically communicate with the inside users only through the NAT device's translation. NAT devices may allow a network to support many more users or clients than it has fixed IP addresses. The NAT device may be addressed from the outside using the few fixed IP addresses, yet service many other addresses within the inside network.
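The address map maintained by a NAT device may be sketched as a pair of lookup tables. The sketch assumes port-based translation, one common way for a single fixed address to serve many inside hosts; the class name `NatDevice`, its methods, and the starting port are hypothetical.

```python
class NatDevice:
    """Toy NAT: one public address shared by many inside (address, port) pairs."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._next_port = first_port
        self._outbound = {}   # (inside_ip, inside_port) -> public_port
        self._inbound = {}    # public_port -> (inside_ip, inside_port)

    def translate_outbound(self, inside_ip, inside_port):
        """Map an inside endpoint to the public endpoint seen by outside users."""
        key = (inside_ip, inside_port)
        if key not in self._outbound:
            # First packet from this endpoint: allocate a fresh public port.
            self._outbound[key] = self._next_port
            self._inbound[self._next_port] = key
            self._next_port += 1
        return self.public_ip, self._outbound[key]

    def translate_inbound(self, public_port):
        """Return the inside endpoint for `public_port`, or None if unmapped."""
        return self._inbound.get(public_port)
```

In this model an outside user can reach an inside endpoint only after that endpoint has sent traffic outward and created a mapping; traffic arriving at an unmapped port has no inside destination.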
Another problem with the existing streaming protocols is the amount of header information attached to any given piece of data on the stream. As mentioned above, messages sent over TCP or over RTP/UDP carry considerable header information concerning timing, sequence, data type, and the like. Because multiple streams are typically running at once, each piece of data generally has a stream ID to tell the destination which stream that piece of data belongs to. Where an audio stream is established in which the message type is constant for a period of time and the format of that message type requires a constant number of bits per message, the header information on the type, length, and the like is redundant and congests the stream, reducing the bandwidth available for data.
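The cost of per-packet headers may be estimated with simple arithmetic. The sketch below assumes an IPv4 header without options (20 bytes), a UDP header (8 bytes), and an RTP fixed header without contributing sources (12 bytes); the function name and the 20-byte audio payload in the usage note are hypothetical.

```python
IPV4_HEADER_BYTES = 20   # assumed: IPv4 with no options
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12    # assumed: fixed header only, no CSRC entries

def header_overhead(payload_bytes,
                    header_bytes=IPV4_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES):
    """Fraction of each packet's bytes that is header rather than payload."""
    return header_bytes / (header_bytes + payload_bytes)
```

For a small audio payload of 20 bytes, `header_overhead(20)` is two thirds: the 40 bytes of headers are twice the size of the data they carry, even though most of those header fields repeat unchanged from one packet to the next.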