Real-time Transport Protocol (RTP) is a connectionless, stateless protocol with no reliability built in: it does not guarantee in-order delivery of packets, and it does not support any retransmission mechanism. Historically, RTP was meant primarily to carry traditional voice traffic over Internet Protocol (IP). For an application like voice, Transmission Control Protocol (TCP) is not the best-suited transport, because a delayed voice packet is better treated as lost than retransmitted later. For these reasons, RTP was a much better fit for carrying voice over IP. With the advent of YouTube, Google Videos, news websites and other video-based sites, video has become an integral part of Internet traffic. Since video in practice means video and voice together, RTP became the default transport that Internet applications chose for streaming video to the billions of users watching it. Video is inherently bandwidth intensive. Compression techniques abound and have certainly reduced the size of the content packed into RTP packets, but video payloads remain far larger than those of the original payload type, namely voice.
As RTP is a pure transport layer protocol, it does not carry enough information to support OAM (Operations, Administration and Management) operations such as statistics collection and network monitoring. To provide this capability, a control protocol, RTCP (RTP Control Protocol), was introduced to support RTP. RTCP is a simple point-to-point protocol that contains various mechanisms for the two end points to exchange statistics, such as jitter and packet loss, to determine the current state of the network. RTCP packets have a fairly large overhead and are typically allocated only 5% of the total bandwidth. This leads to inefficient monitoring of the link, as these packets are sent after long delays rather than in a short, periodic and proactive manner. The information in these RTCP packets is generic and not timely enough for the type of content being carried over RTP, namely video.
The current RTP standard [RFC 5248] employs RTCP as the feedback mechanism. RTCP defines two report packet types, SR (Sender Report) and RR (Receiver Report), to exchange control information between the sender and the receiver. These reports are heavy and occupy valuable bandwidth. Moreover, RTCP packets do not contain sufficiently up-to-date information to calculate a packet loss event rate that correctly mirrors the congestion state of the network.
The transmission interval of RTCP reports is linearly proportional to the size of the group of senders and receivers. As the group size increases, sender and receiver reports are sent less frequently. Moreover, RTCP control traffic is allocated only 5% of the total available bandwidth, which makes the information in the sender and receiver reports less accurate and less in step with the current state of the network.
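The scaling described above can be sketched in a few lines of Python. This is a simplified illustration, not the full interval algorithm of the RTP specification: it omits the minimum interval, the randomization of the interval, and the split of control bandwidth between senders and receivers. The function name, parameter names and example figures are assumptions chosen for illustration.

```python
def rtcp_report_interval(members, session_bw_bps, avg_rtcp_packet_bits,
                         rtcp_fraction=0.05):
    """Approximate seconds between RTCP reports from one participant.

    Simplified model (assumption): every member shares the RTCP
    bandwidth equally, so the interval grows linearly with group size.
    """
    # Bandwidth reserved for control traffic (5% of the session by default).
    rtcp_bw_bps = session_bw_bps * rtcp_fraction
    # Time needed for every member to send one average-sized report.
    return members * avg_rtcp_packet_bits / rtcp_bw_bps


if __name__ == "__main__":
    # Hypothetical figures: a 1 Mbit/s session and 800-bit reports.
    for group_size in (2, 10, 100, 1000):
        interval = rtcp_report_interval(group_size, 1_000_000, 800)
        print(f"{group_size:5d} members: {interval:.3f} s between reports")
```

Under these assumptions, a tenfold increase in group size produces a tenfold increase in the reporting interval, which is why feedback becomes stale in large sessions.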
The current RTCP protocol only allows the calculation of a cumulative packet loss: the fraction of lost packets is defined as the number of packets lost divided by the number of packets expected, where the expected count is based on the highest sequence number received in RTP packets. What this calculation does not reflect is the number of round-trip times (RTTs) over which the loss occurred. To calculate packet loss accurately, the packet loss event rate is required.
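The difference between the two measures can be illustrated with a minimal Python sketch. It assumes that sequence numbers do not wrap around, that a fixed number of packets is sent per RTT, and that all losses falling within one RTT of the first loss in a group count as a single loss event. The function names and the `packets_per_rtt` parameter are hypothetical and are not taken from any standard.

```python
def cumulative_loss_fraction(received_seqs):
    """Lost / expected, with 'expected' taken from the highest sequence seen."""
    received = set(received_seqs)
    first, highest = min(received), max(received)
    expected = highest - first + 1
    return (expected - len(received)) / expected


def count_loss_events(received_seqs, packets_per_rtt):
    """Count loss events; losses within one RTT of an event's first loss
    belong to that same event (assumed grouping rule)."""
    received = set(received_seqs)
    first, highest = min(received), max(received)
    events = 0
    event_end = None  # first sequence number outside the current event window
    for seq in range(first, highest + 1):
        if seq in received:
            continue
        if event_end is None or seq >= event_end:
            events += 1
            event_end = seq + packets_per_rtt
    return events


# Two hypothetical traces of 100 packets, each losing 10 packets.
burst = [s for s in range(100) if not 40 <= s < 50]   # one burst of 10 losses
spread = [s for s in range(100) if s % 10 != 5]       # one loss in every 10

if __name__ == "__main__":
    for name, trace in (("burst", burst), ("spread", spread)):
        print(name, cumulative_loss_fraction(trace), count_loss_events(trace, 10))
```

Both traces yield the same cumulative loss fraction, yet the burst trace contains a single loss event while the spread trace contains ten. The cumulative figure alone therefore cannot distinguish one brief congestion episode from sustained congestion.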