Over the years, telephony, data networking, and telecommunications have shifted from analog to digital and from wired to wireless, and voice calls have continuously migrated from conventional time division multiplexing (TDM) networks to packet-based internet protocol (IP) networks. Voice over Internet Protocol (“VoIP”) uses the real-time transport protocol (RTP) to provide end-to-end delivery services for data with real-time characteristics, such as interactive audio and video. Every RTP packet has a fixed-length RTP header, followed by the RTP payload. The basic RTP header is twelve bytes long. Among other fields, it contains the payload type, sequence number, timestamp, and synchronization source (SSRC) identifier.
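To make the header layout concrete, the following is a minimal Python sketch that decodes the twelve-byte fixed header defined in RFC 3550, Section 5.1. The names `RtpHeader` and `parse_rtp_header` are illustrative assumptions, and the sketch stops at the fixed header: any CSRC identifiers (four bytes per CSRC count) and an optional header extension that may follow it are not parsed.

```python
import struct
from dataclasses import dataclass

RTP_FIXED_HEADER_LEN = 12  # bytes, per RFC 3550, Section 5.1


@dataclass
class RtpHeader:
    version: int          # V: 2 bits, always 2 for RFC 3550
    padding: bool         # P: 1 bit
    extension: bool       # X: 1 bit, a header extension follows the CSRC list
    csrc_count: int       # CC: 4 bits, number of 32-bit CSRC identifiers
    marker: bool          # M: 1 bit, meaning defined by the payload profile
    payload_type: int     # PT: 7 bits
    sequence_number: int  # 16 bits, increments by one per packet
    timestamp: int        # 32 bits, sampling instant of the first payload octet
    ssrc: int             # 32 bits, synchronization source identifier


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode only the fixed part of an RTP header (hypothetical helper)."""
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet is shorter than the 12-byte fixed RTP header")
    # Network byte order: two single bytes, one 16-bit and two 32-bit fields.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:RTP_FIXED_HEADER_LEN])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )
```

For example, a G.711 µ-law packet (payload type 0) whose header bytes are `80 00 12 34 00 00 00 a0 de ad be ef` decodes to version 2, payload type 0, sequence number 4660 (0x1234), timestamp 160, and SSRC 0xDEADBEEF.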
While the growth of VoIP applications brings new services and lower costs, it also increases the uncertainty of end-to-end voice quality. Voice transmitted over IP networks suffers from impairments such as delay, packet loss, jitter, and out-of-order delivery. In an IP-based telephony network, packets may be lost, or arrive too late to be played, because of network delay, congestion, or transmission errors, which degrades voice quality. Packet loss concealment (PLC) algorithms are used to recover from lost packets and improve the impaired quality, and correctly identifying timing changes in the incoming RTP packets plays an important role in this process.
One of the challenges in the field of VoIP is that more and more services have been introduced over the years. One example is interactive voice response (IVR) technology, which allows a computer to interact with callers through speech recognition and dual-tone multi-frequency (DTMF) tones entered on a telephone keypad. In telecommunications, IVR lets users interact with a service provider's host system and service their own inquiries by following the IVR dialogue. IVR systems can respond with pre-recorded or dynamically generated audio to direct users on how to proceed. In this way, IVR applications may be used to control almost any function whose interface can be broken down into a series of simple interactions. Different real-time streams carrying voice, data, tones, and events are combined into RTP packets, which may require the RTP header to change dynamically (e.g., to carry different SSRC numbers). Such RTP packets, once received by the end users, must be played seamlessly. Unfortunately, the RTP data header validity algorithm in the original RTP standard does not account for typical IVR applications with dynamically changing SSRC numbers. As a result, too many packets are treated as missing, which degrades speech recognition accuracy.
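The validity check referred to here appears to correspond to the per-source sequence-number algorithm in RFC 3550, Appendix A.1 ("RTP Data Header Validity Checks", functions `init_seq` and `update_seq`). The following Python port is a simplified sketch, assuming state is kept per SSRC and omitting the loss-statistics fields of the original; the class `RtpSourceValidator` and its `accept` method are hypothetical wrappers. It illustrates why frequent SSRC changes are costly: every new SSRC starts on probation, and with `MIN_SEQUENTIAL = 2` its first packet is rejected.

```python
from dataclasses import dataclass

RTP_SEQ_MOD = 1 << 16   # sequence numbers are 16-bit
MAX_DROPOUT = 3000      # constants as given in RFC 3550, Appendix A.1
MAX_MISORDER = 100
MIN_SEQUENTIAL = 2


@dataclass
class SourceState:
    max_seq: int = 0
    cycles: int = 0
    base_seq: int = 0
    bad_seq: int = RTP_SEQ_MOD + 1  # never equals a 16-bit sequence number
    probation: int = 0
    received: int = 0

    def init_seq(self, seq: int) -> None:
        self.base_seq = seq
        self.max_seq = seq
        self.bad_seq = RTP_SEQ_MOD + 1
        self.cycles = 0
        self.received = 0

    def update_seq(self, seq: int) -> bool:
        udelta = (seq - self.max_seq) % RTP_SEQ_MOD  # unsigned 16-bit difference
        if self.probation:
            # Source is not valid until MIN_SEQUENTIAL in-sequence packets arrive.
            if seq == (self.max_seq + 1) % RTP_SEQ_MOD:
                self.probation -= 1
                self.max_seq = seq
                if self.probation == 0:
                    self.init_seq(seq)
                    self.received += 1
                    return True
            else:
                self.probation = MIN_SEQUENTIAL - 1
                self.max_seq = seq
            return False
        if udelta < MAX_DROPOUT:
            if seq < self.max_seq:
                self.cycles += RTP_SEQ_MOD  # sequence number wrapped around
            self.max_seq = seq
        elif udelta <= RTP_SEQ_MOD - MAX_MISORDER:
            # Very large jump: re-sync only if the next packet follows it.
            if seq == self.bad_seq:
                self.init_seq(seq)
            else:
                self.bad_seq = (seq + 1) % RTP_SEQ_MOD
                return False
        # Otherwise a duplicate or reordered packet, which is still counted.
        self.received += 1
        return True


class RtpSourceValidator:
    """Hypothetical wrapper keeping one SourceState per SSRC."""

    def __init__(self) -> None:
        self.sources: dict[int, SourceState] = {}

    def accept(self, ssrc: int, seq: int) -> bool:
        state = self.sources.get(ssrc)
        if state is None:
            # A newly heard SSRC starts on probation, as in RFC 3550.
            state = SourceState()
            state.init_seq(seq)
            state.max_seq = (seq - 1) % RTP_SEQ_MOD
            state.probation = MIN_SEQUENTIAL
            self.sources[ssrc] = state
        return state.update_seq(seq)
```

For a flow that switches from SSRC 0x1111 (sequence numbers 100 to 102) to SSRC 0x2222 (5000 to 5002), calling `accept` on each packet yields `[False, True, True, False, True, True]`: one packet is rejected at every SSRC change, which accumulates quickly when an IVR application switches streams often.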
RTP is known to be vulnerable to many types of attacks, such as spoofing, hijacking, denial of service, traffic manipulation, eavesdropping, and voice injection. As a result, it may be difficult to design a robust RTP packet processing module that protects the system from rogue RTP attacks. Malicious RTP packets contain wrong and misleading timing information in their sequence numbers and timestamps. Once the receiver locks onto them, they can cause system crashes and nondeterministic system behavior. It may therefore be critical for VoIP system stability and reliability to filter out malicious rogue RTP packets once they are detected.