Sending “realtime” or “live” audio, video, and other media over a network involves a large number of quality properties that may affect the perception of the received media. First, the media must be properly recorded or captured. Additionally, the media often must be compressed before being sent over a network so that it fits the bandwidth of the transport channel. Compression often involves lossy processes, which compromise the quality of the media. For video, lossy compression often decreases spatial information, which may result in blocky and blurry image artifacts. Compression also may lower quality in the temporal domain by decreasing the frame rate and dropping frames for video parts that are difficult to encode.
Depending on which transport channel is used, the transport also may introduce other temporal degradations such as delay and jitter. Delay may be defined as the time from capturing/sending media at a transmitting side to the time it is presented at a receiving side. Some delay will always be present since every part of the transport chain takes some amount of time. For conversational sessions, the delay cannot be too long because users will be annoyed. The amount of delay that an individual user can tolerate is subjective to an extent, but generally any user will become annoyed once the delay reaches or exceeds a threshold amount (e.g., the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) recommends that the one-way (end-to-end) transmission time for voice not exceed 400 ms). In a non-conversational session, however, a user may tolerate a greater amount of delay (e.g., delay exceeding 400 ms) if delay is not considered a critical quality factor within that particular context.
For realtime applications, a sender of media transmits packets at regular intervals and the receiver should play them at the same regular intervals. However, in a packet-switched (PS) network, jitter occurs when audio and/or video packets arrive at the receiver at times that vary from an expected or “ideal” position in time. Upon playback, the jitter results in jerky playback of the video frames or a noticeable decrease in voice quality. This type of jitter is sometimes referred to as “delay jitter.”
FIGS. 1a and 1b illustrate the concepts of delay and delay jitter in a PS network. In FIG. 1a, an ideal or expected uniform delay d1 occurs for each of packets P1-P5 transmitted from a sender S to a receiver R. This uniformity in delay indicates the absence of any delay jitter, and the resulting audio or video will be perceived as a smooth playback of the media as originally transmitted. In FIG. 1b, a non-uniform delay d2 for packet P3 and d3 for packet P5 between the sender and receiver indicates the presence of jitter because packets P3 and P5 both arrive later than they would under the expected delay d1. In such a case, the previous frames, P2 and P4, would appear “frozen” to an observer of the playback until the arrival of the late packets P3 and P5.
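The relationship between the uniform delay of FIG. 1a and the late packets of FIG. 1b can be illustrated with a minimal Python sketch. The function names, the 20 ms sending interval, and all timestamp values are hypothetical and chosen only for illustration; the sketch also assumes that sender and receiver timestamps share a common clock.

```python
from typing import List, Sequence


def packet_delays(send_times_ms: Sequence[int],
                  recv_times_ms: Sequence[int]) -> List[int]:
    """Return the one-way delay of each packet (receive time minus send time)."""
    if len(send_times_ms) != len(recv_times_ms):
        raise ValueError("each sent packet needs exactly one receive time")
    return [r - s for s, r in zip(send_times_ms, recv_times_ms)]


def delay_jitter(delays_ms: Sequence[int], expected_delay_ms: int) -> List[int]:
    """Return each packet's deviation from the expected (ideal) delay d1."""
    return [d - expected_delay_ms for d in delays_ms]


# Hypothetical values: packets P1-P5 sent every 20 ms, expected delay d1 = 50 ms.
send = [0, 20, 40, 60, 80]
recv = [50, 70, 105, 110, 150]   # P3 and P5 arrive late, as in FIG. 1b

delays = packet_delays(send, recv)   # [50, 50, 65, 50, 70]
jitter = delay_jitter(delays, 50)    # [0, 0, 15, 0, 20]

# Packet numbers (1-based) whose delay exceeds the expected delay d1.
late = [i + 1 for i, j in enumerate(jitter) if j > 0]   # [3, 5]
```

In this sketch, a jitter list containing only zeros corresponds to the situation of FIG. 1a, whereas the nonzero entries for P3 and P5 correspond to the delays d2 and d3 of FIG. 1b.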
Another form of jitter called “inter-stream jitter” or “skew” is associated with separate streams that pertain to a same application (e.g., voice and video). The inter-stream jitter or skew is a measure of the difference in delay, or an amount that the streams are “out-of-sync” with respect to one another. User perception of good media quality often requires good synchronization (i.e., low skew), such as when watching a person talk or viewing a musical performance. Related art within this field is disclosed, e.g., in US 2006/0095612, which describes a jitter buffer element, and in US 2005/0226233, which describes troubleshooting in a VoIP system by means of a user-configurable jitter buffer size.
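The inter-stream skew described above, i.e., the difference in delay between two streams of a same application, can be sketched in Python as follows. The function names, the delay values, and the 100 ms tolerance are hypothetical and serve only as an illustration; no particular synchronization threshold is stated here.

```python
def stream_skew_ms(audio_delay_ms: int, video_delay_ms: int) -> int:
    """Return the skew between two streams as the difference in their delays.

    A positive value means the video stream lags the audio stream.
    """
    return video_delay_ms - audio_delay_ms


def in_sync(skew_ms: int, tolerance_ms: int) -> bool:
    """Return True if the magnitude of the skew is within the given tolerance."""
    return abs(skew_ms) <= tolerance_ms


# Hypothetical delays for a voice stream and a video stream of one session.
skew = stream_skew_ms(audio_delay_ms=60, video_delay_ms=140)   # 80

# The tolerance is an assumed, illustrative value.
synchronized = in_sync(skew, tolerance_ms=100)   # True
```

The sign of the result indicates which stream lags the other, while the magnitude indicates how far “out-of-sync” the streams are.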