In wireless communication of media in IP (Internet Protocol) packets between a mobile terminal and a base station of a wireless access network, the concept of media layer adaptation is often employed at the packet sending node when quality-related problems are detected at the packet receiving node. In this description, the packet sending and receiving nodes will be referred to as “sender” and “receiver” for short, and the communication discussed may be either downlink or uplink.
Quality problems at the receiving side may arise from varying channel conditions and/or congestion in the network, resulting in changed data rates. This particularly affects delay-sensitive real-time services such as VoIP (Voice over IP), video and interactive game applications. A media receiving end user may perceive a decreased data rate over the used wireless channel as impaired sound or image, and/or increased delays when the media is played out. In this description, the term “performance” refers to any quality-impacting characteristic of the used channel that may be relevant in this context and that results in impaired quality when the media is played out at the media receiving end user.
The media layer adaptation technique of today involves different adaptation methods at the sender that are typically tried one by one to overcome the performance problems, based on a feedback mechanism where the receiver basically indicates to the sender if the reception quality is good or bad. Different adaptation methods can be applied at the sender, including: 1) “back-off in bit rate” by reduced source coding bit rate, 2) “frame aggregation” where multiple media frames are aggregated into a common physical frame which reduces the amount of protocol overhead, and 3) “redundancy” by transmitting the same information more than once.
These methods are thus introduced and tried one by one at the sender, preferably starting with the one most likely to succeed depending on what system is used. For example, a back-off in bit rate is effective in interference-sensitive systems where the channel performance easily degrades when the amount of other interfering signals is high, such as HSPA (High Speed Packet Access) and LTE (Long Term Evolution), while frame aggregation is mostly helpful in packet-rate-sensitive systems, such as WLAN (Wireless Local Area Network). Redundancy is typically helpful to improve performance in both types of systems but may result in increased network load and is therefore usually employed as a last measure when neither bit rate back-off nor frame aggregation is successful.
FIG. 1 illustrates how media layer adaptation is basically used for a voice communication, such as VoIP, between a sender 100 and a receiver 102 of communicated data packets. In this figure, packet communication is represented by a thick arrow while control messaging is represented by thin arrows. The well-known speech codec AMR-NB (Adaptive Multi-Rate Narrowband) may be used by sender 100 and receiver 102 for encoding/decoding the communicated speech. The protocol RTP (Real-time Transport Protocol) is typically used for communicating the data packets over the wireless channel. The sender 100 sends packets over a wireless channel to the receiver 102, which temporarily stores the received media data in a media buffer, not shown, before the data is decoded for playout. The receiver 102 also monitors the channel performance by measuring a suitable quality indicating parameter of the received packets, e.g. packet loss rate or packet delay jitter.
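The monitoring at the receiver 102 can be sketched as follows. This is a minimal illustration only: the class name `ReceiverMonitor` and its methods are hypothetical, sequence number wrap-around is ignored, and the smoothed jitter estimate follows the interarrival jitter formula of RFC 3550, which is one common choice and not necessarily the measurement intended in this description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReceiverMonitor:
    """Hypothetical receiver-side monitor for one packet stream."""
    first_seq: Optional[int] = None
    highest_seq: Optional[int] = None
    received: int = 0
    jitter: float = 0.0                  # smoothed jitter, same unit as the timestamps
    prev_transit: Optional[float] = None

    def on_packet(self, seq: int, send_time: float, recv_time: float) -> None:
        # Track the sequence number range seen so far (no wrap-around handling).
        if self.first_seq is None:
            self.first_seq = seq
            self.highest_seq = seq
        else:
            self.highest_seq = max(self.highest_seq, seq)
        self.received += 1

        # RFC 3550 style estimator: J += (|D| - J) / 16, where D is the
        # change in transit time between two consecutive received packets.
        transit = recv_time - send_time
        if self.prev_transit is not None:
            d = abs(transit - self.prev_transit)
            self.jitter += (d - self.jitter) / 16.0
        self.prev_transit = transit

    def loss_rate(self) -> float:
        # Fraction of expected packets that never arrived.
        if self.first_seq is None:
            return 0.0
        expected = self.highest_seq - self.first_seq + 1
        return max(0.0, 1.0 - self.received / expected)


monitor = ReceiverMonitor()
# (sequence number, send time in ms, receive time in ms); packet 2 is lost.
for seq, sent, got in [(0, 0, 50), (1, 20, 70), (3, 60, 110), (4, 80, 130)]:
    monitor.on_packet(seq, sent, got)
print(monitor.loss_rate(), monitor.jitter)
```

The receiver would compare such values against an acceptance threshold over a sufficient number of packets before deciding that the performance is unacceptable.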
When the monitored performance falls below an acceptable level, the receiver 102 selects a media layer adaptation method out of a plurality of available methods 1-3 and sends an adaptation request as feedback to the sender 100, suggesting the selected adaptation method. This request may be sent in a control message according to the well-known protocol RTCP (RTP Control Protocol). The sender 100 then applies the suggested adaptation method when sending further packets to the receiver 102 over the used channel. If the monitored performance is still unacceptable, the receiver 102 selects another media layer adaptation method from the available methods 1-3, and sends a new adaptation request to the sender 100 accordingly, and so forth.
As indicated above, an adaptation algorithm in the receiver 102 is typically configured to select and suggest the adaptation methods 1-3 in a certain preset order, depending on their likelihood to improve the performance in the channel used. In one example, the adaptation algorithm first tries to employ the back-off in bit rate, and if not successful it then tries to employ frame aggregation together with the backed-off bit rate. If this still does not improve the performance sufficiently, the adaptation algorithm tries redundancy.
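The preset-order selection described above can be sketched as a small state machine. The names `Method`, `PRESET_ORDER` and `AdaptationSelector` are hypothetical and chosen for illustration; the order shown is the example order given above, and the sketch only records which methods have been requested without modelling how the sender combines them.

```python
from enum import Enum


class Method(Enum):
    BITRATE_BACKOFF = 1
    FRAME_AGGREGATION = 2
    REDUNDANCY = 3


# Example order from the text: back-off first, then aggregation, then redundancy.
PRESET_ORDER = [Method.BITRATE_BACKOFF, Method.FRAME_AGGREGATION, Method.REDUNDANCY]


class AdaptationSelector:
    """Hypothetical receiver-side selector that walks a preset order."""

    def __init__(self, order=PRESET_ORDER):
        self._order = list(order)
        self._next = 0
        self.requested = []  # history of methods suggested to the sender

    def evaluate(self, performance_ok: bool):
        """Return the next method to request, or None if no request is needed
        (performance acceptable) or possible (all methods already tried)."""
        if performance_ok or self._next >= len(self._order):
            return None
        method = self._order[self._next]
        self._next += 1
        self.requested.append(method)
        return method


selector = AdaptationSelector()
for ok in [False, False, True, False]:
    print(selector.evaluate(ok))
```

Each call to `evaluate` corresponds to one completed analysis period at the receiver, so every additional method tried costs at least one further period of unacceptable quality.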
This scheme works fairly well in the interference-sensitive systems HSPA and LTE as the adaptation method most likely to be successful in these systems is tried first, i.e. the back-off in bit rate. However, for systems where reduced bit rate has no or very limited effect, it may take some time before the performance becomes acceptable, thus resulting in prolonged bad quality as perceived by the user(s) as well as excessive feedback signaling and processing. For example, WLAN systems are mainly packet-rate sensitive and not very receptive to reduced bit rate, particularly when relatively small packets are used as in VoIP, although WLAN systems may in other cases actually be receptive to reduced bit rate, e.g. when the packets are relatively large such as video packets. The feedback mechanism is also somewhat time-consuming since the measurements and evaluation described above must be done over a sufficient number of successive packets before trying another adaptation method. This is needed to ensure that the measurement basis for the analysis is stable, since a too short analysis period increases the risk of applying a non-relevant adaptation method.
Further, if only the packet loss rate is measured as a quality indicating parameter, the output quality at the receiving side may be seriously impaired even when there are basically no packet losses and no performance problems can be detected by the monitoring function. This situation may occur for example when the network capacity only allows for a limited/reduced data throughput which is not sufficient to support the packet rate required by the service used. Instead of discarding packets, the sender decreases the packet sending rate to fit the reduced network capacity. As a result, the receiver will receive packets less frequently than expected and a shortage of packets occurs in its media buffer, referred to as “buffer under-run”, which will naturally disturb playout of the media.
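A buffer under-run without any packet loss can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not taken from this description: each packet carries exactly one media frame, packets arrive at a fixed interval, and the function name `count_underruns` and its default values are hypothetical.

```python
def count_underruns(arrival_interval_ms, playout_interval_ms=20,
                    prefill_frames=3, duration_ms=1000):
    """Count playout instants at which the media buffer is empty.

    No packet is ever lost here; frames merely arrive every
    arrival_interval_ms while playout consumes one frame every
    playout_interval_ms.
    """
    buffered = prefill_frames
    underruns = 0
    next_arrival = arrival_interval_ms
    for t in range(playout_interval_ms, duration_ms + 1, playout_interval_ms):
        # Move every frame that has arrived by time t into the buffer.
        while next_arrival <= t:
            buffered += 1
            next_arrival += arrival_interval_ms
        if buffered > 0:
            buffered -= 1      # play out one frame
        else:
            underruns += 1     # nothing to play: buffer under-run
    return underruns


print(count_underruns(arrival_interval_ms=20))  # arrivals keep pace with playout
print(count_underruns(arrival_interval_ms=40))  # arrivals at half the playout rate
```

In the second case every packet is delivered, so a monitor based only on packet loss rate reports no problem, yet the buffer runs empty once the prefilled frames are consumed.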
FIG. 2 illustrates schematically how frame aggregation works in the case of a VoIP service for communicating speech over a wireless channel. In this example, RTP is used for communicating data packets, often referred to as RTP packets, in the physical layer. Reference numeral 200 denotes a sequence of media frames created in the application layer, sometimes also referred to as data frames or content frames, to be packetized for transmission at the sender according to a frame scheme 202 of transmitted RTP packets 1-16 in the physical layer. A speech coder typically generates a media frame every 20 ms (milliseconds).
When applying frame aggregation 204, the number of media frames in each RTP packet is basically increased to reduce the communication of overhead information per media frame, which may however increase the general delay between packet generation at the sender and packet reception at the receiver. In this example, a first shown packet 206a using a certain header size H contains two media frames 1-2 and packet 206a is transmitted from the sender as RTP packet 3 thus containing 40 ms of speech. By applying frame aggregation, the number of media frames accommodated into the next shown packet 206b is increased to four media frames 3-6 using the same header size H, and packet 206b is transmitted as a next RTP packet 10 thus containing 80 ms of speech. Thereby, the receiver will get media data more “densely”, also reducing the total overhead communication in the form of packet headers. U.S. Pat. No. 7,460,524 B2 discloses how frame aggregation can be applied for a wireless channel to deal with excessive packet delay jitter.
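The trade-off between header overhead and delay can be made concrete with a short calculation. The numbers used are illustrative assumptions, not values from this description: a header size H of 40 bytes (a minimal IPv4 header of 20 bytes, a UDP header of 8 bytes and the fixed RTP header of 12 bytes) and a media frame of 32 bytes per 20 ms. The function name `aggregation_stats` is hypothetical.

```python
def aggregation_stats(frames_per_packet, header_bytes=40, frame_bytes=32, frame_ms=20):
    """Packet rate, overhead share, bit rate and extra delay for one aggregation level."""
    packet_bytes = header_bytes + frames_per_packet * frame_bytes
    packet_interval_ms = frames_per_packet * frame_ms
    return {
        "packets_per_second": 1000 / packet_interval_ms,
        "overhead_fraction": header_bytes / packet_bytes,
        "total_bitrate_bps": packet_bytes * 8 * 1000 / packet_interval_ms,
        # The first frame in a packet waits for the remaining frames to be produced.
        "added_packetization_delay_ms": (frames_per_packet - 1) * frame_ms,
    }


for n in (1, 2, 4):
    print(n, aggregation_stats(n))
```

Under these assumptions, going from two to four frames per packet halves the packet rate and lowers the header share of each packet, at the cost of a longer wait before the first frame of each packet can be sent.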
As mentioned above, it is often a problem that the “best” media layer adaptation, or even an acceptable one, is not found at once but it can take some time before different methods are tested and evaluated to arrive at acceptable performance. When frame aggregation is applied, a less than optimal number of media frames may be tried in each transmitted packet, and it may be necessary to try several different aggregation schemes in multiple transmitted packets before arriving at the best, or at least acceptable, performance. For example, if 1 frame/packet is currently used, a typical adaptation scheme would first try a frame aggregation of 2 frames/packet, evaluate the performance, then try 3 frames/packet, evaluate the performance, then try 4 frames/packet, and so forth, until it can be concluded that the best, or an acceptable, frame aggregation has been reached. The above delay of trying out different frame aggregations before a useful frame aggregation has been determined may be considerable, each try requiring a sufficient analysis period as mentioned above, thus resulting in prolonged performance problems at the media receiving end user.
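The cost of this step-by-step search can be estimated with simple arithmetic. The sketch below assumes that each tried aggregation level requires one full analysis period; the function name `sequential_search_delay` is hypothetical and the analysis period of 2 seconds is purely illustrative, since no value is given in this description.

```python
def sequential_search_delay(start_frames, target_frames, analysis_period_s):
    """Number of trials and total time to step from start to target frames/packet."""
    tried_levels = list(range(start_frames + 1, target_frames + 1))
    return len(tried_levels), len(tried_levels) * analysis_period_s


trials, delay_s = sequential_search_delay(1, 4, analysis_period_s=2.0)
print(trials, delay_s)
```

The delay grows linearly with the distance between the current and the useful aggregation level, which is why a long analysis period makes the step-by-step scheme slow to converge.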