Conventionally, telephone speech has been transferred over circuit-switched networks, such as the Public Switched Telephone Network (PSTN). When a call is made in a digital circuit-switched network, a dedicated 64 kbps (kilobits per second) connection is established for the duration of each call. This constant bandwidth of 64 kbps follows from the bit rate produced by sampling analog speech with 8-bit Pulse Code Modulation (PCM) at a sampling frequency of 8 kHz, a procedure that allows analog speech in the 300–3400 Hz band to be transmitted in digital form.
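The 64 kbps figure follows directly from the sampling parameters. The short Python sketch below, with hypothetical helper names, multiplies the sampling rate by the number of bits per sample and also checks the upper edge of the voice band against the Nyquist limit (half the sampling rate), which is why 8 kHz sampling is sufficient for speech up to 3400 Hz.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate in bit/s of uncompressed PCM: samples per second times bits per sample."""
    return sample_rate_hz * bits_per_sample


def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest frequency representable without aliasing: half the sampling rate."""
    return sample_rate_hz / 2


SAMPLE_RATE_HZ = 8_000        # telephone sampling frequency from the text
BITS_PER_SAMPLE = 8           # 8-bit PCM from the text
VOICE_BAND_HZ = (300, 3_400)  # transmitted speech band from the text

rate = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
print(f"PCM bit rate: {rate // 1000} kbps")  # PCM bit rate: 64 kbps

# The 3400 Hz upper edge of the voice band lies below the 4000 Hz Nyquist limit.
print(VOICE_BAND_HZ[1] < nyquist_limit_hz(SAMPLE_RATE_HZ))  # True
```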
The digital telephone network described above, which is currently in common use, is, however, very inefficient and thus consumes a large share of the network's resources. In the telephone network, the bandwidth of a connection remains reserved even when the connection is not actively used, i.e. when neither party is transferring information over it. Such static bandwidth allocation consumes a great deal of transmission capacity, so that additional capacity must be invested in as the number of users grows. In addition, bandwidth is wasted because of the inefficient coding scheme standardised for the telephone network; G.729 coding, for example, encodes speech at a bit rate as low as 8 kbps, one-eighth of the 64 kbps used by PCM. The problems caused by this inefficiency are particularly acute for intercontinental calls, where transmission capacity is harder to increase than elsewhere. The problem is also partly reflected in call prices: expensive capacity investments must be covered by high usage charges.
In particular for international connections, so-called IP telephony (Internet Protocol telephony) calls have begun to be marketed as an alternative to static bandwidth reservation. In an IP call, speech is first converted from analog to digital form, then compressed, and finally packed into IP packets that are carried over an IP network, sharing bandwidth with the rest of the IP traffic. IP calls use bandwidth considerably more efficiently than calls that reserve a static band, which is also reflected in call prices. Furthermore, newer and more efficient coding methods, such as G.729, can be used.
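The packetization step described above can be sketched as follows. This is a minimal Python sketch, assuming the speech has already been digitised and compressed by a codec into fixed-size frames (10-byte frames every 10 ms, which corresponds to G.729's 8 kbps). The 6-byte header carrying a sequence number and a millisecond timestamp is a hypothetical layout chosen for illustration, not a standard format, and the names `packetize` and `unpack` are likewise hypothetical.

```python
import struct
from typing import Iterable, List, Tuple

FRAME_BYTES = 10       # G.729: 80 bits per 10 ms frame at 8 kbps
FRAME_MS = 10
FRAMES_PER_PACKET = 2  # illustrative choice: 20 ms of speech per packet

# Hypothetical header: 16-bit sequence number + 32-bit timestamp (ms), network byte order.
HEADER = struct.Struct("!HI")


def packetize(frames: Iterable[bytes], frames_per_packet: int = FRAMES_PER_PACKET) -> List[bytes]:
    """Group compressed speech frames into payloads and prepend a sequence/timestamp header."""
    packets: List[bytes] = []
    batch: List[bytes] = []
    seq = 0
    timestamp_ms = 0
    for frame in frames:
        if len(frame) != FRAME_BYTES:
            raise ValueError("unexpected frame size")
        batch.append(frame)
        if len(batch) == frames_per_packet:
            packets.append(HEADER.pack(seq & 0xFFFF, timestamp_ms) + b"".join(batch))
            seq += 1
            timestamp_ms += frames_per_packet * FRAME_MS
            batch = []
    if batch:  # flush a final, partially filled packet
        packets.append(HEADER.pack(seq & 0xFFFF, timestamp_ms) + b"".join(batch))
    return packets


def unpack(packet: bytes) -> Tuple[int, int, bytes]:
    """Split a packet back into (sequence number, timestamp, payload)."""
    seq, timestamp_ms = HEADER.unpack_from(packet)
    return seq, timestamp_ms, packet[HEADER.size:]


frames = [bytes([i]) * FRAME_BYTES for i in range(5)]  # five dummy codec frames
for pkt in packetize(frames):
    seq, ts, payload = unpack(pkt)
    print(seq, ts, len(payload))
```

In a real sender, each returned packet would then be handed to a UDP socket; that step is omitted to keep the sketch self-contained.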
In IP telephony, a user can call from an ordinary telephone through a gateway to another ordinary telephone. The gateway delivers the call over an IP-based data network, such as the Internet, to the receiver's gateway, where the call is connected back to the public switched telephone network and directed through the receiver's local telephone network to the receiver. Alternatively, the user may have a non-switched connection to an IP-based data network, for example through a local area network; the user then does not need to open a static audio channel to a telephone network at all, because the router serving the user can route calls to the receiver like any other packet-based data. IP calls are based on the Internet Protocol, by means of which speech is transferred as packets over an IP network. In principle, therefore, IP calls can be carried in any data network that uses IP, for example the Internet, intranets or local area networks.
In IP calls, however, Quality of Service (QoS) becomes a problem. The arrival time of an IP packet at the receiver is not known in advance. Because IP routes the data flow packet by packet, packet delays in the network may vary greatly and packets may arrive out of order. In addition, packets may be lost, for example as a result of buffer overflows in routers. A reliable protocol such as TCP (Transmission Control Protocol) detects such packet losses automatically at the protocol level and retransmits the lost packets. However, these retransmissions would further increase the variation in delay as packets pass through the network, so IP calls normally use UDP (User Datagram Protocol), which performs no retransmissions. Consequently, speech easily becomes fragmentary and incoherent as the delays between packets grow, even if not a single packet is lost on the way.
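The effect of variable delay can be illustrated with a small trace. The Python sketch below assumes, hypothetically, that each packet carries a sequence number and a send timestamp, that sender and receiver clocks are synchronised, and that the receiver plays each packet a fixed time after it was sent. A packet that arrives after its playout deadline is as useless to the listener as a lost one, which is why speech can break up even when no packet is actually lost. The names `Packet` and `analyse` and the trace values are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Packet:
    seq: int
    sent_ms: float
    arrived_ms: Optional[float]  # None = lost in the network


def analyse(packets: List[Packet], playout_delay_ms: float) -> Dict[str, List[int]]:
    """Classify packets against a fixed playout delay.

    Assumes synchronised clocks, so each packet's deadline is sent_ms + playout_delay_ms.
    """
    lost = [p.seq for p in packets if p.arrived_ms is None]
    arrived = [p for p in packets if p.arrived_ms is not None]
    late = [p.seq for p in arrived if p.arrived_ms > p.sent_ms + playout_delay_ms]
    # A packet that arrives after one with a higher sequence number is out of order.
    reordered: List[int] = []
    highest = -1
    for p in sorted(arrived, key=lambda p: p.arrived_ms):
        if p.seq < highest:
            reordered.append(p.seq)
        highest = max(highest, p.seq)
    return {"lost": lost, "late": late, "reordered": reordered}


# Hypothetical trace: one packet every 20 ms, with varying network delay.
trace = [
    Packet(0, 0, 50),
    Packet(1, 20, 75),
    Packet(2, 40, 150),
    Packet(3, 60, 110),
    Packet(4, 80, None),
    Packet(5, 100, 160),
]
print(analyse(trace, playout_delay_ms=80))  # {'lost': [4], 'late': [2], 'reordered': [2]}
```

With a fixed 80 ms playout delay, packet 2 arrives intact but 30 ms after its deadline, so the listener hears a gap just as for the genuinely lost packet 4.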
A solution to this problem is presented, for example, in Ramjee R., Kurose J., Towsley D. 1994. Adaptive Playout Mechanism for Packetized Audio Applications in Wide-Area Networks, where incoming packet-based audio (speech) data is buffered and the start of playout of a talkspurt, i.e. a continuous burst of speech comprising a plurality of packets, is delayed. The length of this delay is determined from a short-term delay trend calculated from the delay values of the most recently received packets, i.e. a moving average of those delay values.
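The adaptive idea can be sketched as follows. This Python sketch assumes an exponentially weighted moving average of the per-packet delay, together with a similar average of its deviation, and sets the playout delay to the mean plus `beta` times the deviation. This is one common form of such an estimator; the default `alpha` and `beta` values, the class name `PlayoutEstimator` and the assumption of synchronised clocks are illustrative choices, not parameters taken from the cited publication. The delay is fixed at the start of each talkspurt, so packets within a talkspurt keep their original spacing.

```python
from typing import List


class PlayoutEstimator:
    """Exponentially weighted moving averages of network delay and its variation."""

    def __init__(self, alpha: float = 0.998, beta: float = 4.0, initial_delay_ms: float = 0.0):
        self.alpha = alpha            # weight of history; closer to 1 = slower adaptation
        self.beta = beta              # safety margin in units of delay variation
        self.delay_ms = initial_delay_ms
        self.variation_ms = 0.0

    def observe(self, sent_ms: float, arrived_ms: float) -> None:
        """Update the estimates with one packet's delay (assumes synchronised clocks)."""
        n = arrived_ms - sent_ms
        self.delay_ms = self.alpha * self.delay_ms + (1 - self.alpha) * n
        self.variation_ms = (self.alpha * self.variation_ms
                             + (1 - self.alpha) * abs(self.delay_ms - n))

    def playout_delay_ms(self) -> float:
        return self.delay_ms + self.beta * self.variation_ms

    def schedule_talkspurt(self, sent_times_ms: List[float]) -> List[float]:
        """Fix the delay at talkspurt start and apply it to every packet in the talkspurt."""
        d = self.playout_delay_ms()
        return [t + d for t in sent_times_ms]


# A small alpha makes the estimate react quickly, which keeps the demo short.
est = PlayoutEstimator(alpha=0.5)
est.observe(0, 40)
est.observe(20, 80)
print(est.playout_delay_ms())                   # 100.0
print(est.schedule_talkspurt([100, 120, 140]))  # [200.0, 220.0, 240.0]
```

Updating the estimates for every packet but applying them only at talkspurt boundaries lets the receiver track changing network conditions without stretching or compressing speech in the middle of a talkspurt.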
However, direct end-to-end delay management of this kind is generally not sufficient, for example, to ensure the quality of an interactive real-time data stream. It is not enough merely to choose the delay so that, for example, only one per cent of the packets is lost, as in the model described above. The correlation between lost packets, the so-called loss correlation, must also be taken into account. For the quality of the connection it matters greatly whether packets are lost singly here and there (no loss correlation) or several in succession (high loss correlation). The importance of loss correlation depends on the codec: for example, the codec used in a VoIP (Voice over Internet Protocol) terminal, e.g. G.723.1, might be able to conceal the loss of two successive packets by using Forward Error Correction (FEC), whereas the loss of three successive packets might cause an audible error. In such a case, the method should also take packet loss correlations into account when deciding on the delay. The prior-art buffer management method, however, does not take loss correlations between packets into account.
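The difference between uncorrelated and correlated loss can be made concrete. The Python sketch below compares two hypothetical 100-packet loss traces with the same 3 % loss rate: in one the losses are scattered, in the other they occur back to back. It reports the loss rate, the conditional probability that a packet is lost given that the previous one was lost (one simple measure of loss correlation, chosen here as an assumption rather than taken from the text), and the number of loss bursts longer than a codec could conceal. The parameter `max_concealable=2` mirrors the hypothetical two-packet FEC example above; all function names are illustrative.

```python
from itertools import groupby
from typing import Dict, List


def burst_lengths(lost: List[bool]) -> List[int]:
    """Lengths of runs of consecutive lost packets (True = lost)."""
    return [len(list(run)) for is_lost, run in groupby(lost) if is_lost]


def loss_profile(lost: List[bool], max_concealable: int = 2) -> Dict[str, float]:
    bursts = burst_lengths(lost)
    # Conditional loss probability: P(packet lost | previous packet lost).
    after_loss = sum(1 for prev in lost[:-1] if prev)
    both_lost = sum(1 for prev, cur in zip(lost, lost[1:]) if prev and cur)
    return {
        "loss_rate": sum(bursts) / len(lost),
        "p_loss_after_loss": both_lost / after_loss if after_loss else 0.0,
        # Bursts longer than the codec can conceal are assumed to be audible.
        "audible_bursts": sum(1 for b in bursts if b > max_concealable),
    }


scattered = [i in (10, 40, 70) for i in range(100)]  # three isolated losses
bursty = [i in (40, 41, 42) for i in range(100)]     # three consecutive losses
print(loss_profile(scattered))
print(loss_profile(bursty))
```

Both traces lose 3 % of their packets, so a delay chosen only to meet a loss-rate target treats them identically; only the bursty trace, however, contains a three-packet burst that exceeds the assumed concealment limit.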