Interest in transporting speech over packet-based networks has grown over the last few years, and the technique has become known as IP-telephony. Most packet-based systems of today are based on the Internet Protocol (IP) and the transport protocols running on top of it, the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). TCP guarantees reliable transmission of data and provides flow control. A typical application using TCP is the File Transfer Protocol (FTP). During a file transfer it is essential that all data reach the receiving host, so one has to make sure that the packets arrive and are sorted into the correct order. UDP does not provide any such guarantees, but is used when a guaranteed connection would require too much control signaling.
Real-time applications such as IP-telephony use UDP. For these applications, retransmission of lost packets makes no sense, since resent packets would arrive too late to be used in the synthesis at the receiving side anyway. IP-telephony uses the Real-time Transport Protocol (RTP) together with the IP/UDP protocols. The RTP header contains, among other things, a sequence number and a timestamp for each packet. RTP is also used, for example, to synchronize audio and video streams. Another essential part of the transmission of real-time streams is the RTP Control Protocol (RTCP), which is used to control RTP. RTCP conveys information about the session participants and periodically distributes control packets containing quality information to all participants.
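As an illustration of where the sequence number and timestamp sit, the sketch below parses the 12-byte fixed RTP header laid out in RFC 3550. The function name `parse_rtp_header` and the `RtpHeader` class are hypothetical; any CSRC list or header extension that may follow the fixed header is not handled.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int  # increments by one per packet sent; reveals loss and reordering
    timestamp: int        # sampling instant of the first payload octet, in RTP clock ticks
    ssrc: int             # identifies the synchronization source (the sender)


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header (network byte order, RFC 3550 section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the 12-byte fixed RTP header")
    first, second, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=first >> 6,
        padding=bool(first & 0x20),
        extension=bool(first & 0x10),
        csrc_count=first & 0x0F,
        marker=bool(second & 0x80),
        payload_type=second & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


# Example: version 2, payload type 0, sequence 1234, timestamp 160000, 160 payload bytes.
pkt = struct.pack("!BBHII", 0x80, 0x00, 1234, 160000, 0xDEADBEEF) + bytes(160)
hdr = parse_rtp_header(pkt)
```

The timestamp field is the one the receiver later compares against its own clock when clock differences are estimated.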
One problem that occurs in IP-telephony is underrun or overrun in the playout buffer. The playout buffer is where the speech samples are stored before they are played out by the D/A converter. At underrun, the playout buffer is starved, i.e. there are no longer any samples to play on the output. Overrun occurs when the playout buffer is full, so that newly arriving samples are lost.
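A minimal sketch of such a bounded playout buffer, assuming one sample per push or pop and silence (zero) played on starvation. The class `PlayoutBuffer` and its counters are hypothetical and serve only to make the two failure modes concrete.

```python
from collections import deque


class PlayoutBuffer:
    """Bounded FIFO of speech samples awaiting the D/A converter (illustrative sketch)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.samples = deque()
        self.overruns = 0   # samples dropped because the buffer was full
        self.underruns = 0  # playout ticks on which no sample was available

    def push(self, sample: int) -> None:
        """Store a received sample; drop it if the buffer is full (overrun)."""
        if len(self.samples) >= self.capacity:
            self.overruns += 1
            return
        self.samples.append(sample)

    def pop(self) -> int:
        """Return the next sample for the D/A converter, or silence on starvation (underrun)."""
        if not self.samples:
            self.underruns += 1
            return 0
        return self.samples.popleft()
```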
The reason for these problems is the lack of synchronization between the sampling rates (i.e. sampling frequencies) at the sending and receiving sides. In the communication between transceivers in a telecommunication system, messages are sent as digital signals from the sender of one transceiver to the receiver of another. The signals transmitted from the sender have a first sampling frequency. The receiver buffers these signals in a playout buffer at this sampling frequency but plays them out at a second sampling frequency. When the first frequency, at which samples enter the playout buffer, is higher than the second frequency, i.e. the playout frequency, the buffer risks filling up so that there is no room for subsequent samples, i.e. overrun. When the first frequency is lower than the second, the playout buffer may run out of samples, i.e. underrun.
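Ignoring network jitter, the buffer fill changes by f_send - f_play samples per second, so the time until overrun or underrun follows directly from the margin left in the buffer. The helper below computes this. The function name and the figures in the example (8 kHz nominal rate, a 50 ppm offset, a 1600-sample buffer half full at call start) are illustrative assumptions, not values from the text.

```python
import math


def time_to_buffer_failure(f_send: float, f_play: float,
                           initial_fill: float, capacity: float):
    """Return (seconds until failure, kind) for a buffer filled at f_send and drained at f_play.

    The net fill rate is f_send - f_play samples per second; jitter is ignored.
    """
    drift = f_send - f_play
    if drift > 0:
        # Buffer grows: free space runs out (overrun).
        return (capacity - initial_fill) / drift, "overrun"
    if drift < 0:
        # Buffer shrinks: stored samples run out (underrun).
        return initial_fill / -drift, "underrun"
    return math.inf, None


# Assumed example: sender clock 50 ppm fast relative to an 8000 Hz receiver.
seconds, kind = time_to_buffer_failure(8000.4, 8000.0, initial_fill=800, capacity=1600)
```

With these assumed figures the 800-sample margin is consumed at 0.4 samples per second, so overrun occurs after about 2000 s (roughly 33 minutes); a larger frequency offset or a smaller margin shortens this proportionally.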
In cellular systems, the sampling frequency of all terminals connected to the network is controlled by an accurate timing reference provided by the system. With this timing reference and phase-locked loops (PLLs) controlling the sampling frequency, underrun and overrun never occur. With a PLL, for example, the sampling rate can be controlled: if the buffer grows, the terminal plays out faster until the buffer returns to its default level. The sampling rate is thus continuously corrected according to the buffer size.
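The buffer-driven correction described above can be sketched as a proportional controller that nudges the playout rate in proportion to the fill error. This is a simplified stand-in for a PLL loop, not a PLL; the function name, the gain, the clamp and the 20 ms step are assumptions chosen for illustration.

```python
def simulate_rate_control(f_send: float, f_nominal: float, capacity: float,
                          target: float, gain: float, max_adjust: float,
                          dt: float, steps: int) -> list[float]:
    """Simulate a playout buffer whose drain rate is steered to keep the fill near `target`.

    gain and max_adjust are illustrative tuning values, not taken from any standard.
    Returns the buffer fill level (in samples) after each step of dt seconds.
    """
    fill = target
    history = []
    for _ in range(steps):
        # Relative rate correction proportional to the fill error: a growing
        # buffer makes the receiver play out faster, a shrinking one slower.
        adjust = gain * (fill - target) / capacity
        adjust = max(-max_adjust, min(max_adjust, adjust))
        f_play = f_nominal * (1.0 + adjust)
        fill += (f_send - f_play) * dt
        fill = max(0.0, min(capacity, fill))  # a real buffer cannot go below empty or above full
        history.append(fill)
    return history


# Assumed example: sender 50 ppm fast, 200 s simulated in 20 ms steps.
history = simulate_rate_control(f_send=8000.4, f_nominal=8000.0, capacity=1600,
                                target=800, gain=0.01, max_adjust=1e-3,
                                dt=0.02, steps=10_000)
```

With these values the fill settles about 8 samples above the target instead of drifting by 80 samples over 200 s, as it does with `gain=0`. A purely proportional loop always leaves such a steady-state offset; adding an integral term, as a PLL loop filter typically does, would remove it.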
To compensate for the difference in sampling rates between the sending and receiving sides, time stretching could be used to give a stimulus the same duration at the receiving side as it had at the sending side. How much to stretch the signal depends on the difference in sampling frequency between the sending side and the receiving side(s). Time stretching means that a stimulus of N samples is replaced with one of M samples. If the stretching is done appropriately, overrun and underrun never occur.
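Setting the durations equal, N / f_send = M / f_play, gives M = N · f_play / f_send. The sketch below computes M this way and produces the M output samples by linear interpolation. Linear interpolation is a deliberately simple stand-in, not the method of EP 0680033 or of Kahrs and Brandenburg, and the function name `stretch` is hypothetical. The ratio f_play / f_send is taken as known here, which is exactly the quantity the cited solutions leave open.

```python
def stretch(samples: list[float], f_send: float, f_play: float) -> list[float]:
    """Replace N samples taken at f_send with M samples of equal duration at f_play."""
    n = len(samples)
    m = round(n * f_play / f_send)  # equal duration: n / f_send == m / f_play
    if n == 0 or m == 0:
        return []
    if n == 1 or m == 1:
        return [samples[0]] * m
    out = []
    for j in range(m):
        # Map output index j onto the input index axis, first and last samples aligned.
        x = j * (n - 1) / (m - 1)
        i = int(x)
        frac = x - i
        nxt = samples[min(i + 1, n - 1)]
        # Linear interpolation between the two neighbouring input samples.
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out


# Assumed example: a 20 ms frame (160 samples at 8000 Hz) played at 8100 Hz becomes 162 samples.
frame = [float(k) for k in range(160)]
stretched = stretch(frame, 8000.0, 8100.0)
```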
There are different ways to perform this time stretching. EP patent application 0680033 presents a solution for stretching a speech signal in time: a speech stimulus with a first duration is changed into one with a second duration. However, no way of finding the conversion factor between the first and second durations is given.
Another solution for stretching a signal in time is presented in “Applications of Digital Signal Processing to Audio and Acoustics” (p. 291) by Mark Kahrs and Karlheinz Brandenburg, published by “Kluwer Academic Publishers”, 1998, London. This method is not signal dependent; it takes an arbitrary signal of N samples and replaces it with another signal of M samples. Like the previous solution, it does not give the conversion factor.
Currently, most manufacturers of IP-telephony equipment do not take into account that the sampling frequencies at the sending and receiving sides might differ. Consequently, no solutions to this problem are available.
Another field of interest for IP-telephony is accurate measurement of the end-to-end delay between two terminals. To obtain accurate end-to-end measurements, the clock skew, i.e. the difference in clock frequency between the sending and the receiving side, must be compensated for. In the article by Moon S., Skelly P., Towsley D., “Estimation and Removal of Clock Skew from Network Delay Measurements”, Technical Report 98-43, Department of Computer Science, University of Massachusetts at Amherst, October 1998, different methods are presented for estimating the difference in clock frequency between the sending and the receiving computer. The estimation is performed at the receiving end, and all the methods use the RTP timestamps and measurements of the packet arrival times.
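A minimal sketch of receiver-side skew estimation, assuming send times derived from RTP timestamps (timestamp divided by the RTP clock rate, wraparound ignored) and arrival times read from the receiver clock, both in seconds. It fits a straight line to the measured one-way delays by ordinary least squares; this is a simple stand-in and not a reproduction of the algorithms in the Moon et al. report. All function names are hypothetical.

```python
def rtp_to_seconds(timestamps: list[int], clock_rate: int) -> list[float]:
    """Convert RTP timestamps to seconds relative to the first packet (no wraparound handling)."""
    return [(ts - timestamps[0]) / clock_rate for ts in timestamps]


def estimate_skew(send_times: list[float], arrival_times: list[float]) -> float:
    """Least-squares slope of one-way delay versus send time (dimensionless skew).

    The measured delay also contains the unknown clock offset; only the slope is used.
    A positive slope means the receiver clock runs fast relative to the sender clock.
    """
    n = len(send_times)
    if n < 2:
        raise ValueError("need at least two packets")
    delays = [a - s for s, a in zip(send_times, arrival_times)]
    mean_t = sum(send_times) / n
    mean_d = sum(delays) / n
    num = sum((t - mean_t) * (d - mean_d) for t, d in zip(send_times, delays))
    den = sum((t - mean_t) ** 2 for t in send_times)
    return num / den


def remove_skew(send_times: list[float], arrival_times: list[float], skew: float) -> list[float]:
    """Return one-way delays with the linear drift caused by the skew subtracted."""
    return [(a - s) - skew * s for s, a in zip(send_times, arrival_times)]
```

In practice the delays also carry queueing jitter, so the slope estimate is only as good as the number of packets and the time span it is fitted over.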
Another approach to extracting the difference in clock frequency, close to the above-mentioned method, is protected by Nippon Telegraph & Telephone Corporation in Japanese Patent Application JP-10145345. By sending information about the transmission time together with the data (or speech) to the receiver and measuring the reception time at the receiver, the frequency ratio between the two terminals can be calculated.
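A minimal sketch of the general idea of combining transmit times with locally measured receive times, not the procedure of JP-10145345, whose details are not given here. Transmit times are assumed to be counted in sender clock ticks (for instance RTP timestamps) and receive times in receiver clock ticks; the two-point estimator and the worst-case jitter bound are illustrative choices, and the function names are hypothetical.

```python
def frequency_ratio(tx_ticks: list[int], rx_ticks: list[float]) -> float:
    """Estimate f_receiver / f_sender from the first and last packet.

    tx_ticks: transmit times counted in sender clock ticks (e.g. RTP timestamps)
    rx_ticks: receive times counted in receiver clock ticks
    The network delay is assumed equal for the two packets used; jitter adds error.
    """
    if len(tx_ticks) < 2 or len(tx_ticks) != len(rx_ticks):
        raise ValueError("need two or more matched transmit/receive pairs")
    return (rx_ticks[-1] - rx_ticks[0]) / (tx_ticks[-1] - tx_ticks[0])


def worst_case_error(jitter_s: float, window_s: float) -> float:
    """Bound on the relative ratio error when each endpoint delay may vary by +/- jitter_s."""
    return 2 * jitter_s / window_s


# Assumed example: 8000 Hz sender, receiver 100 ppm fast, constant 50 ms delay, 2 s window.
tx = [160 * k for k in range(101)]
rx = [(t / 8000 + 0.05) * 8000.8 for t in tx]
ratio = frequency_ratio(tx, rx)
```

With an assumed jitter of 20 ms, `worst_case_error` bounds the ratio only to within ±4000 ppm after a 10 s window and ±40 ppm after 1000 s, which illustrates why estimates based on transmit and receive times alone converge slowly.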
The solutions proposed in the above-mentioned methods yield a satisfactory estimate of the clock skew, but not fast enough. The method described in JP-10145345 assumes that the frequency ratio is estimated during the call. However, the estimation process is slow, and an overrun or underrun situation might already have occurred in the meantime, with audible artifacts as a consequence.
The object of the invention is therefore to provide a method and an arrangement for faster estimation of the clock skew, so as to avoid delays and/or interruptions in the transmission from sender to receiver.