The growing ability to contact and communicate with anybody, anywhere, at any time and at negligible cost, together with the possibility of making local calls via the worldwide interconnection network provided by the Internet, has created demand for telephony over packet-switched networks, and more particularly for telephony over Internet Protocol (IP) networks.
Unlike conventional fixed network telephony over the public switched telephone network, telephony over a packet-switched network, in particular over an IP network, is usually of mediocre quality because no minimum quality level is guaranteed.
This lack of guaranteed voice quality in telephony over an IP network is inherent in the design of the network, which gives priority to interconnecting the parties, to the detriment of the bandwidth finally allocated to the speech signal. The only bandwidth allocation criterion that is permitted is a best effort criterion. In particular, the bit rate available for the digital packets representing speech signals decreases as more and more users log onto the IP network to use it for IP telephony or for other purposes.
The deployment of satisfactory IP telephony services therefore necessitates control of the quality of the services offered and in particular necessitates the use of tools to measure that quality.
The main defects of IP telephony transmission, also known as Voice over IP (VoIP) transmission, are as follows:
Long delays, linked to routing delays and network equipment processing delays, which can impede interactivity and therefore make conversation between the calling and called parties difficult, if not impossible.
The effect of jitter on the packet routing delay, i.e. statistical variance of the transmission delay, reflected in varying time intervals between packet arrival times.
Loss of packets, caused either by the elimination of packets during routing when their lifetime has expired as a result of router congestion or by them reaching the remote terminal with too great a delay, in which case they are destroyed on arrival.
Echo, linked primarily to long and extremely variable delays.
Distortion caused by coding speech signals in digital packets at a low bit rate, as is generally the case in VoIP.
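By way of illustration only, the jitter and packet loss defects listed above can be quantified from per-packet timestamps. The following is a minimal sketch and not part of any described method: the function name `delay_stats`, the use of milliseconds, the convention that a packet dropped in the network has a receive time of `None`, and the `max_delay_ms` threshold beyond which a late packet is discarded on arrival are all assumptions made for this example. It also assumes that send and receive timestamps refer to a common clock.

```python
from statistics import mean, pvariance

def delay_stats(send_ms, recv_ms, max_delay_ms):
    """Return (mean delay, delay variance, lost packet count).

    send_ms / recv_ms: per-packet timestamps in milliseconds on a common clock.
    A recv_ms entry of None marks a packet eliminated during routing.
    Packets arriving later than max_delay_ms are treated as destroyed on arrival.
    """
    # Transit delay of every packet that actually reached the terminal.
    delays = [r - s for s, r in zip(send_ms, recv_ms) if r is not None]
    # Keep only packets that arrived early enough to be played out.
    usable = [d for d in delays if d <= max_delay_ms]
    lost = len(send_ms) - len(usable)
    if not usable:
        return None, None, lost
    # Jitter expressed as the statistical (population) variance of the delay.
    return mean(usable), pvariance(usable), lost

# Hypothetical trace: packets sent every 20 ms; one dropped, one too late.
stats = delay_stats([0, 20, 40, 60], [50, 80, None, 300], max_delay_ms=150)
```

In this hypothetical trace, two packets are usable (delays of 50 ms and 60 ms) and two count as lost, one eliminated in the network and one discarded for arriving too late.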
The end-to-end transmission delay of the speech signal represents the cumulative result of all the delays generated in the speech signal transmission and processing chain. As such, it constitutes the delay actually perceived by the user, and is sometimes called the mouth to ear delay.
The end-to-end transmission delay is therefore made up of the transmission delay over the packet-mode network (IP, or ATM, i.e. Asynchronous Transfer Mode) and the processing and transmission delays in the IP terminating equipment (IP telephone, gateway, local area network).
The transmission delay over the packet mode network takes into account the processing delay in the equipment and in particular in the routers. That processing delay depends in particular on the number of units the signals pass through, the functions implemented in those units (proxy, transcoding, firewall, etc.), and the available bandwidth.
The processing delay in the terminating equipment and networks takes into account the delays introduced by the audio codec (coder/decoder), the jitter buffer, packetization, voice activity detection, etc., and where applicable transmission over a transmission network or circuit mode access network. The processing delay in the terminating equipment and networks can be broken down into a send portion and a receive portion.
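The decomposition described above can be sketched as a simple sum of three contributions: the send portion of the terminating equipment delay, the delay over the packet-mode network, and the receive portion of the terminating equipment delay. The class name `DelayBudget`, its field names, and the numeric values below are hypothetical and chosen purely for illustration; they are not measurements.

```python
from dataclasses import dataclass

@dataclass
class DelayBudget:
    """One-way delay contributions, in milliseconds (illustrative model)."""
    send_terminal_ms: float     # coding, packetization, etc. on the send side
    network_ms: float           # routers and other units crossed in the network
    receive_terminal_ms: float  # jitter buffer, decoding, etc. on the receive side

    def end_to_end_ms(self) -> float:
        # The mouth to ear delay is the cumulative result of all contributions.
        return self.send_terminal_ms + self.network_ms + self.receive_terminal_ms

# Hypothetical values, for illustration only.
budget = DelayBudget(send_terminal_ms=45.0, network_ms=60.0, receive_terminal_ms=70.0)
total_ms = budget.end_to_end_ms()
```

Keeping the send and receive portions separate reflects the fact that each can only be observed at its own end of the call.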
It can therefore be very useful to measure the end-to-end transmission delay in the context of evaluating the quality of voice calls over the packet-switched network, because the measured transmission delay can be correlated with the quality level perceived by the user. Consequently, if limiting values, in terms of perceived quality, for the end-to-end transmission delay are known, action may be taken at the level of network engineering or terminating equipment configuration with a view to keeping the end-to-end transmission delay within acceptable limits.
The end-to-end transmission delay of a speech signal is typically measured intrusively, i.e. by setting up test calls between two probes simulating or substituted for the terminals.
The transmission delay as such is measured by comparing the signal sent by a sender probe and the signal received by a receiver probe. It is essential that the two signals are recorded using the same clock and that the two probes are synchronized.
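One common way of comparing a sent signal with a received signal is to search for the lag that maximizes their cross-correlation. The sketch below illustrates that general technique only; the text above does not state which comparison the probes actually use. The function name `estimate_delay`, its parameters, and the sample values are assumptions, and the example presumes that both signals were sampled at the same rate against the same clock, as required above.

```python
def estimate_delay(sent, received, sample_rate_hz, max_lag):
    """Estimate the delay (in seconds) of `received` relative to `sent`.

    Tries every lag from 0 to max_lag samples and keeps the one giving the
    largest cross-correlation between the two sample sequences.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate sent[i] with received[i + lag] over the overlapping part.
        score = sum(
            s * received[i + lag]
            for i, s in enumerate(sent)
            if i + lag < len(received)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate_hz

# Hypothetical test signal, received 4 samples late at 8000 samples per second.
sent = [0.0, 1.0, -2.0, 3.0, -1.0, 0.5]
received = [0.0] * 4 + sent + [0.0] * 2
delay_s = estimate_delay(sent, received, sample_rate_hz=8000, max_lag=6)
```

With real speech, the received signal is also distorted by coding and packet loss, so a practical probe would need a more robust alignment than this exhaustive search over clean samples.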
The test signals used for these measurements can be speech signals, composite signals or single frequencies.
Probes available off the shelf that use this kind of intrusive measurement include, for example, those using the perceptual evaluation of speech quality (PESQ) psycho-acoustic model standardized by Recommendation P.862 of the ITU-T (the Telecommunication Standardization Sector of the International Telecommunication Union).
Evaluating the end-to-end transmission delay by means of intrusive probes has two drawbacks. Firstly, the measurements obtained do not relate to real calls between users. Secondly, they do not take account of processing of the speech signal in the real terminals of users. Consequently, this type of evaluation can be used to characterize the quality of a telephone service in a general way but not to characterize the voice quality of real calls between two users.
Accordingly, given the continuing expansion of IP telephony, there is a real need for tools for evaluating the voice quality actually perceived by the user of a Voice over IP telephone terminal during real telephone calls. There is also a correlated need for tools for non-intrusively evaluating the real processing delay for a speech signal received in a terminal, and thereafter the end-to-end transmission delay for a speech signal during real calls between two items of IP terminating equipment, such as IP telephones (known as IPphones) or PCs equipped with IP telephony software (known as softphones).