Delay is a critical factor in voice communications. The end-to-end delay is the delay across the entire process in which a voice is captured, pre-processed, coded, packed, transmitted through a network, unpacked, and finally played. Since a large delay can degrade the subjective auditory experience of the users of a voice product, it is necessary to measure and evaluate the delay of a voice system. Some current methods for measuring a delay are invasive, and others are non-invasive.
An invasive measurement is conducted inside a voice system under test, and some features of invasive measurement are described as follows.
First, measuring data is generally transmitted together with data frames or data packets of the system under test, and therefore inevitably undergoes processes such as compression coding, packaging, unpackaging and decoding. The measuring data may be lost or damaged in the processes of compression coding and decompression.
Second, since the data format, the packaging format, and algorithms of compression coding and decoding of the system under test may not be public, it is difficult for testers to design matching measuring methods and measuring signals.
Besides, some invasive measuring methods require measurement tool software to be run on terminals of the system under test, with timing performed by that software, which may affect normal operation of those terminals.
Most current non-invasive measuring systems are based on delay measuring methods of single-end requesting and bidirectional averaging.
The measuring method shown in FIG. 1 is a method for measuring a delay based on single-end capturing and bidirectional transmitting and averaging, which mainly includes the following steps: (1) an audio signal is played locally, and a local measuring apparatus captures the audio signal and records a time stamp T1 for the capturing; (2) simultaneously, a local section of a system under test captures the audio signal, which is then transmitted through the system under test to a remote terminal of the system under test for playing; (3) the remote terminal of the system under test captures the sound it has played, which is then transmitted back through an intermediate network to the local section of the system under test for playing; (4) the measuring apparatus captures the signal played by the local section of the system under test and records a time stamp T2 for the capturing, and the difference between the time stamps of the two captures by the measuring apparatus is calculated and divided by 2, i.e., (T2−T1)/2, to obtain a delay.
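The calculation in step (4) can be illustrated with a minimal sketch. The function name, the millisecond unit, and the example time stamps below are assumptions made for illustration only; they are not taken from FIG. 1.

```python
def estimate_one_way_delay_ms(t1_ms: float, t2_ms: float) -> float:
    """Halve the round-trip interval between the two captures.

    t1_ms: time stamp T1, when the measuring apparatus captures the
           locally played audio signal.
    t2_ms: time stamp T2, when the measuring apparatus captures the
           signal returned and played by the local section.
    """
    if t2_ms < t1_ms:
        raise ValueError("T2 must not be earlier than T1")
    return (t2_ms - t1_ms) / 2


# Hypothetical time stamps, in milliseconds.
print(estimate_one_way_delay_ms(1000.0, 1360.0))  # prints 180.0
```

The result is only an estimate of a one-way delay: it is half of the measured round trip, not a direct measurement of either direction.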
The feature of the solution above is that bidirectional transmission is performed to obtain time stamps of two captured signals and a difference between the time stamps is calculated to obtain an estimated value of a one-way delay, which, however, has the following disadvantages.
First, in the process of bidirectional transmission, since there is an audio playing device and an audio capturing device at each of the two sides, echoes (direct echoes and indirect echoes) are inevitably generated. The echoes (especially indirect echoes) may interfere with the calculation result of the delay, make the calculation of the delay complex, and seriously affect its accuracy.
Second, an end-to-end delay is the entire delay from capturing a voice to playing the voice in a single communications link, whereas the bidirectional transmitting and averaging above spans two links. The system under test is a black box, and the upload and download links in most communications are not completely symmetric. The processes the voice undergoes in the communications link and in the subsequent test device may not be the same. Therefore, the delay of the voice in a single communications link is generally not equal to the arithmetic average of the delays in the two links.
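The effect of link asymmetry on the averaged estimate can be shown with a small numerical sketch. The function names and the delay values are hypothetical, and the round trip is simplified to the sum of the two link delays.

```python
def halved_round_trip_ms(forward_ms: float, return_ms: float) -> float:
    """Estimate produced by bidirectional averaging: half of the round trip."""
    return (forward_ms + return_ms) / 2


def estimation_error_ms(forward_ms: float, return_ms: float) -> float:
    """Difference between the averaged estimate and the true forward delay."""
    return halved_round_trip_ms(forward_ms, return_ms) - forward_ms


# Hypothetical asymmetric links: 80 ms in one direction, 140 ms in the other.
print(halved_round_trip_ms(80, 140))  # prints 110.0
print(estimation_error_ms(80, 140))   # prints 30.0
```

With these assumed values the averaged estimate overstates the 80 ms link by 30 ms and understates the 140 ms link by the same amount. The error is zero only when the two links have equal delays.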
No effective solution to the problems above is currently available.