In recent years, Voice over Internet Protocol (VoIP) has developed rapidly. Compared with traditional telephony, VoIP has advantages such as low network resource usage and low cost. However, an Internet Protocol (IP) network provides only best-effort service, and voice may be affected during transmission by network factors such as packet loss, which leads to voice quality deterioration. By monitoring voice quality and feeding it back, compression or transmission parameters can be adjusted to improve the voice quality. Therefore, measuring and evaluating voice quality accurately and reliably in real time is critical in network measurement and network planning.
According to the type of information input into a model and the bit stream content that is accessed, voice quality evaluation methods may be classified into a parameter planning model, a packet layer model, a bit stream layer model, a media layer model, a hybrid model, and the like. A voice quality evaluation method based on the packet layer model evaluates voice quality by analyzing only the packet header information of a voice packet. It has low calculation complexity and is applicable to a case in which data packet payload information cannot be accessed. In contrast, a voice quality evaluation method based on the bit stream layer model allows not only analysis of the packet header information of a voice data packet, but also analysis of the voice payload information and even voice decoding. For example, a waveform of a voice signal is analyzed to obtain more detailed packet loss information and distortion information, so that the predicted quality is more precise than that obtained by the method based on the packet layer model. However, the calculation complexity of the bit stream layer method is higher than that of the packet layer method. The two methods therefore have their respective advantages, and both are commonly used. Nevertheless, both methods generally use an average compression bit rate of the voice to evaluate compression distortion, use an average packet loss rate to evaluate distortion caused by packet loss, and then evaluate the voice quality according to the compression distortion and the distortion caused by packet loss.
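As an illustration of the averaging described above, the following minimal sketch estimates an average packet loss rate from packet header information alone. It assumes that each voice packet header carries a monotonically increasing sequence number without wraparound; the function name `average_packet_loss_rate` and the sample values are hypothetical and are not taken from any existing solution.

```python
def average_packet_loss_rate(received_seq_numbers):
    """Estimate the average packet loss rate from header sequence numbers.

    Assumption: sequence numbers increase by 1 per packet and do not wrap
    around within the measured interval. Duplicates are counted once.
    """
    unique = set(received_seq_numbers)
    if not unique:
        return 0.0
    # Packets expected between the first and last sequence number observed.
    expected = max(unique) - min(unique) + 1
    lost = expected - len(unique)
    return lost / expected


# Hypothetical capture: packets 102, 105 and 106 never arrived.
received = [100, 101, 103, 104, 107, 108, 109]
rate = average_packet_loss_rate(received)
print(rate)  # 0.3 (3 of 10 expected packets are missing)
```

The result is a single number for the whole interval; it records how many packets were lost but not where in the voice stream the losses occurred.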
In the process of researching and practicing the foregoing approaches, the inventor of the present application found that the composition of voice is complex. For example, silence (such as an interval between talk spurts) often occurs in the voice. However, existing solutions measure voice quality according to only average distortion information and do not distinguish between such different parts of the voice. Therefore, prediction precision is not high, and the evaluation result is not accurate enough.
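The limitation can be illustrated with a small sketch. It assumes, purely for illustration, that each frame is already labeled as `"speech"` or `"silence"` and flagged as lost or received. The labels, the function names, and both sample streams are hypothetical; the sketch only shows why an average conceals the difference and does not represent any particular evaluation method.

```python
def average_loss_rate(frames):
    """Loss rate over all frames, regardless of their content."""
    lost = sum(1 for _, is_lost in frames if is_lost)
    return lost / len(frames)


def speech_loss_rate(frames):
    """Loss rate over frames labeled as speech only (illustrative)."""
    speech = [is_lost for kind, is_lost in frames if kind == "speech"]
    return sum(speech) / len(speech) if speech else 0.0


# Stream A: both lost frames fall inside a silent interval.
stream_a = ([("speech", False)] * 6
            + [("silence", True)] * 2
            + [("silence", False)] * 2)

# Stream B: both lost frames fall inside active speech.
stream_b = ([("speech", True)] * 2
            + [("speech", False)] * 4
            + [("silence", False)] * 4)

# The averages are identical, so a model driven only by the average
# loss rate cannot tell the two streams apart.
print(average_loss_rate(stream_a), average_loss_rate(stream_b))  # 0.2 0.2

# Only stream B actually lost speech content.
print(speech_loss_rate(stream_a))  # 0.0
print(speech_loss_rate(stream_b))
```

Both streams yield the same average loss rate, yet only the second one loses audible speech, which is the kind of case in which an average-based evaluation is imprecise.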