In recent years, with the rapid development of communications networks, network voice communication has become an important aspect of social communication. In the current big data environment, monitoring the performance and quality of voice communications networks is particularly important.
Currently, there is no simple, effective, low-complexity algorithm for a signal-domain-based objective model of voice quality evaluation in voice communications. Research in the industry mainly focuses on the numerous factors that affect voice quality in voice communications, and relatively few studies provide a low-complexity signal-domain-based evaluation model.
In an existing signal-domain-based objective technology of voice quality evaluation, the process by which a human auditory system perceives a voice signal is simulated by using a mathematical signal model. In this technology, auditory perception is simulated by using a cochlea filter bank, time-to-frequency conversion is then performed on the N sub-signal envelopes output by the cochlea filter bank, and the spectra of the N signal envelopes are processed based on an analysis of the human articulatory system, to obtain a quality score of the voice signal.
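The first two stages described above (filter bank, then time-to-frequency conversion of each sub-signal envelope) can be sketched as follows. This is only an illustrative sketch, not the prior-art implementation: plain windowed-sinc band-pass filters stand in for the cochlea filters, the envelope is taken by rectification and a moving average, and all function names, band edges, and filter lengths are assumptions. The final articulatory analysis that maps the envelope spectra to a quality score is not specified here and is therefore omitted.

```python
import cmath
import math


def bandpass_fir(low_hz, high_hz, sample_rate, num_taps=31):
    """Hamming-windowed sinc band-pass FIR (difference of two low-pass filters).

    Assumes num_taps is odd. This is a stand-in for a cochlea filter.
    """
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * (high_hz - low_hz) / sample_rate
        else:
            ideal = (math.sin(2 * math.pi * high_hz * k / sample_rate)
                     - math.sin(2 * math.pi * low_hz * k / sample_rate)) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def convolve_same(signal, taps):
    """Direct-form convolution with zero padding, trimmed to the input length."""
    half = len(taps) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, h in enumerate(taps):
            idx = i + half - j
            if 0 <= idx < len(signal):
                acc += h * signal[idx]
        out.append(acc)
    return out


def envelope(band_signal, smooth_len=16):
    """Crude envelope: full-wave rectification followed by a moving average."""
    rectified = [abs(v) for v in band_signal]
    box = [1.0 / smooth_len] * smooth_len
    return convolve_same(rectified, box)


def dft_magnitude(x):
    """Magnitude of the DFT for bins 0 .. len(x) // 2 (time-to-frequency conversion)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def envelope_spectra(signal, sample_rate, band_edges_hz):
    """Return one envelope spectrum per band (N = len(band_edges_hz) - 1)."""
    spectra = []
    for low_hz, high_hz in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        band = convolve_same(signal, bandpass_fir(low_hz, high_hz, sample_rate))
        spectra.append(dft_magnitude(envelope(band)))
    return spectra


# Synthetic test input (hypothetical): a 1 kHz tone amplitude-modulated at 50 Hz.
sample_rate = 8000
signal = [(1 + 0.8 * math.cos(2 * math.pi * 50 * t / sample_rate))
          * math.sin(2 * math.pi * 1000 * t / sample_rate)
          for t in range(320)]
spectra = envelope_spectra(signal, sample_rate, [100, 500, 1500, 2500, 3500])
print(len(spectra), "envelope spectra,", len(spectra[0]), "bins each")
```

In this synthetic input the carrier lies in the 500 to 1500 Hz band, so that band's envelope spectrum is the one expected to carry the 50 Hz modulation. Note that every band requires its own convolution over the whole signal, which is the cost the following paragraph criticizes.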
The prior art has the following problems: (1) Using a cochlea filter to simulate the perception of a voice signal by a human auditory system is relatively crude. On one hand, the mechanism by which a human perceives a voice signal is complex: it involves not only the auditory system but also cerebral cortex processing, neural processing, and prior knowledge acquired in life, and it is a comprehensive process of cognition and determining that combines multiple subjective and objective aspects. On the other hand, the cochleae of different individuals do not respond identically to a given voice signal frequency, and the cochlear response of the same person to a given voice signal frequency is not completely the same when measured in different time periods. (2) The cochlea filter divides the entire spectrum band of a voice signal into multiple key frequency bands for processing. Therefore, a corresponding convolution operation needs to be performed on the voice signal in each key frequency band. This process requires complex computation and relatively high resource consumption, and is therefore poorly suited to monitoring a huge and complex communications network.
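To make the cost argument in (2) concrete, the following back-of-the-envelope sketch counts multiply-accumulate operations for a filter bank implemented as direct-form FIR convolution. The band count, filter length, sample rate, and number of monitored streams are purely illustrative assumptions, not figures from the prior art, and a real cochlea filter bank may be implemented differently.

```python
def filter_bank_macs_per_second(num_bands, taps_per_filter, sample_rate_hz):
    """Multiply-accumulates per second to filter one voice stream.

    Assumes one direct-form FIR convolution per band: each output sample
    of each band costs taps_per_filter multiply-accumulates.
    """
    return num_bands * taps_per_filter * sample_rate_hz


# Hypothetical figures chosen only for illustration.
per_stream = filter_bank_macs_per_second(num_bands=23,
                                         taps_per_filter=128,
                                         sample_rate_hz=8000)
network_total = per_stream * 10_000  # 10,000 concurrently monitored streams

print(per_stream)     # 23552000
print(network_total)  # 235520000000
```

Under these assumptions the cost grows linearly with the number of bands and with the number of monitored streams, before any envelope extraction or spectral analysis is counted.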
Therefore, the existing signal-domain-based voice quality evaluation solution has high computational complexity and high resource consumption, and is not sufficiently capable of monitoring a huge and complex voice communications network.