In actual life, people often use smart devices (for example, a smartphone and a tablet computer) to send voice messages. However, when using the smart devices to send the voice messages, people usually need to tap start buttons or end buttons on screens of the smart devices before sending the voice messages, and these tap operations cause much inconvenience to users.
To complete sending of the voice message without requiring the user to tap a button, the smart device needs to perform recording continuously or based on a predetermined period, and determine whether an obtained audio signal includes a voice signal. If the obtained audio signal includes a voice signal, the smart device extracts the voice signal, and then subsequently processes and sends the voice signal. As such, the smart device completes sending of the voice message.
In the existing technology, voice signal detection methods such as a dual-threshold method, a detection method based on an autocorrelation maximum value, and a wavelet transformation-based detection method are usually used to detect whether an obtained audio signal includes a voice signal. However, in these methods, frequency characteristics of audio information are usually obtained through complex calculation such as Fourier Transform, and further, it is determined, based on the frequency characteristics, whether the audio information include voice signals. Therefore, a relatively large amount of buffer data needs to be calculated, and memory usage is relatively high, so that a relatively large amount of calculation is required, a processing rate is relatively low, and power consumption is relatively large.