With the rapid development of speech recognition technology and the mobile Internet, more and more speech input-based application programs have been provided on electronic devices such as mobile phones, tablet computers, and smart TVs. Such speech input-based application programs provide services according to speech signals input by users.
A survey of users within a specific range shows that, when users use speech input-based application programs, the three most frequently used functions are: setting a reminder by using speech input, querying the weather by using speech input, and determining the geographical location of the user by using speech input.
Taking a user setting a reminder by using speech input on a smartphone as an example, the current reminder setting method includes the following steps. First, the smartphone collects a speech signal input by the user, where the speech signal is used to set a reminder corresponding to a reminder time point; for example, the speech signal may be “wake me up at 8 o'clock tomorrow morning”. Next, the smartphone forwards the speech signal to a server, and the server processes the speech signal by using “continuous speech recognition” and “semantic analysis”; that is, the server first recognizes the entire speech signal as a corresponding text sequence by using continuous speech recognition, then extracts the time information “8 o'clock tomorrow morning” and the reminder content “wake me up” from the text sequence by using semantic analysis, and feeds the extraction result back to the smartphone. Finally, the smartphone sets the corresponding reminder according to the time information “8 o'clock tomorrow morning” and the reminder content “wake me up”.
During the implementation of the present invention, the inventors found that the prior art has at least the following problems:
First, when the server recognizes the entire speech signal as the corresponding text sequence, the accuracy of whole-text recognition is unstable. For example, in the case of severe ambient noise, the accuracy is markedly reduced. As another example, since the basic decoding principle of continuous speech recognition is to seek an optimal global solution, if the initial part of the speech signal is incorrectly recognized, the probability that the subsequent part of the speech signal is also incorrectly recognized is very high.
Second, when the server extracts the time information and the reminder content from the recognized text sequence, the text sequence is generally matched against templates: the time information is extracted according to the time region in the matching result, and the reminder content is extracted according to the event region in the matching result. During specific implementation, the various possible text sequence templates need to be collected in advance. Because template collection is inherently limited, the finally collected text sequence templates may fail to cover all possible text sequence forms. As a result, even if the speech signal is correctly recognized as the corresponding text sequence, the time information may still fail to be extracted because no template fully matches the text sequence, and the reminder may consequently fail to be set or be set incorrectly.
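By way of illustration only, the template-based extraction described above, and its coverage limitation, may be sketched as follows. The sketch is written in Python and is not taken from any actual server implementation: the template list `TEMPLATES`, the function `extract_reminder`, and the use of regular expressions with named groups to represent the time region and the event region are all assumptions introduced here for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical, pre-collected text sequence templates (an assumption for
# illustration). Each template marks an "event" region and a "time" region.
TEMPLATES = [
    re.compile(r"^(?P<event>wake me up) at (?P<time>.+)$"),
    re.compile(r"^remind me to (?P<event>.+) at (?P<time>.+)$"),
]


def extract_reminder(text: str) -> Optional[Tuple[str, str]]:
    """Return (time information, reminder content) if some template matches.

    Returns None when no collected template covers the form of the text
    sequence, even though the text itself was recognized correctly.
    """
    for template in TEMPLATES:
        match = template.match(text)
        if match:
            return match.group("time"), match.group("event")
    return None
```

Under these assumptions, the text sequence "wake me up at 8 o'clock tomorrow morning" matches the first template and yields the time information and the reminder content, whereas a reordered but equally valid text sequence such as "tomorrow morning at 8 o'clock, wake me up" matches no template, so nothing is extracted and the reminder cannot be set.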