In speech recognition technologies, a segment of speech often needs to be searched to determine whether it contains a keyword of interest. For example, when a conference is recorded, whether the conference is computer related can be determined by checking whether the recorded speech contains keywords such as “display” and “keyboard.”
Speech keyword retrieval is increasingly widely applied, but most applications target Mandarin or one other specific dialect, which is a significant limitation. In the conventional speech keyword retrieval solution, keyword retrieval is performed for only one language, and the retrieval algorithm for that language is fused with its language model. The retrieval algorithm is responsible for the entire retrieval process: it invokes the language model to perform language recognition and decoding, and after decoding it judges whether a keyword of interest exists in the decoding result. If so, the corresponding keyword is output. If the speech data does not belong to that language, recognition cannot be performed, and another retrieval algorithm capable of recognizing the corresponding language must be used to perform keyword retrieval on the speech again.
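The coupling described above can be illustrated with a minimal sketch. It is not taken from any actual conventional system: the names `Utterance`, `MonolingualKeywordRetriever`, and `retrieve_with_fallback` are hypothetical, and real audio decoding is replaced by a toy stand-in in which each utterance already carries its language and transcript. The sketch only shows the structural point, namely that each retriever embeds its own single-language decoder and that speech in another language must be handed to a separate, complete retriever.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Utterance:
    """Toy stand-in for recorded speech (assumption: real input would be audio)."""
    language: str    # language actually spoken
    transcript: str  # what an ideal decoder for that language would output


class MonolingualKeywordRetriever:
    """Retrieval logic fused with a single-language model (hypothetical class)."""

    def __init__(self, language: str, keywords: List[str]):
        self.language = language
        self.keywords = keywords

    def _decode(self, speech: Utterance) -> Optional[str]:
        # Stand-in for the embedded language model: it can only decode
        # speech in its own language and fails on anything else.
        if speech.language != self.language:
            return None
        return speech.transcript

    def retrieve(self, speech: Utterance) -> Optional[List[str]]:
        decoded = self._decode(speech)
        if decoded is None:
            return None  # recognition cannot be performed for this language
        words = decoded.lower().split()
        # Judge whether each keyword of interest exists in the decoding result.
        return [k for k in self.keywords if k in words]


def retrieve_with_fallback(
    speech: Utterance, retrievers: List[MonolingualKeywordRetriever]
) -> Tuple[Optional[str], List[str]]:
    # Each language needs its own complete retriever; try them one by one
    # until one of them is able to decode the speech.
    for retriever in retrievers:
        hits = retriever.retrieve(speech)
        if hits is not None:
            return retriever.language, hits
    return None, []


if __name__ == "__main__":
    mandarin = MonolingualKeywordRetriever("mandarin", ["xianshiqi", "jianpan"])
    english = MonolingualKeywordRetriever("english", ["display", "keyboard"])
    speech = Utterance("english", "the display and keyboard are new")
    print(retrieve_with_fallback(speech, [mandarin, english]))
```

In this sketch, supporting an additional language means writing and maintaining another complete retriever, and speech in an unsupported language passes through every retriever without being recognized.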
To sum up, in the conventional technology, a speech keyword retrieval solution supports processing of only one specific language, and each language requires its own complete speech keyword retrieval solution. The limitation is therefore very large, and the cost is high.