With the development of speech recognition technology and mobile Internet technology in recent years, speech recognition is being applied in an increasingly wide range of settings. When a speech recognition function is implemented, a speech recognition decoder is usually used to decode speech data, and during decoding the decoder uses acoustic models and language models to convert speech to text. Therefore, how the language model is built is key to improving the accuracy of speech recognition.
Currently, when a language model is built, initial language training samples covering many fields (e.g., travel, entertainment, food, politics, sports, life, medicine, etc.) are obtained by performing data mining on a dictionary, and the language training samples are used for training to obtain the language model. Meanwhile, with continuous refinement of the field divisions, the number of fields (also referred to as “verticals”) continues to increase. In order to ensure that the language model obtained is a more balanced multi-vertical-field language model, an interpolation method with fixed weights for the respective vertical fields is repeatedly performed on the initial language training samples, so that the language model trained on the training samples covers a large number of vertical fields.
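As an illustration only, fixed-weight interpolation across vertical fields can be sketched as a linear mixture of per-vertical word probabilities. The sketch below rests on assumptions not stated in the text above: it uses unigram models, it applies the weights to model probabilities rather than to the samples themselves, and the corpora, the weights, and the function names `unigram_model` and `interpolate` are all hypothetical.

```python
from collections import Counter


def unigram_model(tokens):
    """Estimate maximum-likelihood unigram probabilities from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(models, weights):
    """Mix per-vertical models with fixed weights that sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    # The mixed vocabulary is the union of all per-vertical vocabularies.
    vocab = set().union(*(m.keys() for m in models.values()))
    return {
        word: sum(weights[v] * models[v].get(word, 0.0) for v in models)
        for word in vocab
    }


# Toy corpora standing in for mined training samples (hypothetical data).
corpora = {
    "travel": "book a flight to paris book a hotel".split(),
    "medicine": "take a dose of ibuprofen".split(),
}
models = {vertical: unigram_model(tokens) for vertical, tokens in corpora.items()}

# Fixed weights per vertical field (assumed values for illustration).
weights = {"travel": 0.5, "medicine": 0.5}
mixed = interpolate(models, weights)
```

In this toy setting the weights are set once and never adapt to the data. A term that occurs in only one vertical therefore keeps only that vertical's share of its probability, and a term that appears in no corpus receives no probability at all.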
The current technology of building a language model has at least the following problems:
As the initial language training samples are obtained by performing data mining on a dictionary, making the initial language training samples cover more fields requires increasing the capacity of the dictionary so that the dictionary itself covers more fields. This increases the difficulty of accessing the dictionary, thereby increasing the difficulty of building language models.
In addition, when the initial language training samples are expanded using the interpolation method with fixed weights for the respective vertical fields, it is very difficult for the interpolated training samples to include the terminology, uncommon words and unpopular words of a vertical field. As a result, the language model built from those training samples produces inaccurate recognition results for that vertical field, thereby reducing the accuracy of the speech recognition system.