Field
Apparatuses and methods consistent with exemplary embodiments relate to constructing a multilingual acoustic model, and more particularly, to constructing a multilingual acoustic model which reflects multiple languages as well as dialects that carry regional characteristics of a language.
Description of the Related Art
Diverse types of electronic devices, such as smart phones and smart televisions, may provide voice recognition functionality. For example, an acoustic model which utilizes statistics-based technology may be used for voice recognition.
However, because each country or region has different linguistic characteristics, a single acoustic model may not cover voice recognition for all the languages and linguistic characteristics found in each country or region. Thus, voice recognition technology may use a different acoustic model for each language in order to provide voice recognition functionality.
One way to construct an acoustic model for voice recognition is to secure sufficient data for each language. For languages used by many people, such as English, Chinese, Italian, German, and Spanish, it may be easier to acquire sufficient data, whereas for languages used by a small number of people or inaccessible languages, it may be difficult to acquire sufficient data.
Thus, an acoustic model for multiple languages or dialects may be constructed using Hidden Markov Model (HMM)/Gaussian Mixture Model (GMM)-based adaptation technology. Specifically, a seed acoustic model may be constructed using data of a language for which sufficient data exists. The seed acoustic model may then be adapted, using the HMM/GMM-based adaptation technology, into an acoustic model of the language for which the acoustic model is to be constructed.
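For illustration only, the adaptation of a seed acoustic model toward a target language may be sketched as follows. The sketch assumes maximum a posteriori (MAP) adaptation of the Gaussian means of a single one-dimensional GMM, which is one common HMM/GMM adaptation technique and is not necessarily the technique contemplated above. The function names, the relevance factor `tau`, and the sample values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def map_adapt_means(seed_means, seed_vars, seed_weights, target_frames, tau=10.0):
    """Shift the seed GMM means toward the target-language frames (MAP, means only).

    Variances and mixture weights are kept from the seed model. `tau` is an
    assumed relevance factor: larger values keep the result closer to the seed.
    """
    k = len(seed_means)
    occupancy = [0.0] * k     # sum of component posteriors over all frames
    weighted_sum = [0.0] * k  # sum of posterior * frame value per component
    for x in target_frames:
        likes = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(seed_weights, seed_means, seed_vars)]
        total = sum(likes)
        if total == 0.0:
            continue  # frame is numerically unreachable from every component
        for i in range(k):
            gamma = likes[i] / total  # posterior of component i for this frame
            occupancy[i] += gamma
            weighted_sum[i] += gamma * x
    # MAP update: interpolate between the seed mean and the target-data mean.
    return [(tau * m + s) / (tau + n)
            for m, s, n in zip(seed_means, weighted_sum, occupancy)]


if __name__ == "__main__":
    # Hypothetical seed model (data-rich language) and a few target-language frames.
    adapted = map_adapt_means(
        seed_means=[0.0, 5.0],
        seed_vars=[1.0, 1.0],
        seed_weights=[0.5, 0.5],
        target_frames=[0.5, 0.6, 0.4, 5.5, 5.4],
    )
    print(adapted)
```

With only a few target frames, each adapted mean remains close to the corresponding seed mean, and with no target frames the seed means are returned unchanged. This is consistent with adaptation of this kind requiring a considerable amount of target-language data before the adapted model departs meaningfully from the seed acoustic model.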
However, when this HMM/GMM-based method is used to construct an acoustic model for multiple languages or dialects, the languages involved in the adaptation have to use the same phoneme-level unit. For example, in order to acquire a British English acoustic model, an American English acoustic model may be used as a training acoustic model, but a Korean acoustic model may not be used. In addition, in order to enhance voice recognition performance with this HMM/GMM-based method, a large amount of data for the target language is needed to acquire the acoustic model.