1. Field of the Invention
The present invention relates to a noise adaptation system, a noise adaptation method, and a noise adaptation program for a speech model. In particular, the present invention relates to a noise adaptation system, method, and program that use the noisy speech to be recognized to adapt a clean speech model, generated by modeling features of speech by means of a Hidden Markov Model (HMM), so that the recognition rate in a noisy environment can be improved.
2. Description of the Related Art
A tree-structure piecewise linear transformation approach is described in an article entitled “Effects of tree-structure clustering in noise adaptation using piecewise linear transformation” by Zhipeng Zhang et al. (Proceedings of 2002 Autumn Meeting of the Acoustical Society of Japan, pp. 29-30). According to the approach described in the article, noise is clustered, and a tree-structure noisy speech model space is generated based on the result of the clustering. A speech feature parameter of the input noisy speech to be recognized is then extracted, an optimum model is selected from the tree-structure noisy speech model space, and linear transformation is applied to the selected model so as to increase its likelihood, thereby improving the recognition accuracy for the input speech.
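The selection and transformation steps described above can be illustrated with a minimal sketch. It is not the implementation from the cited article. It assumes, purely for illustration, that each tree node holds a single one-dimensional Gaussian instead of an HMM, that the optimum model is found by a greedy top-down descent, and that the linear transformation is reduced to a bias term on the mean. The names `Node`, `select_model`, and `ml_bias` are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Node:
    """One model in the tree (assumption: a 1-D Gaussian stands in for an HMM)."""
    name: str
    mean: float
    var: float
    children: List["Node"] = field(default_factory=list)


def log_likelihood(frames: Sequence[float], mean: float, var: float) -> float:
    """Log-likelihood of the feature frames under a Gaussian with the given mean and variance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)
        for x in frames
    )


def select_model(root: Node, frames: Sequence[float]) -> Tuple[Node, float]:
    """Greedy descent (an assumption): move to the best child while the likelihood improves."""
    best, best_ll = root, log_likelihood(frames, root.mean, root.var)
    node = root
    while node.children:
        scored = [
            (log_likelihood(frames, c.mean, c.var), i)
            for i, c in enumerate(node.children)
        ]
        child_ll, idx = max(scored)
        if child_ll <= best_ll:
            break  # no child explains the input better than the current best
        best, best_ll = node.children[idx], child_ll
        node = best
    return best, best_ll


def ml_bias(node: Node, frames: Sequence[float]) -> float:
    """Maximum-likelihood mean shift for a Gaussian with fixed variance (bias-only transform)."""
    return sum(frames) / len(frames) - node.mean


# Toy tree: upper levels are coarse noise clusters, lower levels are finer ones.
root = Node("root", 0.0, 1.0, [
    Node("A", 2.0, 1.0, [Node("A1", 2.5, 1.0), Node("A2", 1.5, 1.0)]),
    Node("B", -2.0, 1.0),
])
frames = [2.4, 2.6, 2.5, 2.7]  # made-up feature values of the noisy input
best, best_ll = select_model(root, frames)
bias = ml_bias(best, frames)
adapted_ll = log_likelihood(frames, best.mean + bias, best.var)
```

In this toy setting the bias shift can only raise or preserve the likelihood of the selected model, which mirrors the purpose of the transformation step in the article.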
Another approach is described in an article entitled “Study on tree-structure clustering in noise adaptation using piecewise linear transformation” by Zhipeng Zhang et al. (2003 Spring Meeting of the Acoustical Society of Japan, pp. 37-38), in which noise characteristics are sequentially and hierarchically divided to generate a tree structure of a noise-added speech model. In this approach, noise-added speech is first clustered by signal-to-noise ratio (hereinafter abbreviated to SNR) and then a tree-structure model is provided for each SNR condition to generate a tree-structure noisy speech model space.
FIG. 6 shows an example of the tree-structure noisy speech model, in which a tree-structure model is provided for each of three SNR conditions: K1 for SNR=5 dB, K2 for SNR=10 dB, and K3 for SNR=15 dB. The top node (root) of each tree-structure model K1-K3 represents a clean speech model. Higher levels of each tree structure represent global features of noise characteristics and lower levels represent local features.
Japanese Patent Laid-Open No. 2002-14692 (FIGS. 2 and 3 and Abstract, in particular) describes a technology in which a large number of noise samples are clustered beforehand, acoustic models are generated on the basis of the samples, and noise selected through clustering is added to learning data, thereby enabling efficient learning with a small number of noise samples to achieve high recognition performance.
Japanese Patent Laid-Open No. 2002-91484 (Abstract, in particular) describes a technology in which a language model is generated for each tree-structure cluster and the language models are used for speech recognition.
Japanese Patent Laid-Open No. 2000-298495 (Abstract and Claim 2, in particular) describes combining a number of tree structures to form a new tree structure.
In the approach in “Study on tree-structure clustering in noise adaptation using piecewise linear transformation” cited above, input noisy speech to be recognized is analyzed to extract a feature parameter string and an optimum model is selected from a tree-structure noisy speech model space. Linear transformation is applied to the selected optimum model to maximize the likelihood. Because a separate tree-structure model is provided for each SNR condition, this approach has the drawback that recognition involves a two-step search: an optimum model is first selected under each SNR condition and then the best model is selected from among all SNR models. The resulting problems are the difficulty of dealing with noisy speech whose SNR varies and the high computational cost of evaluating every SNR condition.
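The two-step search and its cost can be sketched as follows. This is an illustration only, under stated assumptions: each SNR condition's tree is flattened into a dictionary of candidate models, each model is a one-dimensional Gaussian instead of an HMM, and every candidate is scored exhaustively. The function name `two_step_search` and all numeric values are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

# Assumption: SNR in dB -> {model name: (mean, variance)}
Trees = Dict[int, Dict[str, Tuple[float, float]]]


def log_likelihood(frames: Sequence[float], mean: float, var: float) -> float:
    """Log-likelihood of the feature frames under a Gaussian with the given mean and variance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)
        for x in frames
    )


def two_step_search(trees: Trees, frames: Sequence[float]) -> Tuple[int, str, int]:
    """Return (best SNR, best model name, number of likelihood evaluations)."""
    evaluations = 0
    winners = {}
    # Step 1: select the optimum model under each SNR condition.
    for snr, models in trees.items():
        scored = []
        for name, (mean, var) in models.items():
            scored.append((log_likelihood(frames, mean, var), name))
            evaluations += 1
        winners[snr] = max(scored)
    # Step 2: select the best model from among the per-SNR winners.
    best_snr = max(winners, key=lambda s: winners[s][0])
    return best_snr, winners[best_snr][1], evaluations


trees = {
    5: {"car": (3.0, 1.0), "babble": (4.0, 1.0)},
    10: {"car": (2.0, 1.0), "babble": (2.6, 1.0)},
    15: {"car": (1.0, 1.0), "babble": (1.4, 1.0)},
}
frames = [2.5, 2.7, 2.6]  # made-up feature values of the noisy input
result = two_step_search(trees, frames)
```

In this sketch the number of likelihood evaluations grows with the number of SNR conditions multiplied by the number of models per condition, which is the computational cost noted above.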
None of the technologies described in the above-cited documents can solve these problems.
An object of the present invention is to provide a noise adaptation system, a noise adaptation method, and a noise adaptation program for speech recognition that can readily deal with noisy speech with varying SNR and can minimize computation costs by generating a speech model with a single tree structure into which noise and SNR are integrated.