1. Field of the Invention
The present invention relates to a method and apparatus for performing speech segmentation on an unknown speech signal by using a known speech signal.
2. Description of the Related Art
In a first prior art speech segmentation apparatus, a feature parameter is extracted from an input speech signal. Then, segmentation points of speech elements are determined by detecting changes in the feature parameter (see: JP-A-64-44492).
In the above-described first prior art speech segmentation apparatus, however, a small change in the feature parameter cannot be detected, so that the accuracy with which segmentation points are determined deteriorates.
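The change-detection approach and its weakness can be illustrated by the following minimal sketch. It is not taken from JP-A-64-44492; the function name `detect_boundaries`, the Euclidean distance measure, the fixed threshold, and the two-dimensional feature values are all assumptions chosen only for illustration.

```python
from math import sqrt


def detect_boundaries(frames, threshold):
    """Return the frame indices at which the feature vector changes by more
    than `threshold` relative to the preceding frame (hypothetical helper)."""
    boundaries = []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        # Assumed change measure: Euclidean distance between consecutive frames.
        change = sqrt(sum((c - p) ** 2 for p, c in zip(prev, cur)))
        if change > threshold:
            boundaries.append(i)
    return boundaries


# Made-up features: an abrupt change at frame 3, then a gradual drift
# over frames 4 to 7 in which every single step is small.
frames = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (2.0, 2.0), (2.1, 2.0),
    (2.4, 2.2), (2.7, 2.4), (3.0, 2.6),
]
print(detect_boundaries(frames, threshold=1.0))  # prints [3]
```

In this example only the abrupt change at frame 3 is reported. The gradual drift that follows produces no segmentation point, because each frame-to-frame change stays below the threshold, which corresponds to the deterioration described above.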
In a second prior art speech segmentation apparatus, if the sequence of speech elements of an input speech signal is known, segmentation points of the speech elements of the input speech signal are determined by visually comparing the feature parameter thereof with that of a known speech signal.
In the above-described second prior art speech segmentation apparatus, however, since the segmentation points are determined visually, the cost is increased. Also, if a pause interval is included in the known speech signal, the input speech signal must be generated with a corresponding pause interval in the voice, which burdens the speaker. Further, since some vowels easily become silent, the speaker has to take care in uttering such vowels so that they correspond to those of the known speech signal.
In a third prior art speech segmentation apparatus, the segmentation points of speech elements are automatically determined by using a hidden Markov model (HMM).
In the above-described third prior art speech segmentation apparatus, however, since the time constraints are loose, the accuracy of segmentation points around boundaries between speech elements is low. Although the accuracy of segmentation points can be enhanced by training the model on a highly accurate reference speech signal of a specific person, such a highly accurate reference speech signal is not easy to obtain.
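As a rough illustration of HMM-based segmentation in general, the following sketch aligns a frame sequence to a known left-to-right sequence of states by Viterbi decoding. It is a simplified assumption and not the third prior art apparatus itself: each speech element is modeled by a single state with a one-dimensional Gaussian output of fixed variance, transition probabilities are ignored, and the name `forced_align` and all numeric values are hypothetical.

```python
def forced_align(obs, means, var=1.0):
    """Viterbi alignment of 1-D observations to a known left-to-right state
    sequence. Returns the frame index at which each later state begins."""
    n, k = len(obs), len(means)
    if n < k:
        raise ValueError("need at least one frame per state")
    neg_inf = float("-inf")

    def loglik(x, m):
        # Gaussian log-likelihood up to an additive constant (assumed model).
        return -((x - m) ** 2) / (2.0 * var)

    score = [[neg_inf] * k for _ in range(n)]
    back = [[0] * k for _ in range(n)]
    score[0][0] = loglik(obs[0], means[0])  # the path must start in state 0
    for t in range(1, n):
        for s in range(k):
            stay = score[t - 1][s]
            advance = score[t - 1][s - 1] if s > 0 else neg_inf
            if advance > stay:
                best, back[t][s] = advance, s - 1
            else:
                best, back[t][s] = stay, s
            if best > neg_inf:
                score[t][s] = best + loglik(obs[t], means[s])

    # Trace back from the final state at the last frame.
    path = [k - 1]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return [t for t in range(1, n) if path[t] != path[t - 1]]


# Made-up observations for three speech elements with assumed means 0, 2 and 5.
obs = [0.1, -0.2, 0.0, 1.9, 2.1, 2.0, 5.2, 4.9]
means = [0.0, 2.0, 5.0]
print(forced_align(obs, means))  # prints [3, 6]
```

In this sketch nothing constrains how long each state lasts, so a frame whose value lies between two state means can be assigned to either neighboring state with nearly equal score. This is one way to picture why loose time constraints reduce accuracy around boundaries between speech elements.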