The present invention relates to a speech recognition apparatus and a recording medium having a speech recognition program recorded therein. More particularly, this invention is concerned with a speech recognition apparatus for recognizing voice data, and a recording medium in which a speech recognition program causing a computer to recognize voice data is recorded.
In recent years, research and development of speech recognition technology has been undertaken in earnest. A technological means capable of recognizing voice in real time has been proposed. This kind of technology has been adapted to various kinds of products or usages, for example, reservation of tickets by telephone or voice commanding within car navigation.
With recent breakthroughs in speech recognition technology and improvements in the performance of personal computers, a technology has been developed in which application software running on a personal computer recognizes voice input through a microphone connected to the computer, converts the voice into a document, and displays the document.
An example of a software package enabling speech recognition is a product “Voice Type 3.0 for Windows 95” released recently by IBM Ltd. This product converts voice input through a microphone into text data in real time and enjoys a considerably high recognition ratio.
However, this application software accepts only real-time input through a microphone as the means for inputting voice data. An already existing voice file cannot be recognized directly.
One object of the development of the aforesaid speech recognition technology is to realize a so-called speech word processor or dictation system that automatically creates a document from dictated voice data and displays the document on a screen or the like.
Conventionally, the contents of a document to be created are dictated and temporarily recorded by a recording apparatus such as a tape recorder, and a secretary, typist, or the like then plays back the dictated contents and documents them using a documentation apparatus such as a typewriter or word processor. This style has been generally adopted as one form of effective utilization of a recording apparatus such as a tape recorder.
For such dictation recording, a technique of appending an index mark or end mark to voice data in order to give instructions to a secretary or typist is already known. In this prior art, a mark designates a single point in the voice data; a desired region of the voice data cannot be designated as an interval.
For the foregoing form of utilization, in which a recording apparatus is used for dictation, a technology for automatically converting the recorded contents into a document has long been in great demand.
In actual dictation, words irrelevant to the intended content may be included. For example, when written sentences are read aloud, the recording may contain, sometimes frequently, a misspoken word or a meaningless word such as “Ah” or “Well” (hereinafter, an unnecessary word).
In this case, the performance of speech recognition deteriorates, with the drawback that the document displayed on a screen contains many mistakes. To take such unnecessary words into account when constructing a dictation system, a technology has been proposed that creates language models for speech recognition covering all words, including the unnecessary words.
For example, according to Japanese Unexamined Patent Publication No. 7-5893, there is provided a speech recognition apparatus comprising: a standard pattern memory means for storing standard patterns; an unnecessary word pattern memory means for storing patterns of unnecessary words; a word spotting means for spotting as a word or word-spotting a standard pattern stored in the standard pattern memory means or a pattern of an unnecessary word stored in the unnecessary word pattern memory means on the basis of input voice, and outputting a corresponding interval and score; a producing means for hypothesizing the contents of uttered voice and producing a representation of the meaning; and an analyzing means for analyzing the result of word-spotting, which is performed by the word spotting means, on the basis of the representation of the meaning of the hypothesis produced by the producing means. The analyzing means allocates a score resulting from word-spotting performed on the pattern of an unnecessary word to remaining intervals, of which corresponding standard patterns or patterns of an unnecessary word have not been word-spotted, among all the intervals of data items constituting the voice. The result of word-spotting performed by the word spotting means is then analyzed.
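The allocation step described above can be pictured with a small sketch. It is only one possible reading of the publication's wording, not its actual method: the frame-based intervals, the additive log-style scores, and the names `remaining_intervals`, `hypothesis_score`, and `filler_score_per_frame` are all assumptions introduced here for illustration.

```python
def remaining_intervals(total_frames, spotted):
    """Return the [start, end) frame intervals not covered by any spotted word.

    `spotted` is a list of (start, end, score) tuples (hypothetical format).
    """
    gaps = []
    cursor = 0
    for start, end, _score in sorted(spotted):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total_frames:
        gaps.append((cursor, total_frames))
    return gaps


def hypothesis_score(total_frames, spotted, filler_score_per_frame):
    """Sum the spotted-word scores and charge every uncovered frame the
    unnecessary-word (filler) score, so the whole utterance is accounted for."""
    word_total = sum(score for _start, _end, score in spotted)
    gap_total = sum(
        (end - start) * filler_score_per_frame
        for start, end in remaining_intervals(total_frames, spotted)
    )
    return word_total + gap_total


# Two words spotted in a 100-frame utterance; the other 50 frames get the filler score.
spotted_words = [(10, 30, -5.0), (50, 80, -8.0)]
total = hypothesis_score(100, spotted_words, filler_score_per_frame=-0.5)
```

Under these assumptions, competing hypotheses about the utterance could be compared by their totals, since each total covers every frame of the input.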
However, the speech recognition apparatus described in Japanese Unexamined Patent Publication No. 7-5893 has difficulty in carrying out practical processing on an existing computer (especially a personal computer) because the data size of the language models becomes enormous.
With currently commercialized products, a speaker must take care not to utter unnecessary words or the like, and therefore cannot help finding such products awkward to use.
For improving the performance of speech recognition, it is required that the sound level of input voice is proper. Currently, it is hard to guarantee a high recognition ratio over a wide range of sound levels from a low level to a high level. A system is therefore designed to provide a maximum recognition ratio relative to an average sound level of voice.
In a speech recognition apparatus of the type in which voice is input through a microphone as mentioned above, a sound-level meter indicating the sound level of the voice is displayed on, for example, a screen so that the speaker can keep his or her own sound level appropriate.
As an example of an embodiment of this technology, Japanese Unexamined Patent Publication No. 5-231922 describes a sound pressure level display for a speech recognition apparatus. The display comprises a first sound receiver for receiving a voice signal, a second sound receiver for receiving a noise whose level is close to that of the voice signal received by the first sound receiver, a sound pressure level ratio calculating means for calculating the ratio of the sound pressure level of the voice signal input to the first sound receiver to the sound pressure level of the noise input to the second sound receiver, and a display means for displaying the ratio of sound pressure levels calculated by the sound pressure level ratio calculating means.
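As a rough illustration of what such a ratio calculation might look like, the sketch below compares the root-mean-square level of samples from the voice receiver with that of samples from the noise receiver. The publication is not quoted here as specifying RMS levels or a decibel scale; both, along with the function names `rms` and `level_ratio_db`, are assumptions made for this example.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty sequence of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_ratio_db(voice_samples, noise_samples):
    """Ratio of the voice level to the noise level, in decibels (assumed scale)."""
    voice_level = rms(voice_samples)
    noise_level = rms(noise_samples)
    if noise_level == 0:
        return math.inf   # no measurable noise
    if voice_level == 0:
        return -math.inf  # no measurable voice
    return 20 * math.log10(voice_level / noise_level)


# Voice amplitude ten times the noise amplitude gives a ratio of about 20 dB.
ratio = level_ratio_db([0.5, -0.5, 0.5, -0.5], [0.05, -0.05, 0.05, -0.05])
```

A display means would then only need to show this single value, for example as a bar on a meter.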
However, it is annoying for a speaker to manage his/her own voice so that the sound level will become proper. There is therefore an increasing demand for a user-friendly speech recognition apparatus. Moreover, since the sound level of input voice cannot be detected using already recorded voice data, the technology disclosed in the Japanese Unexamined Patent Publication No. 5-231922 cannot be adapted as it is. It cannot be judged whether or not the sound level of voice data is suitable for speech recognition. Besides, since the sound pressure level display is not provided with a facility for adjusting a sound level of voice autonomously, a voice recognition ratio may vary abruptly depending on a sound level indicated by recorded voice data.
A first object of the present invention is to provide a speech recognition apparatus for recognizing speech represented by voice data recorded in a given recording medium and a recording medium in which a speech recognition program is recorded.
A second object of the present invention is to provide a speech recognition apparatus capable of treating an unnecessary word or the like contained in voice without the need of especially fast processing, and a recording medium in which a speech recognition program is recorded.
A third object of the present invention is to provide a speech recognition apparatus capable of recognizing speech on a stable basis irrespective of a sound level indicated by recorded voice data, and a recording medium in which a speech recognition program is recorded.
Briefly, a speech recognition apparatus in accordance with the present invention for recognizing speech within a programmed computer comprises a voice data reading means for reading voice data from a voice data recording medium in which the voice data is recorded, a speech recognizing means for recognizing speech represented by the voice data so as to convert the voice data into text data, and a display means for displaying the text data.
A recording medium in accordance with the present invention having a speech recognition program recorded therein is used to run the speech recognition program in a computer, whereby the speech recognition program causes the computer to read voice data from a voice data recording medium in which the voice data is recorded, recognize speech represented by the voice data so as to convert the voice data into text data, and display the text data.
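The three steps named above (reading recorded voice data, recognizing it, and displaying the text) can be outlined as follows. This is a minimal sketch and not the implementation of the invention: it assumes the voice data is stored as a WAV file, and `read_voice_data`, `transcribe_file`, and the `recognizer` callable are hypothetical names, with the recognizer standing in for whatever speech recognition engine is actually used.

```python
import wave


def read_voice_data(path):
    """Read recorded voice data from a WAV file (assumed storage format)."""
    with wave.open(path, "rb") as wav:
        params = wav.getparams()
        frames = wav.readframes(wav.getnframes())
    return params, frames


def transcribe_file(path, recognizer, display=print):
    """Read voice data, convert it to text with `recognizer`, and display the text.

    `recognizer` is a hypothetical callable taking (raw_bytes, sample_rate,
    sample_width) and returning a string; no real engine is implied.
    """
    params, frames = read_voice_data(path)
    text = recognizer(frames, params.framerate, params.sampwidth)
    display(text)
    return text
```

Passing the recognizer and the display function in as parameters keeps the reading step independent of any particular engine or output device, which mirrors the separation into reading, recognizing, and display means.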
These as well as other objects and advantages of the present invention will become further apparent from the following detailed explanation.