The disclosures of the following priority applications are herein incorporated by reference:
Japanese Patent Application No. 11-255982 filed Sep. 9, 1999
Japanese Patent Application No. 11-255983 filed Sep. 9, 1999
Japanese Patent Application No. 11-255984 filed Sep. 9, 1999
Japanese Patent Application No. 2000-53257 filed Feb. 29, 2000
1. Field of the Invention
The present invention relates to a voice recognition apparatus and a voice recognition navigation apparatus.
2. Description of the Related Art
In the prior art, there are car navigation apparatuses (hereafter referred to as navigation apparatuses) that display the current position of the vehicle, display a map either over a wide area or in detail, and guide the driver in the direction of travel over the remaining distance to the destination. There are also voice recognition navigation apparatuses in the prior art that let the driver issue operating instructions by voice while driving, which improves driver safety (see Japanese Laid-Open Patent Publication No. 09-292255, for instance).
The voice recognition software program used in a voice recognition navigation apparatus normally judges that a speech has ended at the point in time at which no further speech is detected after the start of the speech. It then calculates the correlation values between the audio data obtained from the start of the speech up to that point and every recognition word in the recognition dictionary, and judges the recognition word achieving the largest correlation value to be the recognition result. Speech that must be recognized by a voice recognition navigation apparatus falls into various categories of words and phrases, such as navigation commands used to issue instructions for various types of navigation operations (bird's eye view display, enlarge, reduce, etc.), train station names, golf course names, hospital names and ski resort names.
Among these, golf course names, hospital names, ski resort names and the like tend to be longer than navigation commands and train station names and are therefore extremely difficult to recognize.
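The largest-correlation decision described above can be illustrated with the following minimal Python sketch. It assumes that the audio data and each recognition word have already been reduced to equal-length feature vectors, and it uses cosine similarity as a stand-in for the correlation value; the function names, the dictionary and the feature values are hypothetical and are not taken from this specification.

```python
import math


def correlation(a: list[float], b: list[float]) -> float:
    """Cosine similarity, standing in for the correlation value."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recognize(audio: list[float], dictionary: dict[str, list[float]]) -> str:
    """Score every recognition word and return the one with the largest value.

    There is no rejection step, so some recognition word is always returned,
    even for speech that matches nothing in the dictionary.
    """
    return max(dictionary, key=lambda word: correlation(audio, dictionary[word]))


# Hypothetical feature templates for three navigation commands.
DICTIONARY = {
    "enlarge": [1.0, 0.2, 0.0],
    "reduce": [0.1, 1.0, 0.3],
    "bird's eye view display": [0.0, 0.3, 1.0],
}

print(recognize([0.9, 0.3, 0.1], DICTIONARY))  # enlarge
```

Because `max` always returns some entry, this decision rule has no way of reporting that nothing in the dictionary was spoken.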
In addition, the voice recognition software program normally calculates the correlation values between the audio data representing the speech made by the user (driver) after a TALK switch or the like is pressed and the recognition words in the recognition dictionary, and then judges the recognition word achieving the largest correlation value to be the recognition result.
However, there is a problem in that the chance of erroneous recognition increases when the user starts his speech immediately after pressing the TALK switch.
Furthermore, the driver may become confused as to which instruction should be given to the navigation apparatus next and may utter an entirely wrong instruction. In such a case, too, the recognition word in the recognition dictionary achieving the largest correlation value is judged to be the instruction spoken by the driver, and the navigation operation corresponding to that recognition word is performed. For instance, consider a situation in which the driver, wishing to display a map, says “map” when the recognition dictionary contains only three recognition words: “audio,” “television” and “bird's eye view display.” If the correlation value between the audio data and “television” is the largest, the navigation apparatus displays the television screen. As a result, a navigation operation other than the one the driver intended is executed, which confuses the driver.
There is another problem in that an erroneous recognition may occur if the user pronounces a given word in a slightly different manner or if the user employs an alternative expression.
A first object of the present invention is to provide a voice recognition apparatus and a voice recognition navigation apparatus capable of recognizing long speeches with ease and a high degree of reliability.
A second object of the present invention is to provide a voice recognition apparatus and a voice recognition navigation apparatus capable of achieving a successful voice recognition in a reliable manner even when a speech starts immediately after the TALK switch is pressed or when the actual pronunciation is slightly different from the standard pronunciation.
A third object of the present invention is to provide a voice recognition apparatus and a voice recognition navigation apparatus with which it is possible to ensure that none of the recognition words in the recognition dictionary is recognized if a word which is not provided in the recognition dictionary is spoken.
A fourth object of the present invention is to provide a voice recognition apparatus and a voice recognition navigation apparatus capable of achieving a successful voice recognition with a high degree of reliability even when the user pronounces part of a word or phrase slightly differently from the standard pronunciation or chooses an alternative word or phrase, and to provide a recognition word generating method that may be adopted in the voice recognition apparatus and the voice recognition navigation apparatus.
Another object of the present invention is to provide a recording medium and a data signal in which data used in the apparatuses and a control program for controlling the apparatuses are provided.
In order to attain the above objects, a voice recognition apparatus according to the present invention comprises: a voice input device; a storage device that stores a recognition word indicating a pronunciation of a word to undergo voice recognition; and a voice recognition processing device that performs voice recognition processing by comparing audio data obtained through the voice input device with voice recognition data created in correspondence to the recognition word, and the storage device stores, as recognition words for the word to undergo voice recognition, both a first recognition word corresponding to a pronunciation of the entirety of the word to undergo voice recognition and a second recognition word corresponding to a pronunciation of only a starting portion, of a predetermined length, of the word to undergo voice recognition.
In this voice recognition apparatus, it is preferred that, when the pronunciation of the entirety of the word to undergo voice recognition exceeds a first predetermined length, the storage device stores, as a recognition word for the word to undergo voice recognition, the second recognition word corresponding to a pronunciation of only a starting portion, of a second predetermined length, of the word.
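The storage of a full-length and a shortened recognition word may be sketched as follows. This assumes, for illustration only, that pronunciation length is counted in romanized kana syllables and that the first and second predetermined lengths are 6 and 5 syllables; these values, the function name and the facility name "Sakura Golf Club" are hypothetical.

```python
def recognition_words(
    name: str,
    syllables: list[str],
    first_length: int = 6,   # hypothetical first predetermined length
    second_length: int = 5,  # hypothetical second predetermined length
) -> list[tuple[str, str]]:
    """Return the (pronunciation, word) entries stored for one word.

    The first recognition word covers the whole pronunciation. If the word is
    longer than `first_length` syllables, a second recognition word made of only
    its first `second_length` syllables is stored for the same word.
    """
    entries = [("".join(syllables), name)]
    if len(syllables) > first_length:
        entries.append(("".join(syllables[:second_length]), name))
    return entries


print(recognition_words("Sakura Golf Club", ["sa", "ku", "ra", "go", "ru", "fu", "ku", "ra", "bu"]))
# [('sakuragorufukurabu', 'Sakura Golf Club'), ('sakuragoru', 'Sakura Golf Club')]
print(recognition_words("enlarge", ["ka", "ku", "da", "i"]))
# [('kakudai', 'enlarge')]
```

With both entries stored, a speaker who says only the beginning of a long facility name can still be matched to the same word.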
A voice recognition navigation apparatus according to the present invention comprises: a voice input device; a storage device that stores a recognition word indicating a pronunciation of a word to undergo voice recognition; a voice recognition processing device that performs voice recognition processing by comparing audio data obtained through the voice input device with voice recognition data created in correspondence to the recognition word; a map information storage device that stores map information; and a control device that engages in control for providing route guidance based upon, at least, recognition results obtained by the voice recognition processing device and the map information, and the storage device stores, as recognition words for the word to undergo voice recognition, both a first recognition word corresponding to a pronunciation of the entirety of the word to undergo voice recognition and a second recognition word corresponding to a pronunciation of only a starting portion, of a predetermined length, of the word to undergo voice recognition.
Another voice recognition apparatus according to the present invention comprises: a voice input device; a storage device that stores a recognition word indicating a pronunciation of a word to undergo voice recognition; and a voice recognition processing device that performs voice recognition processing by comparing audio data obtained through the voice input device with voice recognition data created in correspondence to the recognition word, and the storage device stores a plurality of recognition words, each having a different pronunciation, for a single word to undergo voice recognition.
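Storing several recognition words for one word amounts to a many-to-one mapping from pronunciations to words, so whichever variant is recognized, the same word is reported. A minimal sketch, assuming romanized pronunciations; the function name and the variant spellings are hypothetical.

```python
def build_lookup(pronunciations: dict[str, list[str]]) -> dict[str, str]:
    """Flatten {word: [recognition word, ...]} into {recognition word: word}."""
    lookup: dict[str, str] = {}
    for word, variants in pronunciations.items():
        for variant in variants:
            # A pronunciation claimed by two different words would be ambiguous.
            if lookup.get(variant, word) != word:
                raise ValueError(f"recognition word {variant!r} is shared by two words")
            lookup[variant] = word
    return lookup


# Hypothetical entries; "Tokyo Station" has two recognition words.
LOOKUP = build_lookup({
    "Tokyo Station": ["toukyoueki", "tookyooeki"],
    "enlarge": ["kakudai"],
})

print(LOOKUP["tookyooeki"])  # Tokyo Station
```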
Another voice recognition apparatus according to the present invention comprises: a voice input device; a storage device that stores a recognition word indicating a pronunciation of a word to undergo voice recognition; and a voice recognition processing device that performs voice recognition processing by comparing audio data obtained through the voice input device with voice recognition data created in correspondence to the recognition word, and the storage device stores, as recognition words for the word to undergo voice recognition, both a first recognition word corresponding to the pronunciation of the entirety of the word to undergo voice recognition and a second recognition word created by replacing the leading syllable in that pronunciation with the vowel constituting the leading syllable.
In this voice recognition apparatus, it is preferred that: a generating device that generates the second recognition word based upon the first recognition word is further provided; the storage device includes a first storage device and a second storage device; the first recognition word is stored in the first storage device in advance; and the second recognition word is generated by the generating device and stored in the second storage device when the voice recognition processing device performs voice recognition processing.
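Generating the leading-vowel recognition words from the first storage into the second storage at recognition time might look like the following sketch. It assumes romanized kana syllables in which each syllable ends in its vowel (for example "ka" ends in "a"); the function names, storage names and sample entries are hypothetical.

```python
from typing import Optional

VOWELS = "aiueo"


def vowel_variant(syllables: list[str]) -> Optional[list[str]]:
    """Replace the leading syllable with its own vowel, e.g. "ka" -> "a".

    Returns None when the leading syllable is already a bare vowel or does not
    end in a vowel (such as the moraic nasal "n"), since nothing can be replaced.
    """
    if not syllables:
        return None
    head = syllables[0]
    if len(head) < 2 or head[-1] not in VOWELS:
        return None
    return [head[-1]] + syllables[1:]


def fill_second_storage(first_storage: dict[str, list[str]]) -> dict[str, str]:
    """Generate second recognition words from the pre-stored first recognition words."""
    second_storage: dict[str, str] = {}
    for word, syllables in first_storage.items():
        variant = vowel_variant(syllables)
        if variant is not None:
            second_storage["".join(variant)] = word
    return second_storage


# Hypothetical first storage: word -> syllables of its full pronunciation.
FIRST_STORAGE = {
    "enlarge": ["ka", "ku", "da", "i"],
    "Ueno Station": ["u", "e", "no", "e", "ki"],
}

print(fill_second_storage(FIRST_STORAGE))  # {'akudai': 'enlarge'}
```

One apparent benefit of generating the variants only when recognition is performed is that the pre-stored dictionary keeps its original size.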
Another voice recognition apparatus according to the present invention comprises: a voice input device; a storage device that stores a recognition word indicating a pronunciation of a word to undergo voice recognition; and a voice recognition processing device that performs voice recognition processing by comparing audio data obtained through the voice input device with voice recognition data created in correspondence to the recognition word, and the storage device stores, as recognition words for the word to undergo voice recognition, both a first recognition word corresponding to the pronunciation of the entirety of the word to undergo voice recognition and a second recognition word created by deleting a starting portion of a predetermined length from that pronunciation.
In this voice recognition apparatus, it is preferred that: a generating device that generates the second recognition word based upon the first recognition word is further provided; the storage device includes a first storage device and a second storage device; the first recognition word is stored in the first storage device in advance; and the second recognition word is generated by the generating device and stored in the second storage device when the voice recognition processing device performs voice recognition processing.
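Deleting a starting portion of predetermined length can be sketched in the same way. The predetermined length of one syllable, the function names and the sample entry are assumptions made for illustration.

```python
from typing import Optional


def head_deleted_variant(syllables: list[str], drop: int = 1) -> Optional[list[str]]:
    """Delete the first `drop` syllables; None if nothing would remain."""
    if drop <= 0 or len(syllables) <= drop:
        return None
    return syllables[drop:]


def second_recognition_words(first_storage: dict[str, list[str]], drop: int = 1) -> dict[str, str]:
    """Generate head-deleted recognition words for the second storage."""
    second: dict[str, str] = {}
    for word, syllables in first_storage.items():
        variant = head_deleted_variant(syllables, drop)
        if variant is not None:
            second["".join(variant)] = word
    return second


# Hypothetical entry: word -> syllables of its full pronunciation.
FIRST_STORAGE = {"Sakura Golf Club": ["sa", "ku", "ra", "go", "ru", "fu", "ku", "ra", "bu"]}

print(second_recognition_words(FIRST_STORAGE))  # {'kuragorufukurabu': 'Sakura Golf Club'}
```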
Another voice recognition navigation apparatus according to the present invention comprises: a voice input device; a storage device that stores a recognition word indicating a pronunciation of a word to undergo voice recognition; a voice recognition processing device that performs voice recognition processing by comparing audio data obtained through the voice input device with voice recognition data created in correspondence to the recognition word; a map information storage device that stores map information; and a control device that engages in control for providing route guidance based upon, at least, recognition results obtained by the voice recognition processing device and the map information, and the storage device stores a plurality of recognition words, each having a different pronunciation, for a single word to undergo voice recognition.
Another voice recognition apparatus according to the present invention comprises: a voice input device; a storage device that stores recognition words to be used in voice recognition processing; and a voice recognition processing device that performs voice recognition processing by comparing audio data obtained through the voice input device with voice recognition data generated based upon the recognition words, and: the storage device stores valid recognition words corresponding to pronunciations of words to undergo voice recognition and invalid recognition words, each indicating a pronunciation that is dissimilar to the pronunciations of the words to undergo voice recognition; and when the audio data obtained through the voice input device show the highest similarity to voice recognition data generated based upon one of the invalid recognition words, the voice recognition processing device decides that none of the words to undergo voice recognition has been recognized.
In this voice recognition apparatus, it is preferred that, when the words to undergo voice recognition are classified into a plurality of groups and the words in one group are subject to voice recognition while the words in another group are not, the invalid recognition words for the one group are created based upon the valid recognition words of the other group.
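The rejection rule using invalid recognition words, together with the group-based creation of invalid words, can be sketched as follows. Cosine similarity stands in for the similarity measure, and the groups, words and feature values are hypothetical.

```python
import math
from typing import Optional


def correlation(a: list[float], b: list[float]) -> float:
    """Cosine similarity, standing in for the similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recognize_with_rejection(
    audio: list[float],
    valid: dict[str, list[float]],
    invalid: dict[str, list[float]],
) -> Optional[str]:
    """Return the best valid word, or None when an invalid word scores highest."""
    candidates = [(correlation(audio, t), True, w) for w, t in valid.items()]
    candidates += [(correlation(audio, t), False, w) for w, t in invalid.items()]
    _, is_valid, word = max(candidates, key=lambda c: c[0])
    return word if is_valid else None


# Hypothetical groups of words to undergo voice recognition.
GROUPS = {
    "navigation commands": {"enlarge": [1.0, 0.0, 0.0], "reduce": [0.0, 1.0, 0.0]},
    "audio-visual": {"television": [0.0, 0.0, 1.0]},
}


def recognize_in_group(audio: list[float], active: str) -> Optional[str]:
    """Recognize within one group, using the other groups' valid words as invalid words."""
    valid = GROUPS[active]
    invalid = {
        word: template
        for group, words in GROUPS.items()
        if group != active
        for word, template in words.items()
    }
    return recognize_with_rejection(audio, valid, invalid)


print(recognize_in_group([0.9, 0.1, 0.0], "navigation commands"))  # enlarge
print(recognize_in_group([0.1, 0.1, 0.9], "navigation commands"))  # None
```

In the second call the audio best matches "television", a word borrowed from the other group as an invalid recognition word, so no navigation command is reported.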
Another voice recognition navigation apparatus according to the present invention comprises: a voice input device; a storage device that stores recognition words to be used in voice recognition processing; a voice recognition processing device that performs voice recognition processing by comparing audio data obtained through the voice input device with voice recognition data generated based upon the recognition words; a map information storage device that stores map information; and a control device that engages in control for providing route guidance based upon, at least, voice recognition results obtained by the voice recognition processing device and the map information, and: the storage device stores valid recognition words corresponding to pronunciations of words to undergo voice recognition and invalid recognition words, each indicating a pronunciation that is dissimilar to the pronunciations of the words to undergo voice recognition; and when the audio data obtained through the voice input device show the highest similarity to voice recognition data generated based upon one of the invalid recognition words, the voice recognition processing device decides that none of the words to undergo voice recognition has been recognized.
Another voice recognition apparatus according to the present invention comprises: a voice input device; a storage device that stores a recognition word indicating a pronunciation of a word to undergo voice recognition; and a voice recognition processing device that performs voice recognition processing by comparing audio data obtained through the voice input device with voice recognition data created in correspondence to the recognition word, and, when a given word to undergo voice recognition includes a predetermined specific word as a part of the given word, a first recognition word created by replacing a standard pronunciation of the specific word with an alternative pronunciation of the specific word different from the standard pronunciation is stored in the storage device.
In this voice recognition apparatus, it is preferred that: the specific word is a word that is a common part of a plurality of words to undergo voice recognition; and the alternative pronunciation to the standard pronunciation of the specific word indicates how the specific word is pronounced in everyday life.
Also, it is preferred that the storage device stores, for the given word to undergo voice recognition, both a standard recognition word containing the standard pronunciation of the specific word and the first recognition word. In this case, it is preferred that the alternative to the standard pronunciation of the specific word in the first recognition word consists of no sound at all, that is, the specific word is omitted. Alternatively, it is preferred that the alternative to the standard pronunciation of the specific word in the first recognition word is a pronunciation corresponding to an alternative term or an abbreviated term for the specific word.
Also, it is preferred that a generating device is further provided that, when the voice recognition processing device performs voice recognition processing on a word to undergo voice recognition containing the specific word as a part of that word, generates the first recognition word and stores it in the storage device.
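Generation of recognition words with an alternative pronunciation for a specific word might be sketched as below. The table of specific words and alternatives is hypothetical: the empty string models making no sound for the specific word, and "gorufu" models an abbreviated term for "gorufukurabu" (golf club). Plain substring replacement on romanized text is a simplification; a practical implementation would presumably match on syllable boundaries.

```python
# Hypothetical table: specific word -> alternative pronunciations.
# "" stands for making no sound at all for the specific word.
SPECIFIC_WORDS = {"gorufukurabu": ["", "gorufu"]}


def alternative_recognition_words(pronunciation: str) -> list[str]:
    """Return the standard recognition word plus variants with the specific word replaced."""
    words = [pronunciation]
    for specific, alternatives in SPECIFIC_WORDS.items():
        if specific not in pronunciation:
            continue
        for alt in alternatives:
            variant = pronunciation.replace(specific, alt)
            # Skip empty results (nothing left to say) and duplicates.
            if variant and variant not in words:
                words.append(variant)
    return words


print(alternative_recognition_words("sakuragorufukurabu"))
# ['sakuragorufukurabu', 'sakura', 'sakuragorufu']
```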
Another voice recognition apparatus according to the present invention comprises: a voice input device; a storage device that stores a recognition word indicating a pronunciation of a word to undergo voice recognition; and a voice recognition processing device that performs voice recognition processing by comparing audio data obtained through the voice input device with voice recognition data created in correspondence to the recognition word, and, if a predetermined specific word is not included in the word to undergo voice recognition, a recognition word created by adding a pronunciation of the specific word is stored in the storage device.
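The opposite case, adding the pronunciation of the specific word when it is missing, reduces to a few lines. This sketch assumes that the specific word is appended at the end of the pronunciation and that the original recognition word is kept as well; the function name and the default specific word are hypothetical.

```python
def with_specific_word_added(pronunciation: str, specific: str = "gorufukurabu") -> list[str]:
    """Return the recognition words stored for one word.

    Assumptions: the specific word (hypothetically "gorufukurabu", golf club) is
    appended at the end, and the original recognition word is kept alongside it.
    """
    if specific in pronunciation:
        return [pronunciation]
    return [pronunciation, pronunciation + specific]


print(with_specific_word_added("sakura"))  # ['sakura', 'sakuragorufukurabu']
print(with_specific_word_added("sakuragorufukurabu"))  # ['sakuragorufukurabu']
```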
Another voice recognition navigation apparatus according to the present invention comprises: a voice input device; a storage device that stores a recognition word indicating a pronunciation of a word to undergo voice recognition; a voice recognition processing device that performs voice recognition processing by comparing audio data obtained through the voice input device with voice recognition data created in correspondence to the recognition word; a map information storage device that stores map information; and a control device that engages in control for providing route guidance based upon, at least, recognition results obtained by the voice recognition processing device and the map information, and, when a given word to undergo voice recognition includes a predetermined specific word as a part of the given word, a first recognition word created by replacing a standard pronunciation of the specific word with an alternative pronunciation different from the standard pronunciation is stored in the storage device.
Another voice recognition navigation apparatus according to the present invention comprises: a voice input device; a storage device that stores a recognition word indicating a pronunciation of a word to undergo voice recognition; a voice recognition processing device that performs voice recognition processing by comparing audio data obtained through the voice input device with voice recognition data created in correspondence to the recognition word; a map information storage device that stores map information; and a control device that engages in control for providing route guidance based upon, at least, recognition results obtained by the voice recognition processing device and the map information, and, if a predetermined specific word is not included in the word to undergo voice recognition, a recognition word created by adding a pronunciation of the specific word is stored in the storage device.
A method of recognition word generation according to the present invention generates recognition words that indicate pronunciations of words to undergo voice recognition and that are used to generate voice recognition data to be compared against audio data obtained through a voice input device. The method comprises: a step in which, when a given word to undergo voice recognition contains a predetermined specific word as a part of the given word, a recognition word is created by replacing a standard pronunciation of the specific word with an alternative pronunciation different from the standard pronunciation.
A recording medium according to the present invention stores data representing recognition words corresponding to a word to undergo voice recognition that is used to generate voice recognition data to be compared against audio data obtained through a voice input device in voice recognition processing. The data comprises: a first recognition word corresponding to a pronunciation of an entirety of the word to undergo voice recognition; and a second recognition word corresponding to a pronunciation of only a starting portion of a predetermined length of the entirety of the word to undergo voice recognition, and both the first recognition word and the second recognition word are used as recognition words for the word to undergo voice recognition.
A data signal according to the present invention that is transmitted in a communication line comprises data representing recognition words corresponding to a word to undergo voice recognition that is used to generate voice recognition data to be compared against audio data obtained through a voice input device in voice recognition processing. The data comprises: a first recognition word corresponding to a pronunciation of an entirety of the word to undergo voice recognition; and a second recognition word corresponding to a pronunciation of only a starting portion of a predetermined length of the entirety of the word to undergo voice recognition, and both the first recognition word and the second recognition word are used as recognition words for the word to undergo voice recognition.
Another recording medium according to the present invention stores data representing recognition words corresponding to a word to undergo voice recognition that is used to generate voice recognition data to be compared against audio data obtained through a voice input device in voice recognition processing. The data comprises: a first recognition word corresponding to a pronunciation of an entirety of the word to undergo voice recognition; and a second recognition word created by replacing a leading syllable in the pronunciation of the entirety of the word to undergo voice recognition with a vowel constituting the leading syllable, and both the first recognition word and the second recognition word are used as recognition words for the word to undergo voice recognition.
Another data signal according to the present invention transmitted in a communication line comprises data representing recognition words corresponding to a word to undergo voice recognition that is used to generate voice recognition data to be compared against audio data obtained through a voice input device in voice recognition processing. The data comprises: a first recognition word corresponding to a pronunciation of an entirety of the word to undergo voice recognition; and a second recognition word created by replacing a leading syllable in the pronunciation of the entirety of the word to undergo voice recognition with a vowel constituting the leading syllable, and both the first recognition word and the second recognition word are used as recognition words for the word to undergo voice recognition.
Another recording medium according to the present invention stores data representing recognition words corresponding to a word to undergo voice recognition that is used to generate voice recognition data to be compared against audio data obtained through a voice input device in voice recognition processing. The data comprises: a first recognition word corresponding to a pronunciation of an entirety of the word to undergo voice recognition; and a second recognition word created by deleting a starting portion of a predetermined length of the pronunciation of the entirety of the word to undergo voice recognition, and both the first recognition word and the second recognition word are used as recognition words for the word to undergo voice recognition.
Another data signal according to the present invention transmitted in a communication line comprises data representing recognition words corresponding to a word to undergo voice recognition that is used to generate voice recognition data to be compared against audio data obtained through a voice input device in voice recognition processing. The data comprises: a first recognition word corresponding to a pronunciation of an entirety of the word to undergo voice recognition; and a second recognition word created by deleting a starting portion of a predetermined length of the pronunciation of the entirety of the word to undergo voice recognition, and both the first recognition word and the second recognition word are used as recognition words for the word to undergo voice recognition.
Another recording medium according to the present invention stores a voice recognition control program. The voice recognition control program comprises: an instruction in which audio data generated based upon an input voice are compared with voice recognition data generated based upon valid recognition words, which correspond to words to undergo voice recognition and indicate the pronunciations of those words, and upon invalid recognition words, each indicating a pronunciation dissimilar to the pronunciations of all the words to undergo voice recognition; and an instruction in which it is decided that none of the words to undergo voice recognition has been recognized if, as a result of the comparison, the audio data show the highest similarity to voice recognition data generated based upon one of the invalid recognition words.
Another data signal according to the present invention transmitted in a communication line comprises a voice recognition control program. The voice recognition control program comprises: an instruction in which audio data generated based upon an input voice are compared with voice recognition data generated based upon valid recognition words, which correspond to words to undergo voice recognition and indicate the pronunciations of those words, and upon invalid recognition words, each indicating a pronunciation dissimilar to the pronunciations of all the words to undergo voice recognition; and an instruction in which it is decided that none of the words to undergo voice recognition has been recognized if, as a result of the comparison, the audio data show the highest similarity to voice recognition data generated based upon one of the invalid recognition words.
Another recording medium according to the present invention stores a recognition word generating program for generating recognition words indicating pronunciations of words to undergo voice recognition used to generate voice recognition data to be compared against audio data obtained through a voice input device in voice recognition processing. The recognition word generating program comprises: an instruction in which, if a given word to undergo voice recognition includes a predetermined specific word as a part of the given word, a recognition word is generated by replacing a standard pronunciation of the specific word with an alternative pronunciation of the specific word different from the standard pronunciation.
Another data signal according to the present invention transmitted in a communication line comprises a recognition word generating program for generating recognition words indicating pronunciations of words to undergo voice recognition used to generate voice recognition data to be compared against audio data obtained through a voice input device in voice recognition processing. The recognition word generating program comprises: an instruction in which, if a given word to undergo voice recognition includes a predetermined specific word as a part of the given word, a recognition word is generated by replacing a standard pronunciation of the specific word with an alternative pronunciation of the specific word different from the standard pronunciation.
Another recording medium according to the present invention stores data representing recognition words indicating pronunciations of words to undergo voice recognition used to create voice recognition data to be compared against audio data obtained through a voice input device in voice recognition processing. The data comprises, when a given word to undergo voice recognition includes a predetermined specific word as a part of the given word, a recognition word created by replacing a standard pronunciation of the specific word with an alternative pronunciation of the specific word different from the standard pronunciation.
Another data signal according to the present invention transmitted in a communication line comprises data representing recognition words indicating pronunciations of words to undergo voice recognition used to create voice recognition data to be compared against audio data obtained through a voice input device in voice recognition processing. The data comprises, when a given word to undergo voice recognition includes a predetermined specific word as a part of the given word, a recognition word created by replacing a standard pronunciation of the specific word with an alternative pronunciation of the specific word different from the standard pronunciation.