1. Technical Field
The present invention relates to a voice recognition device and a voice recognition method for recognizing voice and carrying out a response or processing corresponding to the result of the recognition, in an electronic apparatus such as a vending machine, portable terminal, or navigation system. The invention also relates to a semiconductor integrated circuit device used in such a voice recognition device.
2. Related Art
Voice recognition is a technology in which an inputted voice signal is analyzed and a feature pattern obtained as a result of the analysis is collated with standard patterns (also referred to as “templates”) prepared in a voice recognition database based on pre-recorded voice signals, thus providing a result of recognition. However, if the range of collation is not limited, there are a vast number of combinations of feature patterns and standard patterns to be compared, which lowers the recognition rate.
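By way of illustration only, the collation described above can be sketched as a nearest-template search. The sketch assumes that a feature pattern is a short numeric vector and that similarity is measured by Euclidean distance; the names `distance`, `collate`, and `TEMPLATES`, and the toy values, are hypothetical and do not come from any cited document. The optional `candidates` argument shows how limiting the range of collation reduces the number of comparisons.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def collate(feature, templates, candidates=None):
    """Collate a feature pattern with standard patterns.

    Returns (best_label, comparisons). If `candidates` is given, only
    those labels are compared, i.e. the range of collation is limited.
    """
    labels = list(templates) if candidates is None else list(candidates)
    best_label, best_dist, comparisons = None, math.inf, 0
    for label in labels:
        d = distance(feature, templates[label])
        comparisons += 1
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, comparisons

# Hypothetical toy database: label -> standard pattern (template).
TEMPLATES = {
    "coffee": [1.0, 0.0, 0.2],
    "tea": [0.0, 1.0, 0.1],
    "cola": [0.9, 0.1, 0.8],
    "water": [0.1, 0.2, 0.9],
}
```

With an unrestricted search, every template in the database is compared; passing, for example, `candidates=["tea", "water"]` halves the number of comparisons in this toy database and removes the other labels as possible false matches.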
As a related-art technique, JP-A-2011-33902 (paragraphs 0006 to 0007) discloses a portable electronic apparatus aimed at efficiently updating a recognition dictionary. This portable electronic apparatus includes: a Japanese phonetic syllabary character storage unit storing Japanese phonetic syllabary character-corresponding data in which predetermined processing and editable Japanese phonetic syllabary characters correspond to each other; a recognition dictionary storage unit which stores a recognition dictionary including choices of Japanese phonetic syllabary characters to be collated with the result of voice recognition, in association with the Japanese phonetic syllabary character-corresponding data; an execution unit which executes predetermined processing corresponding to the Japanese phonetic syllabary characters collated with the result of voice recognition; an update data storage unit which stores update data indicating a difference in Japanese phonetic syllabary characters between the Japanese phonetic syllabary character-corresponding data and the recognition dictionary; and an update unit which, when the Japanese phonetic syllabary character-corresponding data is updated, stores update data indicating the content of the update in the update data storage unit and updates the recognition dictionary based on the update data at a predetermined timing. When the Japanese phonetic syllabary character-corresponding data is updated plural times before the recognition dictionary is updated, the update unit consolidates the difference needed to bring the recognition dictionary up to the last Japanese phonetic syllabary character updated for the predetermined processing into a single item of update data, and stores that update data.
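The consolidation of plural updates into a single item of update data can be pictured with the following minimal sketch. It is not the implementation disclosed in JP-A-2011-33902; it merely assumes that each item of predetermined processing is identified by a key and that only the most recent reading matters when the dictionary is finally updated. The class `RecognitionDictionary` and its methods are hypothetical names introduced for this illustration.

```python
class RecognitionDictionary:
    """Toy dictionary that defers updates and keeps one pending item per key."""

    def __init__(self, entries):
        self.entries = dict(entries)  # processing id -> reading used for collation
        self.pending = {}             # processing id -> latest not-yet-applied reading

    def record_update(self, processing_id, reading):
        # A later update for the same processing overwrites the earlier one,
        # so at most one item of update data is stored per processing.
        self.pending[processing_id] = reading

    def flush(self):
        """Apply the pending update data at the chosen timing.

        Returns the number of items of update data that were applied.
        """
        applied = len(self.pending)
        self.entries.update(self.pending)
        self.pending.clear()
        return applied
```

For example, if the reading for one item of processing is edited twice before `flush()` is called, only the second reading is stored and applied, and `flush()` reports a single applied item.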
Meanwhile, JP-A-2005-70377 (paragraphs 0013 to 0014) discloses a voice recognition device aimed at discriminating and recognizing an unexpected sound in the way humans do without increasing the volume of processing. In this voice recognition device, a time window with a predetermined length is set at a predetermined cycle with respect to the voice to be analyzed, and using this time window as a unit of processing, a feature amount including a frequency axis feature parameter related to the frequency of the voice and a power feature parameter related to the amplitude of the voice is extracted. Based on the extracted feature amount, the voice to be analyzed is recognized. In extracting the feature amount, the length of the time window for extracting the power feature parameter alone is made shorter than the length of the time window for extracting the frequency axis feature parameter alone.
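The use of two window lengths at a common cycle can be sketched as follows. This is an illustrative sketch only, not the method of JP-A-2005-70377: the frequency axis feature is replaced here by a stand-in (the index of the strongest bin of a naive discrete Fourier transform), the power feature is taken as the mean squared amplitude, and all function names and parameter values are assumptions made for this example.

```python
import cmath
import math

def frames(signal, window, hop):
    """Cut `signal` into frames of length `window`, one every `hop` samples."""
    return [signal[i:i + window] for i in range(0, len(signal) - window + 1, hop)]

def power(frame):
    """Power feature: mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

def peak_bin(frame):
    """Stand-in frequency axis feature: strongest DFT bin, excluding DC."""
    n = len(frame)
    mags = [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(1, n // 2)
    ]
    return 1 + mags.index(max(mags))

def extract(signal, hop, freq_window, power_window):
    """Extract both features at the same cycle (`hop`) but with different
    window lengths; the power window is required to be the shorter one."""
    if power_window >= freq_window:
        raise ValueError("power window must be shorter than frequency window")
    freq = [peak_bin(f) for f in frames(signal, freq_window, hop)]
    pwr = [power(f) for f in frames(signal, power_window, hop)]
    return freq, pwr
```

A shorter power window follows rapid changes in amplitude more closely, while the longer frequency window retains finer frequency resolution; in this sketch that trade-off is visible in the different frame counts returned for the two features.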
These related-art techniques have in common that they are aimed at efficient data processing in voice recognition. However, when a feature pattern obtained by analyzing an inputted voice signal is collated with standard patterns in the voice recognition database, there are still a vast number of combinations of patterns to be compared, and therefore the recognition rate in voice recognition cannot be expected to improve.