1. Technical Field
The present invention generally relates to voice recognition and, more particularly, to technology related to a voice recognition terminal including an acoustic model, a server for performing voice recognition, and a voice recognition method using the voice recognition terminal.
2. Description of the Related Art
Generally, voice recognition (or speech recognition) refers to technology for interpreting voice signals and matching the voice signals against patterned data stored in a database (DB), thus converting the voice signals into character strings or identifying linguistic semantic relationships. Voice recognition is performed in units of characters. Alternatively, when there are various relationships between the spelling and pronunciation of characters, voice recognition must be performed in units of at least words.
When a word or sentence on which voice recognition is to be performed is set in advance and only the set word or sentence is provided to a voice recognition device, voice recognition is performed relatively simply. However, performing voice recognition on normal sentences or conversation is technically more demanding due to the ambiguity and variety of natural language.
Voice recognition technology is configured such that a voice recognition device analyzes an input voice signal, extracts features from the voice signal, measures similarities between the input voice signal and previously collected voice models stored in a voice model DB, and converts the voice model most similar to the input voice signal into characters or instructions. Voice recognition technology is a kind of pattern recognition procedure, and tones, pronunciation, and accents differ from person to person. Thus, conventional voice recognition technology collects voice data from as many people as possible, extracts common characteristics from the voice data, and then generates reference patterns.
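The matching step described above (compare extracted features against stored voice models and select the most similar one) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-dimensional feature vectors, the contents of `MODEL_DB`, the function names, and the use of cosine similarity as the similarity measure. Practical recognizers use far richer features and statistical or neural acoustic models.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is treated as matching nothing
    return dot / (norm_a * norm_b)


def recognize(features, model_db):
    """Return (label, score) of the stored model most similar to `features`."""
    best_label, best_score = None, float("-inf")
    for label, reference in model_db.items():
        score = cosine_similarity(features, reference)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Hypothetical voice model DB: one reference pattern per word.
# The numbers are invented and carry no acoustic meaning.
MODEL_DB = {
    "yes": [0.9, 0.1, 0.3],
    "no": [0.2, 0.8, 0.5],
    "stop": [0.4, 0.4, 0.9],
}

# Hypothetical features extracted from an input voice signal.
label, score = recognize([0.85, 0.15, 0.25], MODEL_DB)
print(label)
```

In this toy setup the input vector lies closest to the reference pattern stored for "yes", so that label is selected and would then be converted into the corresponding characters or instruction.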
Voice recognition technologies may be classified, according to the speakers to be recognized, into a speaker-independent recognition method, a speaker-adaptive recognition method, and a speaker-dependent recognition method. First, the speaker-independent recognition method enables the speech of any speaker to be recognized, and is configured to extract information in advance about the voices of various speakers and arrange the extracted information in a DB, thus being usable without requiring a separate training procedure. Further, the speaker-adaptive recognition method adapts a speaker-independent recognition device to a user's voice so as to improve the recognition rate for that user's voice.
Furthermore, the speaker-dependent recognition method requires a procedure that allows a specific speaker or user to train a recognition device with his or her voice. The voice recognition device to which the speaker-dependent recognition method is applied can recognize only voices for which it has been trained. Since the speaker-dependent recognition method is implemented relatively simply, it has been installed in and applied to various types of terminals, but it is inconvenient in that the user must undergo a training procedure.
Recently, research into technology for incorporating personalized characteristics into conventional voice recognition methods has been conducted. When a voice recognition system is implemented in the form of a terminal, a personalization task such as adaptation to speakers may be performed in conformity with respective personalized terminals. However, there is a disadvantage in that it is difficult to implement a voice recognition device for accommodating a large-vocabulary language model. Meanwhile, an online voice recognition method involving communication with a voice recognition server can accommodate a large-vocabulary language model, but there is the burden of separately storing personalized information in a voice recognition server.
Thus, as voice recognition technology for reflecting personalized characteristics while reducing the burden of storage on a server, technology in which a voice recognizer for high-frequency vocabulary is installed in a terminal has been developed. Further, technology for allowing a terminal to perform phoneme recognition and transmit recognized phonemes to a server and for allowing the server to perform voice recognition has been proposed. This technology is advantageous in that speaker adaptation can be performed via the voice recognition method, but is disadvantageous in that two voice recognition systems including a language network must be provided in the terminal and the server, respectively, and there is a procedural restriction in that two-stage voice recognition must be performed by the terminal and the server.
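The two-stage arrangement described above, in which the terminal recognizes phonemes and the server then performs voice recognition on the received phonemes, can be sketched as follows. This is only an illustrative toy under stated assumptions: the phoneme templates, the lexicon, the two-dimensional frames, and both function names are hypothetical, and nearest-template matching and dictionary lookup stand in for the acoustic and language models a real system would use.

```python
# Hypothetical phoneme templates held on the terminal (invented values).
PHONEME_TEMPLATES = {
    "k": [1.0, 0.0],
    "ae": [0.0, 1.0],
    "t": [1.0, 1.0],
}

# Hypothetical lexicon held on the server: phoneme sequence -> word.
LEXICON = {
    ("k", "ae", "t"): "cat",
    ("t", "ae", "k"): "tack",
}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def terminal_recognize_phonemes(frames):
    """Stage 1 (terminal): label each feature frame with its nearest phoneme."""
    return [
        min(PHONEME_TEMPLATES,
            key=lambda p: squared_distance(frame, PHONEME_TEMPLATES[p]))
        for frame in frames
    ]


def server_recognize_word(phonemes):
    """Stage 2 (server): map the received phoneme sequence to a word, if known."""
    return LEXICON.get(tuple(phonemes))


# Hypothetical feature frames extracted on the terminal.
frames = [[0.9, 0.1], [0.1, 0.8], [0.9, 1.1]]
phonemes = terminal_recognize_phonemes(frames)  # sent to the server
word = server_recognize_word(phonemes)
print(phonemes, word)
```

The sketch makes the procedural restriction visible: the server cannot begin until the terminal has finished, and each side needs its own recognition resources (templates on the terminal, a lexicon on the server).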
Therefore, there is a need for technology in which the terminal and the server, both of which perform voice recognition, divide the roles between them, thus simplifying the voice recognition procedure while supporting personalized voice recognition.
In connection with this, Korean Patent Application Publication No. 10-2011-0133739 discloses a technology related to “System and method of Multi model adaptive and voice recognition.”