Speech recognition refers to technology in which a computer analyzes a user's voice input through a microphone, extracts features, recognizes as a command the result that most closely approximates previously input words or sentences, and performs an operation corresponding to the recognized command.
Conventional speech recognition systems have used two schemes separately, depending on the service purpose: an embedded speech recognition scheme, in which a speech recognition engine is embedded in a terminal such as a vehicle or a mobile device, and a cloud-based server speech recognition scheme, used for Internet voice search through a smartphone and for various kinds of information processing.
Hybrid speech recognition technology, which combines the high recognition rate of embedded speech recognition based on a recognition grammar with the sentence-unit recognition of server speech recognition, has been introduced to the market.
In hybrid speech recognition, an embedded speech recognition engine and a server speech recognition engine are driven simultaneously for a single user utterance, so that two or more result values may be received. An arbitration algorithm, which selects the better of these result values to execute a command, plays a core role in hybrid speech recognition.
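The arbitration step described above can be illustrated with a minimal sketch. The text does not specify how the better result is chosen, so the rule used here (prefer the embedded result when its confidence clears a threshold, otherwise take the higher-confidence candidate) is purely an assumption, as are the names `EngineResult`, `arbitrate`, and `embedded_threshold` and the idea that both engines report a confidence in the range 0 to 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EngineResult:
    engine: str        # hypothetical label: "embedded" or "server"
    text: str          # recognized word or sentence
    confidence: float  # assumed to be normalized to [0, 1]


def arbitrate(
    embedded: Optional[EngineResult],
    server: Optional[EngineResult],
    embedded_threshold: float = 0.8,  # illustrative value, not from the source
) -> Optional[EngineResult]:
    """Pick one result for a single utterance (assumed rule, see lead-in)."""
    candidates = [r for r in (embedded, server) if r is not None]
    if not candidates:
        return None  # neither engine produced a result
    # Assumed policy: trust a confident grammar-based embedded result first.
    if embedded is not None and embedded.confidence >= embedded_threshold:
        return embedded
    # Otherwise fall back to whichever candidate reports higher confidence.
    return max(candidates, key=lambda r: r.confidence)
```

A real arbitration algorithm would likely also weigh latency, network availability, and the recognition domain; this sketch only shows where such a decision sits in the flow.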
Embedded speech recognition usually produces a word, server speech recognition usually produces a sentence, and a language understanding module usually produces an intention and one or more object slots. Different types of results are thus derived depending on the situation, which makes it difficult for a conventional speech recognition evaluation system to evaluate hybrid speech recognition.
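To make the difficulty concrete, the three result types can be mapped onto one common record before they are compared with a reference answer. The following is a sketch only: `UnifiedResult`, the `from_*` helpers, and the `is_correct` comparison rule are hypothetical names and choices made for illustration, not a format defined by the text.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UnifiedResult:
    source: str                 # "embedded", "server", "nlu", or "reference"
    text: str = ""              # word or sentence, if any
    intent: str = ""            # intention, if any
    slots: Dict[str, str] = field(default_factory=dict)  # object slots


def _norm(s: str) -> str:
    # Lowercase and collapse whitespace so trivial differences do not count.
    return " ".join(s.lower().split())


def from_embedded(word: str) -> UnifiedResult:
    return UnifiedResult("embedded", text=_norm(word))


def from_server(sentence: str) -> UnifiedResult:
    return UnifiedResult("server", text=_norm(sentence))


def from_nlu(intent: str, slots: Dict[str, str]) -> UnifiedResult:
    return UnifiedResult(
        "nlu", intent=_norm(intent),
        slots={k: _norm(v) for k, v in slots.items()},
    )


def is_correct(expected: UnifiedResult, actual: UnifiedResult) -> bool:
    """Compare only the fields that the reference answer populates."""
    if expected.text and expected.text != actual.text:
        return False
    if expected.intent and expected.intent != actual.intent:
        return False
    return all(actual.slots.get(k) == v for k, v in expected.slots.items())
```

Because the comparison looks only at populated reference fields, one evaluation loop can score word, sentence, and intention results alike under these assumptions.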
Conventional automatic speech recognition tests have usually not considered an actual vehicle test environment. That is, conventional test technology has focused on a batch scheme, in which a speech recognition system is installed on a personal computer and results are collected by automatically inputting the recognition target vocabulary into the computer, and on a volume arbitration scheme, in which the ratio of noise to voice is automatically adjusted when a test environment is provided.
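The volume arbitration scheme mentioned above can be sketched as mixing a noise recording into a voice recording at a chosen signal-to-noise ratio (SNR). The text gives no formula, so the standard power-ratio definition of SNR in decibels is assumed here, and `mean_power` and `mix_at_snr` are hypothetical helper names operating on plain lists of samples.

```python
import math
from typing import List, Sequence


def mean_power(samples: Sequence[float]) -> float:
    # Average of squared sample values.
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(
    voice: Sequence[float], noise: Sequence[float], snr_db: float
) -> List[float]:
    """Scale the noise so that voice power / noise power matches snr_db."""
    if len(voice) != len(noise):
        raise ValueError("voice and noise must have the same length")
    p_voice = mean_power(voice)
    p_noise = mean_power(noise)
    if p_voice == 0 or p_noise == 0:
        raise ValueError("voice and noise must both be non-silent")
    # SNR(dB) = 10 * log10(P_voice / P_noise), so solve for the noise power.
    target_noise_power = p_voice / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / p_noise)
    return [v + gain * n for v, n in zip(voice, noise)]
```

A batch test could call such a function over a range of SNR values to replay the same vocabulary under progressively noisier conditions.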
However, recent speech recognition requires integrated performance verification of a hybrid scheme in which embedded speech recognition and cloud-based server speech recognition, whose recognition results differ in specification, are driven simultaneously. An algorithm capable of integrating and analyzing results of different specifications, and a method of operating that algorithm, are therefore needed.
In particular, conventional automatic evaluation systems for speech recognition have developed into automatic voice database (DB) output devices for measuring a speech recognition rate, or into arbitration devices for adjusting a noise environment.
For example, in the case of a vehicle speech recognition system, native speakers of multiple languages ride in a vehicle traveling at high speed for an actual vehicle test and are directed to utter predetermined commands. A checker who also rides in the vehicle then manually checks the recognition results.
However, such a vehicle test scheme encounters various problems: recruiting a few hundred native speakers, guiding them to the test site, managing them, ensuring safety during high-speed travel, reduced efficiency due to manual recording of the recognition results, the considerable time needed to refine and analyze enormous volumes of results, and the impossibility of repeating a test. Accordingly, it is difficult to perform enough tests to calculate statistically meaningful results, and technology for solving this problem is needed.