A voice recognition sensor is a sensor that extracts and recognizes linguistic information from the acoustic information contained in a human voice and responds to it. Now that easy and convenient natural user interfaces (UI) are available, voice conversation is regarded as the most natural and convenient medium of information exchange between humans and machines in the coming IoT era. However, to hold a voice conversation with a machine, the human voice must be converted into a format that the machine can process, and this process is voice recognition.
Voice recognition, as represented by Apple's Siri, is implemented as a combination of a microphone, an ADC (analog-to-digital converter) and a DSP (digital signal processor). However, since voice recognition consumes considerable power if it is always kept in a standby state, the user operates this function by pressing a start button and an end button. This is one of the difficult problems in implementing a true voice recognition-based IoT (Internet of Things). Accordingly, if an ultra-low power, always-on voice recognition system is developed, it is expected to open up a virtually unlimited range of IoT applications.
A voice recognition system that can be used easily without any separate learning or training is a promising technology for leading future industries in the IoT era, where demand for the development and construction of UIs for innovative next-generation IT products is increasing. A voice recognition system allows a user to input data even when the user does not have a free hand or is moving, and information can be processed rapidly or in real time because data can be input faster than by typing.
Recently, owing to the improved performance of smartphone terminals, the development of artificial intelligence and knowledge search techniques, and bulk data processing by cloud-based voice recognition systems, an answer desired by a user can be found accurately and rapidly using an intelligent agent. In spite of such advantages and possibilities, however, voice recognition technology still has the following limits.
First, in terms of hardware, the existing voice recognition technique using a combination of a microphone, an ADC and a DSP consumes a large amount of power, so voice recognition cannot in practice be kept in a standby state continuously without a separate charger. This makes it very restrictive to apply to a mobile voice recognition sensor. In addition, a preliminary operation such as pressing a voice recognition start button is required, and accuracy, reliability and speed deteriorate. In other words, in order to apply voice recognition to IoT-based smartphones, TVs, vehicles and wearable devices, high sensitivity is essential, and the standby state should be maintained continuously even in a sleep state without large power consumption, so that the user's voice can be recognized with ultra-low power.
Next, from acoustic and linguistic viewpoints, the existing voice recognition technique using a combination of a microphone, an ADC and a DSP is based on a complicated algorithm and is therefore limited in recognizing natural conversational tones.
In contrast, the human cochlea efficiently processes complicated language signals through a simple algorithm after separating them by frequency. Although this cochlear principle has been applied to various devices, it has not yet been utilized in an ultra-low power voice recognition sensor for IoT, apart from being reproduced as an artificial cochlea.
A flexible piezoelectric thin film was applied to an artificial cochlea, as disclosed by H. Lee et al. in Advanced Functional Materials, Vol. 24, No. 44, p. 6914, 2014. In this paper, three individual piezoelectric elements are attached to a thin trapezoidal silicon membrane to separate voice signals in the audible frequency band according to frequency. However, the paper does not consider an algorithm or a circuit design for an ultra-low power voice recognition sensor for IoT.
In addition, Korean Unexamined Patent Publication No. 10-2012-0099036 (Sep. 6, 2012) proposes a piezoelectric device capable of outputting a haptic feedback effect using a plurality of resonant frequencies. However, although this document provides a haptic feedback technique based on tactile sense, force, kinesthetic sense or the like, it does not disclose a method for recognizing voice signals after an input voice is separated into a plurality of frequencies.
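The idea of separating a voice signal into a plurality of frequency channels, as the cochlea and the three-element artificial cochlea described above do mechanically, may be illustrated with a minimal digital sketch in Python. This is not the method of the cited paper or patent publication: it substitutes the Goertzel recurrence for the piezoelectric resonators, and the function names, the sample rate and the three center frequencies are hypothetical values chosen only for illustration.

```python
import math

def goertzel_power(samples, sample_rate, target_hz):
    """Estimate signal power near target_hz with the Goertzel recurrence.

    This stands in for one resonant channel; it is a digital analogy,
    not a model of a piezoelectric element.
    """
    n = len(samples)
    k = round(n * target_hz / sample_rate)  # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def channel_energies(samples, sample_rate, centers_hz):
    """Return a dict mapping each channel center frequency to its power."""
    return {f: goertzel_power(samples, sample_rate, f) for f in centers_hz}

def dominant_channel(samples, sample_rate, centers_hz):
    """Return the center frequency of the channel with the most power."""
    energies = channel_energies(samples, sample_rate, centers_hz)
    return max(energies, key=energies.get)

# Hypothetical example values (assumptions, not taken from the source).
SAMPLE_RATE = 8000             # samples per second
CENTERS_HZ = (500, 1000, 2000)  # three channels, echoing the three elements

# A synthetic 1000 Hz test tone, 0.1 s long.
tone = [math.sin(2.0 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(800)]
print(dominant_channel(tone, SAMPLE_RATE, CENTERS_HZ))
```

Each channel only needs a few multiply-add operations per sample, which is one reason a per-band decomposition of this kind is often considered for low-power front ends; whether it suits a given sensor depends on the hardware and is not addressed by this sketch.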
(Paper) H. Lee et al., Advanced Functional Materials, 24(44), 6914, 2014
(Patent Literature 1) KR10-2012-0099036 A