1. Field of the Invention
The present invention relates generally to a voice recognition method and apparatus using video recognition in a terminal, and more particularly, to a method and apparatus for detecting a speech start time and a speech end time through video recognition by a camera, without a separate user gesture, thereby increasing the accuracy of voice recognition.
2. Description of the Related Art
Voice recognition technology, when substituted for physical input, aims to enable a user to conveniently use electronic devices without physical movement. For example, voice recognition technology may be implemented in various electronic devices such as a smartphone, a television, and a vehicle navigation device.
FIG. 1 illustrates a display screen of a terminal for recognizing voice according to the related art. The voice recognition technology in FIG. 1 requires a user to operate a specific program in order to start recording, speak, end recording, and compute the result. The related art shown in FIG. 1 is implemented with pre-defined keywords or a structure for general free-form voice recognition, rather than as a technology that processes a command according to the current state of a device.
Voice recognition technology statistically analyzes and classifies input voice. For accurate voice recognition, it is important to minimize noise and silent sections in the recorded data. However, given the various situations in which a user records voice, noise other than the speaker's voice is likely to be included in the recorded voice data, and it is difficult to accurately determine when the user is actually speaking.
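As an illustration of what minimizing silent sections of recorded data can involve, the following is a minimal sketch of energy-based silence trimming. It is not the method of the related art or of the present invention; the function names, the frame length of 160 samples, and the energy threshold are hypothetical values chosen only for the example. Such a simple threshold cannot distinguish a speaker's voice from other loud noise, which reflects the difficulty described above.

```python
from typing import List, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Return the mean squared amplitude of each non-overlapping frame."""
    energies = []
    # Only complete frames are considered; a trailing partial frame is ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
    return energies


def trim_silence(samples: List[float],
                 frame_len: int = 160,       # assumed frame length (hypothetical)
                 threshold: float = 1e-3     # assumed energy threshold (hypothetical)
                 ) -> Tuple[int, int]:
    """Return (start, end) sample indices spanning the first through the last
    frame whose energy exceeds the threshold, or (0, 0) if no frame does."""
    energies = frame_energies(samples, frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return (0, 0)
    return (active[0] * frame_len, (active[-1] + 1) * frame_len)
```

For example, for a recording made of 320 silent samples, 480 samples at amplitude 0.5, and 160 silent samples, `trim_silence` returns `(320, 800)`, so that only the middle section would be passed on for recognition.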
In addition, the user must perform a separate operation to start voice recording. Consider, for example, a case in which the user is driving a vehicle or carrying a load. Voice recognition is a very valuable function in such instances, since a terminal function may be executed by using only the user's voice, without separate key or gesture input. Accordingly, there is a need in the art for a voice recognition technology that does not require a separate user gesture input, beginning from the start of voice recording.