Voice recognition technology converts voices uttered by humans into characters, codes, or the like so that terminals can recognize the voices. Voice recognition technology enables characters to be input faster than they can be input by typing. Hence, studies for increasing the accuracy of voice recognition technology have been actively conducted.
Various technologies are required to enable a machine to understand a natural language and carry on a natural dialog. First, speech to text (STT), which converts a human voice into text, is performed so that the machine and the human can communicate with each other using sound. Once the voice of a user is converted into text through STT, the input text is analyzed in various ways to determine what the voice of the user means or what intention it carries. Then, if the analysis shows that the user has asked a question about a certain object, an answer desired by the user is retrieved using search and semantic technologies. After that, a language generation process creates the answer to the question of the user in sentence form, and the answer is delivered to the user as a voice through text to speech (TTS), which is the reverse of STT.
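The sequence of stages described above (STT, analysis of meaning and intention, answer search, language generation, TTS) can be illustrated with a minimal sketch. Every name in it (`speech_to_text`, `analyze_intent`, `search_answer`, `generate_sentence`, `text_to_speech`, `handle_utterance`, the `KNOWLEDGE` table) is hypothetical and chosen only for illustration. Each stage is a trivial stand-in, not an actual recognition or synthesis engine: audio is assumed to be UTF-8 text bytes, and intent analysis is assumed to be a simple prefix match.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Intent:
    kind: str     # "question" or "unknown"
    subject: str  # the object the user is asking about


# Hypothetical knowledge source standing in for search and semantic technologies.
KNOWLEDGE = {"stt": "speech to text", "tts": "text to speech"}


def speech_to_text(audio: bytes) -> str:
    # Stand-in for STT: the "audio" is assumed to already be UTF-8 text bytes.
    return audio.decode("utf-8").strip().lower()


def analyze_intent(text: str) -> Intent:
    # Stand-in for intent analysis: only "what is X?" counts as a question.
    prefix = "what is "
    if text.startswith(prefix):
        return Intent("question", text[len(prefix):].rstrip("?").strip())
    return Intent("unknown", text)


def search_answer(intent: Intent) -> Optional[str]:
    # Look up an answer only when the user asked a question about an object.
    if intent.kind != "question":
        return None
    return KNOWLEDGE.get(intent.subject)


def generate_sentence(intent: Intent, answer: Optional[str]) -> str:
    # Language generation: put the retrieved answer into sentence form.
    if answer is None:
        return "Sorry, I do not know."
    return f"{intent.subject.upper()} stands for {answer}."


def text_to_speech(sentence: str) -> bytes:
    # Stand-in for TTS: return the sentence as bytes instead of a waveform.
    return sentence.encode("utf-8")


def handle_utterance(audio: bytes) -> bytes:
    # Run the stages in the order described: STT, analysis, search,
    # language generation, TTS.
    text = speech_to_text(audio)
    intent = analyze_intent(text)
    answer = search_answer(intent)
    sentence = generate_sentence(intent, answer)
    return text_to_speech(sentence)
```

Under these assumptions, calling `handle_utterance(b"What is STT?")` passes the input through all five stages in order and returns the generated sentence as bytes. An input that is not recognized as a question skips the lookup and yields the fallback sentence.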
However, natural language recognition processing is typically performed only in a server. Hence, in order to execute a voice command for controlling a TV, the TV is always required to operate in association with the server. There is a problem in that this association between the TV and the server increases the time required to execute the voice command.