Speech recognition is processing in which a computer analyzes the language a person speaks and extracts the spoken content as character data. In speech recognition that converts speech into Japanese text, for example, when “Hello” is pronounced, the spoken content is converted into the character string corresponding to “Hello”.
Incidentally, in face-to-face conversation, when “Hello” is pronounced, the speaker's emotion is conveyed to the other party by the speaker's facial expression or tone of voice. In speech recognition, however, the output is a mere character string and cannot convey the emotion. Accordingly, to convey the emotion to a person who reads the character string, additional words such as “I'm fine” must be spoken after it, which complicates the spoken content and makes speech recognition errors more likely.
One method of conveying emotion without complicating the spoken content is “embellishment”. A typical embellishment is an emoticon. For example, when a character string that resembles a smiling face ((^-^); also referred to as a smiley) is appended to the character string “Hello”, the speaker's emotion (in this case, joy corresponding to vigor) can be conveyed to the reader.
When embellishment is applied to speech recognition, it is conceivable, for example, that the user pronounces “smiling face”, the pronounced voice is recognized, and the corresponding embellishment (in this case, the smiley) is added.
However, this method requires that voice data for collation be registered in advance so that each embellishment can be recognized. One drawback is that the amount of voice data for collation grows with the number of embellishment types, requiring more memory space. A further drawback is poor usability, because the user must remember the vocalization corresponding to each item of voice data for collation.
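The spoken-keyword approach described above, and the reason its memory use grows with the number of embellishments, can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from any cited document: the `CollationEntry` type, the `REGISTRY` table, the assumed size of 16,000 bytes per registered vocalization, and the "crying face" entry. The recognizer itself is replaced by an exact match on an already-recognized keyword, so the sketch shows only the lookup and the storage cost.

```python
from dataclasses import dataclass

# Assumed size of the voice data for collation stored per vocalization.
TEMPLATE_BYTES = 16_000


@dataclass(frozen=True)
class CollationEntry:
    keyword: str        # vocalization the user has to remember
    embellishment: str  # character string appended to the recognized text
    template: bytes     # stand-in for the registered voice data


def register(keyword: str, embellishment: str) -> CollationEntry:
    # A real system would store acoustic data; zero bytes stand in for it.
    return CollationEntry(keyword, embellishment, bytes(TEMPLATE_BYTES))


# One entry must be registered in advance for every embellishment.
REGISTRY = {
    entry.keyword: entry
    for entry in (
        register("smiling face", "(^-^)"),
        register("crying face", "(T_T)"),  # hypothetical second entry
    )
}


def embellish(text: str, spoken_keyword: str) -> str:
    """Append the embellishment registered for the spoken keyword, if any."""
    entry = REGISTRY.get(spoken_keyword)
    if entry is None:
        return text  # unregistered vocalization: nothing is added
    return f"{text} {entry.embellishment}"


def collation_bytes(registry: dict) -> int:
    """Total memory held by voice data for collation."""
    return sum(len(entry.template) for entry in registry.values())
```

In this sketch, `embellish("Hello", "smiling face")` yields the text followed by the smiley, while an unregistered vocalization leaves the text unchanged. The value returned by `collation_bytes` grows linearly with the number of registered embellishments, which corresponds to the memory drawback noted above.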
Accordingly, there has been a demand for an embellishment input technology that does not strain memory space and offers high usability.
Against this background, Patent Document 1 below, for example, discloses a technology in which, when voice is recognized and converted into a character string, the emotion carried by the voice is estimated and an embellishment representing that emotion, such as a pictograph, is added to the character string. Patent Document 2 discloses a technology in which the eagerness or emotion of a person inputting characters is estimated from keystroke speed, keystroke intensity, and keystroke frequency during input, and modification information, such as an emoticon, corresponding to the estimation result is added to the character string. Patent Document 3 discloses a technology in which an e-mail transmission apparatus detects its own vibration and transmits an e-mail to which vibration information has been added, and an e-mail reception apparatus generates vibration of an intensity corresponding to the vibration information when it reproduces the e-mail. Patent Document 4 discloses a technology in which displacement patterns of a cellular phone apparatus (for example, pushing the apparatus down forward, drawing a circle with it, or shaking it laterally) are detected, and e-mail auxiliary input information (a short sentence, a sample sentence, and the like) corresponding to the detected displacement pattern is listed and displayed.