1. Field of the Invention
The present invention generally relates to the field of automated emotional recognition, and more particularly to a distributed emotional recognition system for telecommunication networks, adapted to being implemented on telecommunication terminals, including those having relatively low processing resources.
2. Description of the Related Art
Emotional recognition is a particular application in the field of artificial intelligence. The purpose of emotional recognition is to improve the naturalness of the communication between man and machine and to enhance the efficiency of the means of communication. Several research efforts have addressed this theme, and the possible applications of emotional recognition are still being investigated.
An emotional recognition system is capable of identifying the emotional state of a user through the analysis of his/her speech. Normally, substantial data processing resources are needed to perform emotional recognition.
Emotional recognition systems for use in telecommunication networks are known in the art. They have a distributed architecture based on the client-server paradigm, in which a user telecommunication terminal (e.g., a cellular phone, a smartphone or a Personal Digital Assistant) acts as a client which, for the purpose of performing the emotional recognition, exploits the services provided by a server apparatus in communication therewith. The adoption of a distributed architecture allows the user telecommunication terminal, which has limited data processing resources, to exploit the data processing resources of the server: the most onerous operations are performed by the server, so that sophisticated services can be provided, while the information can still be managed locally at the client.
In particular, distributed architectures for automated dialogue recognition are also known, including at least one device having a relatively low data processing capability (the client), which captures the dialogue, and, coupled thereto, a remote data processing device (the server). The adoption of such a client-server distributed architecture makes it possible to exploit the hardware and software resources of the remote server, which are typically not available at the client, for analyzing the speech captured by the client. More particularly, the client captures the speech, generates a corresponding speech signal along with additional features extracted from it, and processes the speech locally, before the coding prescribed by the specific standard of the transmission channel is applied, so as to extract the attributes best suited for use by the server in the automated dialogue recognition. The client can send these data to the server using a coding that differs from that used for coding the speech signal, so as to balance the automated recognition performance against the use of the transmission resources. If said local processing were not performed, and the analysis were instead performed by the server directly on the coded speech signal, the automated dialogue recognition process would be negatively affected by the transmission channel coding, which is aimed at minimizing the use of transmission resources.
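By way of illustration only, the client-side local processing described above can be sketched as follows. The sketch assumes an 8 kHz sampling rate, 20 ms frames, and two simple per-frame features (log-energy and zero-crossing rate); these choices, the function names and the float32 payload format are hypothetical and are not taken from any of the known systems referred to herein. The point shown is that the features are computed on the uncoded samples and serialized separately from the channel-coded speech.

```python
import math
import struct

FRAME = 160  # samples per frame: 20 ms at an assumed 8 kHz sampling rate


def frame_features(samples):
    """Return a (log-energy, zero-crossing rate) pair for each full frame."""
    feats = []
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        frame = samples[start:start + FRAME]
        # Mean power of the frame; the small offset avoids log10(0) on silence.
        energy = sum(s * s for s in frame) / FRAME
        log_energy = math.log10(energy + 1e-12)
        # Count sign changes between consecutive samples.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        feats.append((log_energy, crossings / (FRAME - 1)))
    return feats


def pack_features(feats):
    """Serialize the features as little-endian float32 pairs (hypothetical format)."""
    return b"".join(struct.pack("<ff", e, z) for e, z in feats)


def unpack_features(payload):
    """Server-side inverse of pack_features."""
    return [struct.unpack_from("<ff", payload, i) for i in range(0, len(payload), 8)]


# One second of a 200 Hz tone standing in for captured, uncoded speech.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]
feats = frame_features(tone)
payload = pack_features(feats)
```

In this sketch one second of audio yields 50 frames and a feature payload of 400 bytes, which the client could send alongside, or instead of, the channel-coded speech signal.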
An emotion detection device and method for use in distributed systems are described, for example, in the published US patent application 2006/0122834. A prosody analyzer is distributed over a client/server architecture, so that the scope of emotion recognition processing tasks can be allocated dynamically based on processing resources, channel conditions, client loads, etc. The partially processed prosodic data can be sent separately or combined with other speech data from the client device and streamed to a server for a real-time response. Training of the prosody analyzer with real-world expected responses improves emotion modeling and the real-time identification of potential features such as emphasis, intent, attitude and semantic meaning in the speaker's utterances.
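Purely as an illustration of what a dynamic allocation of processing tasks between client and server might look like, a minimal sketch is given below. It is not the method of the cited application: the pipeline stages, their relative CPU costs, their output data rates and the selection rule are all hypothetical assumptions made for the example. The rule runs on the client the fewest stages whose output fits the available channel rate, without exceeding the client's processing budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    cpu_cost: float     # hypothetical relative cost of running the stage on the client
    output_kbps: float  # hypothetical data rate of the stage's output


# Hypothetical pipeline, ordered from raw capture to the final decision.
PIPELINE = [
    Stage("capture", 0.0, 128.0),
    Stage("prosodic_features", 2.0, 8.0),
    Stage("feature_statistics", 3.0, 1.0),
    Stage("classification", 10.0, 0.1),
]


def split_point(cpu_budget, channel_kbps):
    """Return how many leading stages the client should run.

    The client runs the fewest stages whose output fits the channel.
    If no affordable stage fits, the deepest affordable stage is used.
    """
    spent = 0.0
    affordable = 0
    for count, stage in enumerate(PIPELINE, start=1):
        spent += stage.cpu_cost
        if spent > cpu_budget:
            break  # the client cannot afford this stage or any later one
        affordable = count
        if stage.output_kbps <= channel_kbps:
            return count
    return affordable
```

For instance, with a processing budget of 5.0 and a 10 kbit/s channel, `split_point(5.0, 10.0)` returns 2, meaning that the client would extract the prosodic features and leave the remaining stages to the server.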