The present invention relates to a system and method for automatic voice message processing, and in particular to a system and method for processing voice messages so as to convert voice mail to e-mail and prioritise the e-mail according to acoustic features of the voice mail.
It is known to categorise an e-mail according to keywords recognised from the text of the e-mail. The e-mail may then be displayed in specific categories within an e-mail inbox. In this way a receiver can see at first glance which e-mails are urgent, important, or confidential, and prioritise subsequent reading and actions accordingly.
It is also known for a telephony system to perform voice recognition translation on the voice signal and analyse the translated text for the purpose of categorising the voice message.
European Patent publication number 0935378 discloses a programmable automatic call and data transfer processing system which indexes or prioritises incoming telephone calls, facsimiles and e-mails based on the identity of the caller or author, the subject matter of the message or request, and/or the time of day. Such a system is embodied in IBM Mail Analyzer, which is intended to interface with an e-mail system (such as Lotus Notes) that processes text-based documents, and which provides text categorisation technology. IBM Mail Analyzer is part of a suite of software focusing on customer relationship management which also includes the IBM DirectTalk interactive voice response system, the IBM DirectTalkMail voice messaging system, and the IBM CallPath telephone call centre system.
Performing speech to text conversion on a voice message and then categorising the text has its problems. A keyword for the categorisation may not be present in the text if the speaker was in too much of a hurry when leaving the message, for instance because the matter was urgent or important. If the speaker talked too quickly, or the speech does not match the speech patterns or word vocabulary of the recogniser, then the keyword may not be recognised.
According to one aspect of the invention there is provided a method of processing a voice message within a voice message system comprising: receiving a voice message; determining a characteristic associated with the acoustic delivery of the voice message; determining a category based on the characteristic; associating the category with the voice message; and prioritising the voice message along with other similarly categorised voice messages according to their respective categories.
It is not known to categorise a voice message based on the way in which the voice message is spoken or delivered by a caller. Normally the categorisation is determined by the content of the voice message. Although the prior art does use acoustic properties in order ultimately to determine the text, it is on that text that the prior art categorisation is based, and not on a property of the voice message itself. The text of the message is derived from a multistage process including: calculating the frequency of the nodes of the signal by sampling the signal; determining the phonemes from the nodes using frequency analysis; and determining the text from the phonemes using Hidden Markov Modelling. Finally the text of the message is scanned for certain keywords and the message is categorised according to the keywords located.
One such characteristic of delivery is the rate of delivery of the words in the voice message. A caller may leave a very hurried message because of the urgency or importance of the matter. The caller may forget to mention that the matter is urgent or important but will have left enough clues in the message for it to be categorised as such. The level of volume of the message is another characteristic. A stressed or irate caller may raise his voice when leaving a message and such a characteristic can be used to categorise the message as important or urgent.
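By way of illustration only, the level of volume of a message might be measured as the root-mean-square (RMS) amplitude of its samples. The following is a minimal sketch and not the method of the invention: the function names, the normalisation of samples to the range -1.0 to 1.0, and the threshold value are all assumptions introduced for the example.

```python
import math


def rms_level(samples):
    """Return the root-mean-square amplitude of a sequence of samples.

    Samples are assumed (for this sketch) to be floats normalised
    to the range -1.0 .. 1.0.
    """
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_raised_voice(samples, threshold=0.5):
    """Return True if the RMS level exceeds a hypothetical threshold.

    The threshold is illustrative only; a practical system would
    calibrate it, for example against typical caller levels.
    """
    return rms_level(samples) > threshold
```

A message whose level exceeds the chosen threshold could then be treated as a candidate for the important or urgent category.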
The rate of delivery of the message is the number of words in the message divided by the time taken to speak the message. The number of words is determined by counting the number of unvoiced segments in the voice signal. Alternatively, if the message is converted into text, the number of words may be counted from the text. The voice message may be timed by the IVR system to find its length (in seconds); alternatively, the size of the message is taken to be in proportion to the time needed to record it, and an appropriate algorithm calculates the length from the size. The size of the message can be determined from the number of data words needed to store it.
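The calculation described above may be sketched as follows. This is an illustrative example under stated assumptions rather than a definitive implementation: the voice signal is assumed to have already been divided into frames, each flagged as voiced or unvoiced by some earlier stage; the minimum segment length, the sample rate and the sample width are hypothetical parameters; and the size-to-duration conversion assumes uncompressed storage at a fixed rate.

```python
def count_unvoiced_segments(voiced_flags, min_frames=1):
    """Count runs of consecutive unvoiced frames.

    voiced_flags is a sequence of booleans, True for a voiced frame.
    Runs shorter than min_frames are ignored (a hypothetical guard
    against very short gaps inside a word).
    """
    count = 0
    run = 0
    for voiced in voiced_flags:
        if voiced:
            if run >= min_frames:
                count += 1
            run = 0
        else:
            run += 1
    if run >= min_frames:  # an unvoiced run at the end of the message
        count += 1
    return count


def duration_from_size(num_bytes, sample_rate_hz=8000, bytes_per_sample=1):
    """Estimate duration in seconds from the stored size.

    Assumes uncompressed audio at a fixed sample rate, so that size
    is directly proportional to recording time.
    """
    return num_bytes / (sample_rate_hz * bytes_per_sample)


def delivery_rate(word_count, duration_seconds):
    """Return the rate of delivery in words per second."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return word_count / duration_seconds
```

For example, a message estimated at 30 words and stored in 80000 bytes under the assumed format would have a duration of 10 seconds and a rate of delivery of 3 words per second.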
Preferably the method further comprises: storing the voice message and category in a group with other voice messages and categories; and defining a play order for the group of voice mail messages depending on their respective associated categories. In this way voice messages which were deemed urgent would be played first instead of playing the voice messages in received order.
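One possible way of defining such a play order is sketched below. The category names, their relative ranking, and the `VoiceMessage` record are assumptions made for the example. Messages are ordered first by category and then by received order, so that messages in the same category are still played in the order in which they arrived.

```python
from dataclasses import dataclass

# Hypothetical ranking of categories: a lower number is played earlier.
CATEGORY_RANK = {"urgent": 0, "important": 1, "normal": 2}


@dataclass
class VoiceMessage:
    message_id: str
    received_order: int  # position in which the message was received
    category: str


def play_order(messages):
    """Return the messages sorted by category rank, then received order.

    Categories not present in CATEGORY_RANK are played last.
    """
    return sorted(
        messages,
        key=lambda m: (
            CATEGORY_RANK.get(m.category, len(CATEGORY_RANK)),
            m.received_order,
        ),
    )
```

Under this sketch, an urgent message received fourth would be played before a normal message received first.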
The method may advantageously be combined with e-mail messages whereby notification of the categorised voice message is sent to an e-mail system or other messaging system and the notification is prioritised with similarly categorised mails. More advantageously the voice message is converted into a text message and sent as a complete e-mail with associated category whereby the converted voice message is prioritised with similarly categorised e-mails.
Advantageously the characteristic is representative of the urgency of the message and the voice message is categorised according to the urgency as determined from the acoustic characteristic.
Alternatively the characteristic is representative of the importance of the message and the voice message is categorised according to the importance of the message.
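The determination of a category from an acoustic characteristic could, for instance, be made by comparing the characteristic with a threshold. The following sketch is illustrative only: the threshold values, the category names, and the choice to map rate of delivery to urgency and level of volume to importance are assumptions for the example and are not prescribed by the description.

```python
def categorise(words_per_second, level,
               rate_threshold=3.5, level_threshold=0.5):
    """Return a category name for a voice message.

    words_per_second: rate of delivery of the message.
    level: level of volume, assumed normalised to 0.0 .. 1.0.
    Both thresholds are hypothetical and would need calibration.
    """
    if words_per_second > rate_threshold:
        return "urgent"  # a hurried delivery suggests urgency
    if level > level_threshold:
        return "important"  # a raised voice suggests importance
    return "normal"
```

The category returned would then be associated with the voice message and used when prioritising it among other messages.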
The characteristic may be representative of the whole voice message or of part of the message. For instance the speed of delivery may be estimated from the first part of the voice message rather than the whole message.