A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
The invention disclosed herein relates to information retrieval systems, and particularly to systems and methods for providing users with helpful information about the contents of chats, including ongoing on-line chats.
Real-time textual conversations, commonly known as chats, have become increasingly popular among both personal and business computer users. Chats occur as conversations between two people, as conferences among larger groups, and in persistent chat rooms or spaces accessible to a larger community who can drop in, read what was recently written, and contribute if they desire. Chats are widely available over local and wide area networks, and are particularly popular among users of on-line services and the Internet.
The textual nature of chat makes it particularly valuable in some settings. Chat can be conducted while people are on the phone, allowing it to be used as a second channel for exchanging information. Because of the persistent nature of text, a user can catch up on anything that was said in a chat if they were momentarily distracted or interrupted. Chat can be an inexpensive and lightweight way for people to exchange information in real time. These and other reasons contribute to the growing use of chat in business settings and the increasing incorporation of chat into the offerings of major software manufacturers.
Chats frequently contain important information that users will want to access at a later time. This can include specific details, such as a phone number or address, lists of tasks the user must remember to perform, and broader discussions and ideas. While mechanisms have been designed to allow users to save the transcript of a chat session for later retrieval, these identify the saved transcripts only by such details as the date, time and/or participants in the chat or require the user to manually assign a single name to the conversation. They do not provide an automatic and convenient way for transcripts to be identified by the topics they cover.
Because of the conversational and often informal nature of chat, a single conversation can concern a number of topics, intertwined temporally and frequently shifting from one topic to another. A person presented with a chat transcript, whether retrieving a past transcript or joining a conversation in progress, must scan through the entire transcript to know what was discussed or to find a topic of interest.
In addition, while some existing systems make users aware that other people are involved in an ongoing conversation in which they could participate, these systems do not inform users of the specific topics being discussed. The user must access the chat transcript and read through it to determine if an issue of interest is being discussed.
There is therefore a need for systems and methods for allowing users to quickly determine the contents of a chat and to monitor the progress of ongoing chats and the topics being discussed therein.
It is an object of the present invention to solve the problems described above with existing chat systems.
It is another object of the present invention to automatically label a chat transcript by the topics it includes.
It is another object of the present invention to allow users to easily locate the portions of a chat transcript dealing with a specific topic.
It is another object of the present invention to allow users to easily discern the topic under current discussion in an ongoing chat without monitoring the complete text.
It is another object of the present invention to automatically notify potential chat participants when topics of interest to them are under discussion.
It is another object of the present invention to allow users to easily determine the topics discussed in a chat transcript.
It is another object of the present invention to automatically categorize and topically index the contents of a chat session through statistical analysis of its contents.
The above and other objects are achieved by a method for informing a user of topics of discussion in a recorded chat between two or more people. The method includes the steps of identifying elements from the chat having similar content, labeling some or all of the identified elements as topics, and presenting the topics to the user. In some embodiments, identifying elements from the chat having similar content includes the steps of decomposing the chat into a plurality of utterances made by the people involved in the chat and clustering the utterances to identify elements in the utterances having similar content.
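By way of illustration only, the decomposition step might be sketched as follows. The sketch assumes a hypothetical transcript format in which each line reads "speaker: text"; the `Utterance` type and the `decompose` function are names introduced here for the example and are not part of the disclosure.

```python
from typing import List, NamedTuple


class Utterance(NamedTuple):
    """One contribution by one participant (hypothetical structure)."""
    index: int      # position of the utterance within the chat
    speaker: str
    text: str


def decompose(transcript: str) -> List[Utterance]:
    """Split a transcript into utterances.

    Assumes one 'speaker: text' entry per line; lines that do not
    match this assumed format are skipped.
    """
    utterances = []
    for line in transcript.splitlines():
        speaker, sep, text = line.partition(":")
        if not sep or not text.strip():
            continue  # blank or malformed line
        utterances.append(
            Utterance(len(utterances), speaker.strip(), text.strip())
        )
    return utterances
```

Keeping the utterance index alongside the text allows a topic identified later to be traced back to its location in the transcript.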
Furthermore, each decomposed utterance is parsed into one or more tokens and represented as a vector comprising a combination of some or all of the one or more tokens. In the case of a previously recorded chat which is no longer ongoing, some of the tokens in the utterance may be removed before representing the utterance as a vector. The tokens removed include tokens appearing in a percentage of all utterances in the chat which is below a low percentage or above a high percentage. In the case of an ongoing chat, in which such percentages cannot be determined because the full chat record is not yet available, all tokens in the utterance are included in the vector. The tokens in each vector are weighted by frequency of their occurrence in the utterance or chat as a whole, and a vector-space model is generated from all the vectors.
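The token handling described above might be sketched as follows, for illustration only. The tokenizing rule, the function names, and the `low` and `high` parameters are assumptions made for the example; the weighting shown is a simple count of each token within its utterance, which is one of the frequency weightings the description permits.

```python
import re
from collections import Counter


def tokenize(utterance):
    """Split an utterance into lowercase word tokens (assumed rule)."""
    return re.findall(r"[a-z0-9']+", utterance.lower())


def build_vectors(utterances, low=0.0, high=1.0, ongoing=False):
    """Represent each utterance as a token -> frequency vector.

    For a recorded chat, tokens appearing in a fraction of utterances
    below `low` or above `high` are removed. For an ongoing chat the
    fractions are not yet known, so every token is kept.
    """
    tokenized = [tokenize(u) for u in utterances]
    keep = None
    if not ongoing:
        n = len(tokenized)
        # Document frequency: number of utterances containing each token.
        df = Counter(tok for toks in tokenized for tok in set(toks))
        keep = {t for t, c in df.items() if low <= c / n <= high}
    vectors = []
    for toks in tokenized:
        if keep is not None:
            toks = [t for t in toks if t in keep]
        vectors.append(Counter(toks))  # weight = count within the utterance
    return vectors
```

With thresholds such as `low=0.3` and `high=0.9`, a word occurring in every utterance (for example "the") and a word occurring only once are both dropped, leaving the mid-frequency tokens that tend to distinguish topics.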
Standard clustering techniques are used to cluster the utterances based on the vector space model created from the vectors and tokens. In the case of a previously recorded chat, clustering is performed on each utterance. In the case of ongoing chats, clustering is performed in accordance with a process which accounts for the dynamically changing nature of the chat content. The process involves receiving a first set of ongoing chat data from the ongoing chat, decomposing the first set of ongoing chat data into a plurality of first utterances, and, when a first number of first utterances has been received, clustering the first utterances to generate a plurality of first clusters. As the chat continues, a second set of ongoing chat data is received and decomposed into a plurality of second utterances, which utterances are clustered into the first clusters when a second number of second utterances has been received. A new cluster is formed under certain conditions by breaking the largest of the existing clusters into two or more smaller clusters. The result is an ever-changing collection of topics representing the subject matter under discussion in the chat.
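The ongoing-chat process might be sketched as follows, for illustration only. Several choices here are assumptions rather than part of the description: cosine similarity as the measure of similar content, a two-way split seeded by the least similar pair of utterances, and a maximum cluster size (`max_size`) as the condition that triggers a split. The class and function names are likewise hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token -> weight vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Sum of a cluster's vectors; scale does not affect cosine."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def split_largest(clusters):
    """Break the largest cluster in two around its least similar pair."""
    idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
    big = clusters[idx]
    if len(big) < 2:
        return
    pairs = [(cosine(big[i], big[j]), i, j)
             for i in range(len(big)) for j in range(i + 1, len(big))]
    _, i, j = min(pairs)
    left, right = [big[i]], [big[j]]
    for k, v in enumerate(big):
        if k in (i, j):
            continue
        nearer_left = cosine(v, big[i]) >= cosine(v, big[j])
        (left if nearer_left else right).append(v)
    clusters[idx:idx + 1] = [left, right]


class OngoingClusterer:
    """Clusters utterance vectors in batches as a chat proceeds."""

    def __init__(self, first_batch=4, next_batch=2, max_size=4):
        self.first_batch = first_batch   # the "first number" of utterances
        self.next_batch = next_batch     # the "second number" of utterances
        self.max_size = max(1, max_size) # assumed split condition
        self.pending = []
        self.clusters = []

    def add(self, vector):
        self.pending.append(vector)
        needed = self.next_batch if self.clusters else self.first_batch
        if len(self.pending) < needed:
            return
        batch, self.pending = self.pending, []
        if not self.clusters:
            # Initial clustering of the first utterances.
            self.clusters = [batch]
            split_largest(self.clusters)
        else:
            # Later utterances join the most similar existing cluster.
            for v in batch:
                best = max(self.clusters,
                           key=lambda c: cosine(v, centroid(c)))
                best.append(v)
        while max(len(c) for c in self.clusters) > self.max_size:
            split_largest(self.clusters)
```

Batching the assignment keeps the cluster set stable between updates, while the split step lets a new topic emerge when an existing cluster grows too broad.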
Salient phrases and keywords are automatically extracted from the topics for use in labeling the chat transcript and creating a dynamic listing of the topics it contains. The listing serves as an active table-of-contents, allowing users to easily access the portions of the chat transcript it references. A variety of textual and graphical displays may be used to provide an overview at a glance of the locations in a chat transcript in which each of its topics was discussed. The keywords identifying the topic under current discussion in an ongoing chat can be displayed, allowing users to decide at a glance if they wish to participate. A number of chat conversations may be automatically monitored, and users notified when a topic in which they have previously indicated an interest, through deliberate setting or observed actions, comes under discussion.
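Keyword extraction and interest matching might be sketched as follows, for illustration only. The scoring rule (a token's count within a cluster, scaled by the share of its overall occurrences that fall in that cluster) is an assumption chosen for simplicity, not the extraction technique of the disclosure; `label_topics` and `topics_of_interest` are hypothetical names.

```python
from collections import Counter


def label_topics(clusters, top_k=2):
    """Pick the top_k most characteristic tokens of each cluster.

    `clusters` is a list of clusters, each a list of token -> count
    vectors. Ties are broken alphabetically for a stable result.
    """
    overall = Counter()
    per_cluster = []
    for cluster in clusters:
        counts = Counter()
        for vec in cluster:
            counts.update(vec)
        per_cluster.append(counts)
        overall.update(counts)
    labels = []
    for counts in per_cluster:
        ranked = sorted(
            counts,
            key=lambda t: (-counts[t] * counts[t] / overall[t], t),
        )
        labels.append(ranked[:top_k])
    return labels


def topics_of_interest(labels, interests):
    """Return indices of topics whose keywords match a user's interests.

    `interests` is a set of keywords the user has registered or that
    were inferred from observed actions.
    """
    return [i for i, words in enumerate(labels) if interests & set(words)]
```

A monitoring process could call `topics_of_interest` each time the labels of an ongoing chat are refreshed and notify the user whenever the returned list is not empty.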