Widespread digitization has led to an explosion of data, a large volume of which is in the form of unstructured text. Organizations all over the world have become aware of the tremendous potential of text data analytics technologies, which can help them understand and serve their customers better through the analysis of consumer-generated text, including both personal and business communications.
At present, text data analytics is successfully employed by advertisers and market researchers to gain insights about target consumers. However, while these applications provide insights based on the analysis of large volumes of static text data amassed in the past, it remains a challenge for analysts to understand how the data has changed over time.
Existing prior art on the analysis of large volumes of static text, particularly email analysis and visualization, has focussed on multiple dimensions of email communication, such as social networks within emails, thread-based communication, and temporality, using current or archived emails. It visualizes dyadic communications between the mailbox owner and his/her contacts in temporal order, summarizing conversations and the differences between conversations with different contacts.
Dynamically coordinated email analysis and visualization, as proposed in the existing art, allows various attributes of emails to be visualized in a chronological manner. A user selects the email folders to be visualized and is presented with views of email attributes, such as sender and date, and with the position of each email in a time-of-day plot and in a folder-wise daily frequency plot. Users can select emails by applying filters over attributes in order to observe patterns. Visualizing patterns of correspondence through temporal rhythms, in order to understand relationships with contacts, has also been proposed in the prior art. However, such prior art does not disclose analysis of the actual content of emails.
Further, the prior art also illustrates a system for browsing email archives that evokes memories by providing interactive visualizations of communication with inferred social groups, recurring named entities, sentimental cues and image attachments. Summarization of work progress has been reported as an additional feature in the prior art. However, the prevalent text analytics methodologies and visualization techniques described in the prior art are unable to demonstrate the evolution of the content of general email inboxes.
Temporal visualization of social media platforms such as Twitter as stacked tag clouds has been illustrated in the existing art. The system described therein is populated with a set of buckets containing keywords of interest, and tweets are filtered based on that set of buckets. Words in the tag cloud are sized according to frequency and coloured according to sentiment. However, such prior art does not address newly emerging topics that are not covered by any of the pre-configured buckets. Further, the prior art fails to compare topics that are covered by the buckets.
Thus, analyzing large volumes of text and clustering said data to demonstrate and visualize how the content of the data has evolved remains one of the biggest challenges in the technical domain.