As the Internet continues to experience explosive growth, an ever-increasing amount of information is available through collections of rich text documents. Ranging from digital libraries to online medical references, these collections contain a wealth of multi-faceted, interconnected data. To navigate this rich text data, most people now rely on search technologies to find relevant information. Search tools typically return a ranked list of documents whose content is highly related to a set of user-supplied keywords. This model has proven remarkably powerful for focused information retrieval tasks, such as locating the address of a restaurant.
Ranked lists, however, are insufficient for more complex data exploration and analytic tasks in which users try to understand an overall document corpus or relationships between complex concepts that span multiple documents. The effective organization and presentation of search results is still largely an open problem. This problem becomes even more challenging when considering the multi-faceted nature of many documents. For instance, consider an online library of health-related articles such as Google Health. Each article in the library describes a specific disease and contains information about a number of different facets: symptom, treatment, cause, diagnosis, prognosis, and prevention. A search engine allows users to find a page describing a specific disease, and links allow users to navigate to a small set of predefined related pages. It remains difficult, however, to answer basic self-care questions. For example, a user may wish to identify the general classes of diseases that lead to the symptoms the user is experiencing, or to identify those diseases that have a similar prognosis.
These questions require an understanding of complex correlations across documents and across multiple facets of the contained information. To answer these questions, users need to examine both high-level overviews and fine-grained local-level relationships. For instance, a user in the above scenario would need to both explore clusters of related diseases and uncover pairwise relationships based on specific facets of information such as prognosis and treatment.
Information visualization technologies, when used in conjunction with data mining and text analysis tools, can be of great value for these sorts of tasks. For this reason, several visualizations have been designed for either high-level corpus summarization or low-level structure analysis. Although many existing visualization techniques provide valuable insights into the visualized data, none of them offers a complete solution. In particular, existing visualization techniques do not provide: (1) interactive visualization of local data relationships within the context of global document patterns, (2) dynamic context control so that users can pivot between different facets of information, or (3) an integrated approach to multi-faceted search and visualization.
A need therefore exists for improved interactive visualization techniques that enable users to navigate and analyze large multi-faceted text corpora with complex cross-document relationships.