Many aspects of the analyst's task of finding useful information in massive data are supported by advanced search engines, but these engines typically present their results as a list of very few documents (e.g., 10 to 20). This is true for analysis related to competitive intelligence, issues monitoring, financial industry compliance investigations and media awareness, to name a few. What is required are interactive information visualization techniques that are tightly coupled with massive data, software agents, search engines and the analyst's exploration task.
Executing queries, sequentially scanning results, and opening and reading documents is a common workflow. Queries are often iteratively refined, can become quite complex, or are freshly developed and established as new lines of thought are followed like trails. Results are scanned for date written, source, buzzwords, keywords, corroborating evidence, new items, trustworthy baseline documents, summaries, relevance, etc. The data are varied and voluminous, and the amount of available information is escalating rapidly. People already feel overwhelmed working with just hundreds of items such as observations, reports and events; yet if analysts were able to work with thousands, tens of thousands or more items, they would. Keeping track of sources and queries is time-consuming. Fatigue and cognitive strain are factors. Analysts need an information retrieval (IR) system that will increase their productivity in a ‘triage’ workflow without removing the information on which human judgments can be accurately and quickly made.
Analysts report a need for an integrated human-information interaction (HII) environment: “The structured interviews provided observations of how analysts work, and think about their work, and attempt to cover the whole analytical process . . . . Analyst work is not sequential, and moves back and forth, from one stage to another, across multiple tasks at a moment's notice. There is a need for an integrated approach for supporting analysts.” [Wright & Kapler, 2004].
To be successful, such an integrated environment ought ideally to be a fluid and flexible medium of analysis and expression. It should seek to provide a common visual vocabulary for analytic work, creating a mixed-initiative environment for the whole analysis workflow and a workspace ready for collaboration. Above all, it is the cognitive space where the analyst will see, and interact with, more information, more quickly, with more comprehension. Analysts also need a system that can easily integrate new or different IR technologies. This creates an opportunity for a test-bench approach: not every method performs equally well across all tasks, so analysts need a way to determine which tools and methods are most effective for the task at hand. Finally, information seeking is only one part of the full work process and must be connected with sense-making.
A number of systems have been proposed in the past that use themes developed further in TRIST. However, none of these systems combines all of the functionality of TRIST in a coherent, integrated, single-display environment, nor does any do so at the scale TRIST supports in number of documents, number of characterizing dimensions and range of interactive, easily accessed functionality.
DLITE [Cousins, 1997] is an early example of a graphical query system that uses iconic representations of queries and results. The system supports reusable queries and different workspaces for different tasks, and it incorporates multiple search engines. DLITE does not, however, integrate the scanning and selection of information from the search results, nor does it take advantage of auxiliary data and/or characteristics associated with the returned results.
Sparkler [Havre, 2001], now called Surmise, shows identical results across queries and supports comparison of results from multiple queries or multiple search engines. However, the system only connects identical documents and provides no means, beyond this comparison, of quickly characterizing and evaluating the documents or result sets.
The Envision [Nowell, 1993-1997] and the similar Search Result Explorer [Andrews, 1999] systems group search results by displaying documents in a 2-D grid according to their metadata. These implementations have a number of limitations, such as the limited space per cell for displaying large numbers of documents, and the problem of how to represent document membership in multiple categories. These systems do, however, encode document metadata in their iconic representations.
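To make the multiple-membership problem concrete, the following is a minimal sketch of grouping results into a 2-D grid keyed by two metadata dimensions. The `Doc` record, the `year` and `topics` fields and the `grid_by_metadata` function are hypothetical illustrations, not the data model of Envision or Search Result Explorer. A document tagged with several topics lands in several cells, so cell counts no longer sum to the number of documents.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List


@dataclass
class Doc:
    # Hypothetical document record: an id plus two metadata dimensions.
    doc_id: str
    year: int
    topics: List[str] = field(default_factory=list)


def grid_by_metadata(docs):
    """Place each document in every (year, topic) cell it belongs to.

    A document with several topics appears in several cells, which is the
    multiple-membership problem; a document with no topics appears nowhere.
    """
    grid = defaultdict(list)
    for d in docs:
        for t in d.topics:
            grid[(d.year, t)].append(d.doc_id)
    return grid


docs = [
    Doc("d1", 2003, ["finance"]),
    Doc("d2", 2003, ["finance", "energy"]),
    Doc("d3", 2004, ["energy"]),
]
grid = grid_by_metadata(docs)
for cell, ids in sorted(grid.items()):
    print(cell, ids)
```

Here three documents produce four cell entries because `d2` is drawn twice; a grid display must either duplicate such icons or choose one cell, and each choice distorts the picture in a different way.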
A number of systems have been developed for representing the relevance of documents, and for improving document scanning, by indicating or summarizing the location of query terms. TileBars [Hearst, 1995] represents documents as bars that show the relative locations and densities of query terms, allowing the user to visually assess the quality of the match. Stacking the bars for a single document from multiple queries allows the user to compare the document's match to each query and so estimate the contents of the document. SeeSoft [Eick, 1994] displays documents as columns, painting colour-coded lines for term matches. These systems are designed to work with a one-dimensional list of ten to twenty documents.
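The bar representation described above can be sketched as follows: split a document into segments, count hits for each query term set in each segment, and shade each cell by its count. This is a minimal illustration under stated assumptions, not Hearst's implementation. Fixed-size word windows stand in for the topical segmentation used by the original TileBars, and the `tile_bar` and `render` functions and the character shading are hypothetical.

```python
import re


def tile_bar(text, term_sets, segment_size=50):
    """Return one row per query term set; each cell counts hits in a segment.

    Simplification (assumption): fixed-size word windows replace the
    topical segmentation used by the original TileBars.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    segments = [words[i:i + segment_size] for i in range(0, len(words), segment_size)]
    rows = []
    for terms in term_sets:
        wanted = {t.lower() for t in terms}
        # True counts as 1, so this sums the matching words in each segment.
        rows.append([sum(w in wanted for w in seg) for seg in segments])
    return rows


def render(rows, shades=" .:#"):
    """Map hit counts to characters; darker characters mean denser matches."""
    top = len(shades) - 1
    return ["".join(shades[min(c, top)] for c in row) for row in rows]


text = "bank merger talks stall. oil prices rise. bank shares fall after merger news"
rows = tile_bar(text, [["bank", "merger"], ["oil"]], segment_size=4)
print(rows)          # [[2, 1, 1, 0], [0, 1, 0, 0]]
print(render(rows))  # [':.. ', ' .  ']
```

Each row shows where in the document one term set occurs and how densely, so a glance distinguishes a document that discusses a topic throughout from one that mentions it once in passing.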
Rainbows [Hetzler] is a visualization of relationships between entities, for example documents. Entities are placed on a plane; colour-coded arcs above or below the plane, as well as proximity, indicate different types of relationships. Rainbows does not, however, offer much ability to simultaneously express metadata about those entities, nor does it scale beyond relationships among ten to twenty documents.
Finally, PathFinder offers a broad range of functionality through a suite of tools with many separate, single-purpose displays, but it does not offer an integrated view. PathFinder operates on large numbers of documents, but its displays aggregate occurrences (e.g., as counts of totals). Also, PathFinder is not a visualization tool; its emphasis is not on tapping the analyst's perceptual abilities to aid discovery tasks.