Analysis of large volumes of textual information has been greatly enhanced through the application of data visualization methods. Some types of visualization are designed to identify similarity among the documents in a data set, while others focus on revealing the major concepts the documents contain.
Chalmers (Using a landscape metaphor to represent a corpus of documents, in Spatial Information Theory, Frank and Campari, eds., Springer-Verlag, pp. 377–390, 1993) introduced a landscape metaphor for representing the content of a corpus of text documents. This approach was later extended and refined by Wise et al. (Visualizing the Non-Visual: Spatial analysis and interaction with information from text documents, Proc. IEEE Visualization 95, N. Gershon and S. Eick, eds., IEEE Computer Society Press, Los Alamitos, Calif., pp. 51–58, 1995; Wise, The Ecological Approach to Text Visualization, J. American Society for Information Science 50:1224–1233, 1999).
In the Wise et al. approach, an aggregate theme algorithm is applied to construct a three-dimensional representation over a framework defined by a two-dimensional representation of the information space (a Galaxies view). The surface is built on a grid: for each grid region, the contributions of each thematic term in the documents located in that region are added together using a common term frequency metric (Salton, Developments in automatic text retrieval, Science 253:974–980, 1991). The resulting map is then smoothed to produce the terrain representation. In this approach, the peak height displayed on the terrain represents a combination of document density and thematic content.
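The grid construction described above can be illustrated with a short sketch. It is not the Wise et al. implementation; it assumes that document positions from the two-dimensional map are normalized to [0, 1) on both axes, that the term frequency metric is the raw count of each thematic term in a document, and that smoothing is a simple 3x3 box average. The function names `theme_surface` and `smooth` are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Grid = List[List[float]]


def theme_surface(
    positions: Sequence[Tuple[float, float]],
    term_counts: Sequence[Dict[str, int]],
    thematic_terms: Set[str],
    n: int,
) -> Grid:
    """Sum the thematic term frequencies of the documents in each grid cell.

    Assumes positions are 2-D map coordinates normalized to [0, 1).
    """
    grid = [[0.0] * n for _ in range(n)]
    for (x, y), counts in zip(positions, term_counts):
        # Map the document to a grid cell, clamping to the last row/column.
        col = min(int(x * n), n - 1)
        row = min(int(y * n), n - 1)
        # Assumed metric: raw term frequency; tf-idf could be substituted.
        grid[row][col] += sum(counts.get(t, 0) for t in thematic_terms)
    return grid


def smooth(grid: Grid) -> Grid:
    """Assumed smoothing: 3x3 box average over the in-bounds neighbours."""
    n_rows, n_cols = len(grid), len(grid[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            vals = [
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, n_rows))
                for cc in range(max(c - 1, 0), min(c + 2, n_cols))
            ]
            out[r][c] = sum(vals) / len(vals)
    return out


# Usage: two documents about "reactor"/"fuel" share one cell; a third is off-theme.
positions = [(0.1, 0.1), (0.15, 0.2), (0.9, 0.9)]
docs = [{"reactor": 3, "fuel": 1}, {"reactor": 2}, {"budget": 4}]
terrain = smooth(theme_surface(positions, docs, {"reactor", "fuel"}, n=4))
```

Because each cell sums over every document it contains, a cell rises either because it holds many documents or because its documents use the thematic terms heavily, which is why peak height mixes document density with thematic content.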
Another landscape-type view has been used for text documents by Irwin et al. (Navigating Nuclear Science: Enhancing Analysis through Visualization, Sandia Report SAND97-2218, 1997). In this approach, the landscape view is simply a redundant encoding of document density overlaid on a two-dimensional proximity map. All thematic content or concepts are derived from the mathematics underlying the calculation of the similarity measures and their use in deriving the proximity map.
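Because the height in this kind of view carries no thematic information, it reduces to counting the documents that fall in each cell of the proximity map. A minimal sketch, assuming map coordinates normalized to [0, 1) and a square grid with `n` cells per side (both assumptions, not details from Irwin et al.); `density_surface` is a hypothetical name:

```python
from typing import List, Sequence, Tuple


def density_surface(positions: Sequence[Tuple[float, float]], n: int) -> List[List[int]]:
    """Height of each grid cell = number of documents placed in that cell."""
    grid = [[0] * n for _ in range(n)]
    for x, y in positions:
        # Clamp so a coordinate of exactly 1.0 still lands in the last cell.
        row = min(int(y * n), n - 1)
        col = min(int(x * n), n - 1)
        grid[row][col] += 1
    return grid


heights = density_surface([(0.1, 0.1), (0.15, 0.2), (0.9, 0.9)], n=4)
```

The heights simply restate how tightly the points cluster on the underlying map, which is what makes the encoding redundant.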
The landscape view of the themes or content of a document set is distinct from other types of visualizations that provide visual overviews of how documents relate to one another. These include self-organizing maps (Kohonen, Self-organization and associative memory, 3rd edition, Berlin, Springer-Verlag), hierarchical taxonomy-based visualizations (U.S. Pat. No. 5,625,767 to Bartell and Clarke), and geometric space representations (U.S. Pat. No. 5,930,784 to Hendrickson; U.S. Pat. No. 5,987,470 to Meyers et al.; U.S. Pat. No. 5,794,178 to Caid and Carleton). However, these alternative types of visualizations can serve as the two-dimensional framework on which a landscape visualization is built.
The term "information landscape" has also been applied to methods for the three-dimensional display of graphical objects (U.S. Pat. No. 5,528,735 to Strasnick and Tesler; U.S. Pat. No. 5,555,354 to Strasnick and Tesler; U.S. Pat. No. 5,671,381 to Strasnick and Tesler). However, this type of landscape is distinct from the direct use of a contour-map landscape representation.
Concept-based maps of information have also been described in U.S. Pat. No. 5,506,937 to Ford et al. These maps show the hierarchy of information concepts using a tree-type visualization. Tree-type visualizations, such as the cone tree view, have also been described in U.S. Pat. No. 6,088,032 to Mackinlay. These visualization approaches do not use a landscape metaphor.
Although prior landscape visualization methods and systems have provided useful representations of data sets that enable the relationships between documents or data sets to be determined, these methods and systems can be improved by adding tools that present the data according to user customizations and that make the data underlying the view easier to view and explore.