Advances in electronic storage technology have resulted in the creation of vast databases of documents stored in electronic form. These databases can be accessed from remote locations around the world. As a result, vast amounts of information are available to a wide variety of individuals. Moreover, information is not only stored in electronic form but is also created and disseminated throughout the world in electronic form. Sources for the electronic creation of such information include news and periodicals, as well as radio, telephony, television and Internet services. All of this information is made available to the world in real time through a variety of computer networks, such as the World Wide Web. The problem with this proliferation of electronic information, however, is that it is difficult for any one individual to extract the information useful to that individual in a timely manner. That is, how does an individual mine the databases and filter the streams of incoming data?
Conventional programs for finding, filtering, classifying or extracting information typically perform only one operation. For example, a program may be structured such that a user identifies some logical combination of words and the program then identifies documents (or other information) in selected databases that contain a similar logical combination of words. Another program may operate to extract documents containing specific types of entities. However, the programs that perform entity extraction are generally incompatible with the programs that extract logical word combinations. Moreover, while both the entity and logical word extraction programs operate on databases, neither is structured to operate on data streams. The problems with these types of conventional programs arise from the fact that they are based on a computer program model in which a defined input is processed to produce a defined output. This conventional programming model assumes that the user has substantial familiarity with the individual program as well as with the characteristics of the data being analyzed. Furthermore, because of the static nature of the conventional model (i.e., a single set of output data, such as a document list, is generated for each input request), users are left to determine for themselves the relationships among the data being analyzed and the input parameters on which the information analysis is based.
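By way of illustration only, the conventional logical word search described above can be sketched as follows. This is a minimal sketch and not any particular program: the function names (tokenize, matches, search), the tuple-based query form and the sample documents are all hypothetical, and the sketch tests for an exact logical combination of words rather than a similar one.

```python
import re

def tokenize(text):
    """Lowercase a document and return the set of words it contains."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches(words, query):
    """Evaluate a nested logical query against a set of words.

    A query is either a plain word (str) or a tuple of the form
    ("and", q1, q2, ...), ("or", q1, q2, ...) or ("not", q).
    This query form is an assumption made for the sketch.
    """
    if isinstance(query, str):
        return query.lower() in words
    op, *args = query
    if op == "and":
        return all(matches(words, a) for a in args)
    if op == "or":
        return any(matches(words, a) for a in args)
    if op == "not":
        return not matches(words, args[0])
    raise ValueError(f"unknown operator: {op!r}")

def search(documents, query):
    """Return the titles of the documents whose text satisfies the query."""
    return [title for title, text in documents.items()
            if matches(tokenize(text), query)]

# Hypothetical database of three short documents.
documents = {
    "doc1": "Television networks report record storage sales.",
    "doc2": "Internet services expand storage for news archives.",
    "doc3": "Radio and telephony providers merge.",
}

# One defined input (the query) yields one defined output (a document list).
query = ("and", "storage", ("or", "news", "television"))
result = search(documents, query)
```

In this sketch the only output is a single list of document titles. Nothing indicates which part of the query admitted a given document or how the documents relate to one another, so a user who wishes to explore the data must rewrite the query and run the search again.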
To overcome the problems inherent in conventional search, analysis and filtering techniques, a new type of information analysis technique must be developed in which the user may discover various characteristics of the information contained in a database or stream of data. This new technique may then allow a user to represent and interrelate the discovered characteristics in an arbitrary way. Such a new information analysis tool must enable manipulation of the input parameters and of multiple output datasets in such a way as to demonstrate how various database characteristics interrelate. The tool must further demonstrate how individual pieces of information in an input data stream pass through various analysis stages, and must provide feedback for dynamic adjustment of those stages.