Computer platforms provide many tools for storing and processing large and varied data sets. These can include word processing tools, data presentation tools, computer-aided graphics tools, electronic mail handling tools, calendar and scheduling tools, and numerous database manipulation tools. Given the various uses for data on the platform, applications have developed over time that are somewhat content centric. In other words, once data has been stored in the computer's database, the data is subsequently retrieved and/or manipulated in some manner based on the actual content of the stored data. In one specific example, an e-mail inbox can be searched for previously received e-mails based on a keyword that links a search tool to the respective e-mails associated with the term, where the term is linked to the actual contents of the stored e-mails. Thus, if a user were to search for the keyword “John,” any e-mail associated with this keyword would be retrieved and presented to the user, who would then sift through the retrieved list for the desired e-mail associated with the term “John.” Although the specific e-mail the user is searching for may appear in the resulting list, a large number of e-mails may have to be examined in order to find it (e.g., thirty e-mails contain the term “John”). As can be appreciated, the e-mail processing described in the above example can be extended to many types of data processing and file manipulation activities. For instance, these can include indexing of stored data, presentation of stored data, searching for various types of stored data, ranking data, and so forth.
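The content-centric keyword search described above can be illustrated with a minimal sketch. The `Email` record, the `keyword_search` function, and the sample inbox are hypothetical and introduced only for illustration; they do not represent any particular e-mail application. The sketch matches a keyword against the stored contents of each e-mail (subject and body) and returns every match, leaving the user to sift through the resulting list.

```python
from dataclasses import dataclass


@dataclass
class Email:
    # Hypothetical record type; a real mail store would carry many more fields.
    sender: str
    subject: str
    body: str


def keyword_search(inbox, keyword):
    """Return every e-mail whose subject or body contains the keyword.

    Matching is case-insensitive and purely content based: no ranking or
    context is applied, so the caller receives all matches in inbox order.
    """
    needle = keyword.lower()
    return [
        mail
        for mail in inbox
        if needle in mail.subject.lower() or needle in mail.body.lower()
    ]


# Illustrative sample data (assumed, not taken from any real system).
inbox = [
    Email("alice@example.com", "Lunch with John", "Are you free Friday?"),
    Email("dave@example.com", "Budget draft", "Attached is the draft."),
    Email("bob@example.com", "Status", "John approved the schedule."),
    Email("carol@example.com", "Minutes", "No action items."),
]

hits = keyword_search(inbox, "John")
for mail in hits:
    print(mail.sender, "|", mail.subject)
```

Because the search considers content alone, the number of hits grows with the number of stored e-mails that mention the term, which is the sifting burden noted above.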
Relating to content-centric applications in general, one common view of a “finished” document that is to be retrieved, viewed, and employed by a reader is generally not sufficient to adequately support knowledge-intensive tasks. Thus, users or groups of users should also be able to add their own information to a knowledge source. In one example, a historian may want to add a detailed analysis to a chapter of a book. Another user may want to annotate a section of the book with experiences gathered from the analysis.
While practically all documents are available on or through the Web, its hypertext capabilities are currently not used extensively to directly modify and annotate existing information (e.g., books, papers, web pages, and so forth). Rather, when content is deemed “completed” it is stored in some type of archive (e.g., a digital library), from which it is eventually retrieved as a monolithic entity and used for the production of yet more content. Moreover, the task of information retrieval is typically not integrated with the task of content development. Thus, users have to retrieve the documents they believe are required for a task and then base content development on the information found. While a new document search can always be initiated manually, a much more compelling view is that content development and retrieval should be integrated. A system that continually scans and analyzes new text entered by a user should be able to search for additional relevant information and present it to the user, who may then inspect the new data, integrate it, add cross-references, or reject the proposed sources, for example.
Another aspect is that knowledge from a source generally cannot be applied without a description of the context of both a document's creator and its reader. Only an explicit representation of the two context frames allows for a (semi-automatic) translation between them. In the above examples, old knowledge can be adapted to modern standards and vocabulary, but similar problems may increasingly appear in the medium- and long-term future, when all documents that are currently created and stored in digital form become “historic knowledge” themselves.
Currently, users obtain documents through some type of indexing and ranking system: web search engines for plain web pages, or some type of information retrieval system for digital libraries (historically, these systems come from different roots, but modern implementations exhibit some overlap between these techniques). In either case, the systems usually return complete documents, be they web pages, papers, or whole books. This is one of the primary reasons behind the feeling of “information overload” shared by many users faced with a virtually endless source of information to process.