The volume of electronic information in both personal and corporate data stores is increasing rapidly. Examples of such stores include e-mail messages, word-processed and text documents, contact management tools, and calendars. But the precision and usability of knowledge management and search technology have not kept pace. The vast majority of searches performed today are still keyword searches or fielded searches. A keyword search involves entering a list of words that are likely to be contained within the body of the document for which the user is searching. A fielded search involves locating documents using lexical strings that have been deliberately placed within the document (usually at the top) for the purpose of facilitating document retrieval.
These data retrieval techniques suffer from two fundamental flaws. First, they often return either a vast number of documents or, if the user specifies many keywords or attribute-value pairs and requires that all of them appear in the document, no documents at all. Second, these techniques can retrieve only documents that individually meet the search criteria. If two or more related (but distinct) documents meet the search criteria only when considered as a combined unit, these documents will not be retrieved. Examples include an earlier draft of a document that contains a keyword absent from the later draft, and an e-mail message paired with an entry in an electronic calendar, where the calendar entry might clarify the context of a reference in the e-mail message. There is a clear need for a search technique that returns sets of related documents that are not merely grouped by textual similarity, but also grouped and sequenced according to the social context in which they were created, modified, or quoted.
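The two conventional search styles, and the second flaw in particular, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the Document class, the function names keyword_search, fielded_search and jointly_match, and the three-item corpus are hypothetical and do not describe any existing search tool or the technique proposed here.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical document record: free-text body plus header fields."""
    doc_id: str
    body: str
    fields: dict = field(default_factory=dict)


def tokens(text):
    """Lower-cased words of a text with surrounding punctuation stripped."""
    return {w.strip(".,;:!?").lower() for w in text.split()}


def keyword_search(corpus, keywords, require_all=True):
    """Return ids of documents whose own body contains all (or any) keywords."""
    wanted = {k.lower() for k in keywords}
    match = all if require_all else any
    return [d.doc_id for d in corpus
            if match(k in tokens(d.body) for k in wanted)]


def fielded_search(corpus, criteria):
    """Return ids of documents whose fields equal every attribute-value pair."""
    return [d.doc_id for d in corpus
            if all(d.fields.get(k) == v for k, v in criteria.items())]


def jointly_match(docs, keywords):
    """True if the keywords are all covered by the documents taken together."""
    combined = set().union(*(tokens(d.body) for d in docs))
    return all(k.lower() in combined for k in keywords)


# Illustrative corpus (invented for this sketch).
CORPUS = [
    Document("email-1",
             "Please review the budget before our meeting on Tuesday.",
             {"type": "email", "from": "alice"}),
    Document("calendar-1",
             "Board meeting, Tuesday 10:00, room 4.",
             {"type": "calendar", "owner": "alice"}),
    Document("draft-2",
             "Final minutes of the quarterly review.",
             {"type": "document", "author": "bob"}),
]

if __name__ == "__main__":
    query = ["budget", "board"]
    # Requiring every keyword in a single document finds nothing.
    print(keyword_search(CORPUS, query))
    # Relaxing to "any keyword" finds both, but does not relate them.
    print(keyword_search(CORPUS, query, require_all=False))
    # A fielded search matches on deliberately placed attribute-value pairs.
    print(fielded_search(CORPUS, {"type": "calendar", "owner": "alice"}))
    # The e-mail and the calendar entry satisfy the query only as a unit.
    print(jointly_match(CORPUS[:2], query))
```

In this sketch the e-mail message and the calendar entry each contain only one of the two keywords, so a search requiring both returns nothing, even though the pair taken together satisfies the query. The jointly_match helper merely checks that property for a pair chosen by hand; it does not discover which documents are related, which is the capability the conventional techniques lack.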
Such a technique would make it possible to retrieve a very precise set of documents from a large corpus of data. Hitherto, with conventional search tools, this has been possible only through complex search queries, and the results have been restricted to documents that individually meet the search criteria. It is desirable to be able to retrieve a precise set of documents from a large corpus of texts using relatively simple search queries. It would be of further benefit to present said documents in the context of causally related documents (for example, a document containing the minutes of a board meeting has a causal link to an e-mail message announcing that meeting), even when those other documents do not, individually, satisfy the search criteria. This would relieve the user of the need for prior knowledge (before running the search) of such details as the exact date on which a message was sent, and who sent it. Existing search tools require such prior knowledge, because they do not establish causal links between documents.