1. Field
This disclosure relates generally to analyzing document collections.
2. Background
Analyzing large collections of documents to detect the presence of a term or expression can be time-consuming. When such document collections are analyzed to detect the presence of any of multiple terms or expressions, the computing demand becomes substantially greater.
Large collections of documents include, for example, corporate email archives, corporate document archives, system log files, electronic book collections, and the like. Applications may require repeated analysis of such archives for the presence of various terms or expressions. During the analysis of these archives, it may also be desirable to collect statistics on the presence of the terms and expressions of interest in the archives.