Traditional search systems use vector space models and inverted indices to index documents. In a traditional search system, the contents of a collection of documents are analyzed and inverted indices are generated. An inverted index is a data structure that maps each term to the documents containing it and is optimized for quick retrieval of the documents best matching a search query. Documents matching a search query can be returned with associated scores indicating how well the documents match the query.
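The structure described above can be illustrated with a minimal sketch. It assumes whitespace tokenization and a simple TF-IDF weighting; the function names `build_inverted_index` and `search`, the sample documents, and the particular scoring formula are hypothetical choices made for illustration and are not drawn from any specific search system.

```python
import math
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency} (a toy inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(text.lower().split()).items():
            index[term][doc_id] = freq
    return index

def search(index, num_docs, query):
    """Return (doc_id, score) pairs, best match first, using TF-IDF weights."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term appears in no document
        # Assumed weighting: terms found in fewer documents count for more.
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, freq in postings.items():
            scores[doc_id] += freq * idf
    # Sort by descending score, breaking ties by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical sample collection.
docs = {
    "d1": "quarterly sales report",
    "d2": "sales forecast and sales targets",
    "d3": "engineering design notes",
}
index = build_inverted_index(docs)
results = search(index, len(docs), "sales report")
```

In this sketch, a query only touches the posting lists of its own terms, which is why lookup is fast relative to scanning every document. Here `d1` ranks ahead of `d2` because it matches both query terms, and `d3` is not returned at all.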
Before an inverted index can be used for searching a set of documents, the inverted index must be built. Building the inverted index takes far longer than executing a query using the index. Due to the length of time needed to build the index, indexing a set of documents is typically performed periodically, such as once a day, once every few days, or at longer intervals.
Because indexing documents can take a substantial amount of time (especially if the number of documents to index is large), it can be difficult to maintain an index that represents frequently changing activity. For example, if a set of documents is indexed once a day at midnight, then changes to the documents that occur during the day will not be represented in the index until the documents are indexed the following night. When performing a search using such an index, the results will only reflect the state of the documents when they were indexed (i.e., changes to the documents since they were indexed will not be reflected in the results).
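The staleness problem described above can be sketched as follows. This is a simplified illustration that assumes the index is a snapshot built in a single batch run; the names `build_index`, `lookup`, and `nightly_index`, as well as the sample documents, are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Build a term -> set of doc_ids index from a snapshot of the documents."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def lookup(index, term):
    """Return the sorted ids of documents that contained the term at build time."""
    return sorted(index.get(term.lower(), set()))

docs = {"d1": "draft budget", "d2": "meeting notes"}
nightly_index = build_index(docs)            # hypothetical midnight indexing run

docs["d2"] = "meeting notes budget review"   # edit made during the day

stale = lookup(nightly_index, "budget")      # still reflects the midnight snapshot
fresh = lookup(build_index(docs), "budget")  # visible only after re-indexing
```

Under these assumptions, `stale` omits `d2` even though the document now contains the term, and the change becomes searchable only once the index is rebuilt.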
Furthermore, traditional search systems that index search terms within documents are only able to capture information represented within the documents themselves (e.g., text and other content of the documents). Traditional search systems do not represent other activity that may be occurring with respect to the documents.
Therefore, there exists ample opportunity for improvement in document indexing and search technologies.