1. Field of the Invention
Embodiments of the present invention relate to updating a search index and searching for documents.
2. Description of the Related Art
Document retrieval systems have become an essential tool for organizing and gathering information on computer systems. For example, companies may store emails, Word documents, PDF files, etc. on a computer system so that the documents may be retrieved at a later time. Because a large number of documents accumulates over time, a search engine may be used to locate particular documents. A search engine is a computer program that accepts search queries from a user and retrieves documents that meet the query conditions. For example, a user may wish to retrieve all company emails that contain the words “October shipment” in order to learn more about a particular shipment. Furthermore, metadata may be associated with documents to provide additional useful information. For example, a “link” field may be associated with each email to indicate whether the email contains any links to websites. Thus, the user may search for emails that contain the words “October shipment” and that also contain links to websites.
When the number of documents in an information retrieval system becomes large, a search engine that scans every word in the document corpus may take a very long time to retrieve the documents that match the user's query. Search indexes may be used to allow fast and accurate information retrieval. Indexing refers to a process in which text from documents is parsed and stored in a search index. The indexed data may include both metadata and document text. For example, one field of an entry in a search index may store the word “shipment.” Another field may store a list of the documents that contain the word “shipment” (for example, “document 1, document 3, document 5”). Another field may store metadata, such as the date that a document was created or a storage location in a file system where the document may be found.
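By way of illustration only, the index structure described above may be sketched as follows. The sketch assumes a simple in-memory index; the names `build_index` and `search`, the sample documents, and the `link` and `folder` metadata keys are hypothetical and do not describe any particular search engine. Note that `search` requires every query word to be present in a document but does not check that the words are adjacent.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Parse each document and return (postings, metadata).

    postings maps a lowercased word to the sorted list of IDs of the
    documents that contain it; metadata maps a document ID to its fields.
    """
    postings = defaultdict(set)
    metadata = {}
    for doc_id, doc in documents.items():
        for word in re.findall(r"[a-z0-9]+", doc["text"].lower()):
            postings[word].add(doc_id)
        metadata[doc_id] = dict(doc["metadata"])
    return {word: sorted(ids) for word, ids in postings.items()}, metadata


def search(postings, metadata, words, **required):
    """Return IDs of documents containing every word and matching the metadata."""
    candidates = None
    for word in words:
        ids = set(postings.get(word.lower(), []))
        candidates = ids if candidates is None else candidates & ids
    return sorted(
        doc_id
        for doc_id in (candidates or set())
        if all(metadata[doc_id].get(key) == value for key, value in required.items())
    )


# Hypothetical sample corpus, chosen so that "shipment" occurs in documents 1, 3 and 5.
documents = {
    1: {"text": "The October shipment left the warehouse.",
        "metadata": {"folder": "inbox", "link": False}},
    2: {"text": "Lunch menu for Friday.",
        "metadata": {"folder": "inbox", "link": False}},
    3: {"text": "October shipment tracking page: see the carrier site.",
        "metadata": {"folder": "inbox", "link": True}},
    4: {"text": "Quarterly budget review.",
        "metadata": {"folder": "finance", "link": True}},
    5: {"text": "Shipment invoice for September.",
        "metadata": {"folder": "finance", "link": False}},
}

postings, metadata = build_index(documents)
print(postings["shipment"])
print(search(postings, metadata, ["October", "shipment"], link=True))
```

In this sketch a query consults only the postings lists and the metadata table, so the text of the documents is not scanned at query time.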
When a document's metadata or content changes, the document is typically reindexed. Reindexing involves updating a search index to reflect the new content or metadata. Documents are often reindexed due to a change in metadata. For example, document metadata in a search index may provide a folder name indicating where the document is located. When the document is moved to a different folder, the document is reindexed so that the metadata provides the correct folder name. As a result, search results remain accurate and up to date.
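The conventional reindexing described above may be sketched, again for illustration only, as follows. The sketch assumes that a reindex discards the document's existing index data and rebuilds it from the document's current text and metadata; the function `reindex` and the dictionary layout are hypothetical.

```python
import re


def reindex(postings, metadata, doc_id, text, meta):
    """Rebuild the index data for one document from its current state."""
    # Remove the document from every postings set in which it appears.
    for ids in postings.values():
        ids.discard(doc_id)
    # Parse the current text again and re-add the document under each word.
    for word in set(re.findall(r"[a-z0-9]+", text.lower())):
        postings.setdefault(word, set()).add(doc_id)
    # Replace the stored metadata with the current metadata.
    metadata[doc_id] = dict(meta)


postings = {}
metadata = {}

# Initial indexing of a hypothetical email stored in the "inbox" folder.
reindex(postings, metadata, 1, "October shipment delayed", {"folder": "inbox"})

# The email is moved to another folder. Its text is unchanged, but in this
# sketch the whole document is parsed again to refresh the folder name.
reindex(postings, metadata, 1, "October shipment delayed", {"folder": "archive"})

print(metadata[1]["folder"])
print(sorted(postings["shipment"]))
```

In this sketch, a change to a single metadata field causes the full document text to be parsed again, even though the text itself has not changed.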