The present invention relates to the field of information processing systems, and more particularly to search applications and means of indexing information to facilitate searches.
Information networks often store large amounts of data in the form of documents and other objects. For example, it is common in enterprise networks to store email, including attachments and associated data. Information is stored for later retrieval and reference for numerous purposes. Stored documents are often searched to find specific information, determine patterns, and so on. Because the amount of stored data makes it impractical for a person to manually search through the data to find a desired document or reference, search engines have been developed. A search engine allows a user to provide terms and qualifiers as parameters of a search, and the search engine determines which documents match the provided search criteria. Search engines do not search through each document or object at query time, and instead use an index of the documents. An index lists all searchable terms in the documents and indicates, for each term, which documents the term appears in and the term's position or positions in each of those documents. An index indicating both of these parameters is referred to as an inverted index.
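The index structure described above can be illustrated with a minimal sketch. The function names (`build_inverted_index`, `search`), the whitespace tokenization, and the sample documents are assumptions made for illustration only and are not taken from any particular search engine.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to {doc_id: [positions]} (hypothetical helper).

    Tokenization is a plain lowercase whitespace split, chosen only
    to keep the sketch short.
    """
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)


def search(index, term):
    """Return the sorted ids of documents containing the term."""
    return sorted(index.get(term.lower(), {}))


# Illustrative sample data, not from the source.
documents = {
    "doc1": "quarterly budget report",
    "doc2": "budget meeting notes for the budget committee",
}
index = build_inverted_index(documents)
```

A query such as `search(index, "budget")` consults only the index entry for the term and never scans the document text itself.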
Documents contain two types of information that users typically wish to search: the content of the document and the metadata associated with the document. The content is the information that is rendered for the user by, for example, an application. The metadata is data that describes or frames the content to provide some context. For example, in a typical email document there is text content written by the email author to a recipient. The email addresses of the author and the recipient, the subject, and other data such as the time sent are all metadata associated with the document. The metadata is information maintained in specified fields of the document, and may be handled differently than the content. Depending on the document type, some metadata may not be displayed when the document content is rendered for a user in an application interface window.
Because metadata provides context for the content, it is desirable to be able to search the metadata when searching a body of documents. The metadata can be indexed along with the content to make the metadata searchable. Special conventions can be used in the index to indicate that a particular term appears in a metadata field, as well as which metadata field. A search engine can be provided with field definitions or characteristics against which to search. For example, a given field may be defined to be case sensitive, so that capital letters are distinguished from lower case letters. Once the index is generated, though, it becomes very difficult to change such definitions, because the entire body of documents must be re-indexed, which can be expensive and time consuming. Therefore, there is a need for an indexing system that allows portions of the indexed content to be re-indexed with different settings without requiring a re-indexing of all documents.
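The following sketch illustrates one way field definitions can become baked into an index. The `field:term` key convention, the `FIELD_DEFINITIONS` table, and all function names are hypothetical, chosen only to show why changing a field's case sensitivity after indexing forces the terms to be re-indexed.

```python
from collections import defaultdict

# Hypothetical per-field settings; the names are assumptions for illustration.
FIELD_DEFINITIONS = {
    "from": {"case_sensitive": True},
    "subject": {"case_sensitive": False},
}


def normalize(field, term, definitions):
    """Apply the field's case rule to a term before indexing or searching."""
    if definitions.get(field, {}).get("case_sensitive", False):
        return term
    return term.lower()


def index_documents(documents, definitions):
    """Index content terms as-is and metadata terms as 'field:term' keys."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for term in doc["content"].split():
            index[term.lower()].add(doc_id)
        for field, value in doc["metadata"].items():
            for term in value.split():
                key = f"{field}:{normalize(field, term, definitions)}"
                index[key].add(doc_id)
    return index


def search_field(index, field, term, definitions):
    """Look up a term within one metadata field."""
    key = f"{field}:{normalize(field, term, definitions)}"
    return sorted(index.get(key, set()))


# Illustrative sample data, not from the source.
documents = {
    "m1": {
        "content": "Please review the budget",
        "metadata": {"from": "Alice@example.com", "subject": "Budget Review"},
    },
}
index = index_documents(documents, FIELD_DEFINITIONS)
```

In this sketch the normalization is applied when the keys are written, so making the `from` field case insensitive later would require rebuilding every `from:` entry. With a single monolithic index, that means re-processing the whole body of documents.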