Traditional information retrieval systems process files of information objects, such as records, stored in a database, together with requests for information from those files. In response to an information request, the system identifies and retrieves certain records from the files. Which records are retrieved depends on the similarity between the records stored in the database and the request presented to the system by a user. The similarity is measured by comparing the values of certain attributes attached to the records and to the information request.
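As a rough illustration of attribute-based retrieval, the following minimal sketch scores each record by the fraction of request attributes whose values it matches. The function names, the scoring rule, and the threshold are assumptions made for illustration only; the text above does not specify a particular similarity measure.

```python
def attribute_similarity(record, request):
    """Fraction of the request's attributes whose values the record matches.

    This scoring rule is an illustrative assumption, not a prescribed measure.
    """
    if not request:
        return 0.0
    matches = sum(1 for key, value in request.items() if record.get(key) == value)
    return matches / len(request)


def retrieve(records, request, threshold=0.5):
    """Return records at or above the threshold, most similar first."""
    scored = [(attribute_similarity(record, request), record) for record in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [record for score, record in scored if score >= threshold]


# Hypothetical records and request.
records = [
    {"id": 1, "subject": "computers", "year": 1990},
    {"id": 2, "subject": "medicine", "year": 1990},
    {"id": 3, "subject": "law", "year": 1985},
]
request = {"subject": "computers", "year": 1990}
retrieved_ids = [record["id"] for record in retrieve(records, request)]
```

Here record 1 matches both requested attributes, record 2 matches one of the two, and record 3 matches none, so only the first two are retrieved at the assumed threshold of 0.5.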
Full-text information retrieval systems are used to store information objects containing textual matter, e.g., articles from magazines, newspapers or other periodicals.
To facilitate the retrieval process, articles in a full-text information retrieval system are "indexed": each article in the associated database is characterized by assigning descriptors that identify its content. In full-text retrieval systems, the descriptors can be the actual words that appear in the articles. This process of characterizing the articles, referred to as "indexing," allows the retrieval system to locate particular items in the associated database in response to specific requests or "queries" from a user.
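Indexing on the actual words of the articles can be sketched as an inverted index, which maps each word to the set of articles containing it. This is a minimal sketch under stated assumptions: the tokenizer, the function names, and the rule that a query must match every word are illustrative choices, not details taken from the systems described above.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(articles):
    """Map each word (descriptor) to the set of article ids that contain it."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for word in tokenize(text):
            index[word].add(article_id)
    return index


def search(index, query):
    """Return ids of articles containing every word of the query (assumed AND semantics)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical articles.
articles = {
    1: "The modem connects the computer.",
    2: "A new computer network.",
    3: "Gardening tips",
}
index = build_index(articles)
```

With this index, a query for "computer" returns articles 1 and 2, while "computer network" returns only article 2.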
Typically, full-text information retrieval systems do not utilize a lexicon, which is a dictionary of words or phrases maintained in a database. A lexicon can be used to obtain more precise indexing by acting as an intermediary between a user query and the associated database. A user query is processed against the lexicon to obtain a better indication of what the user is attempting to retrieve. Those systems that do use a lexicon to provide additional capabilities index the articles on a canonicalized representation of each word, such as its citation form, i.e., the word as it would appear in a dictionary.
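The role of the lexicon as an intermediary can be sketched as follows: both the words of an article and the words of a query are mapped to a citation form before the index is consulted. The lexicon contents, the function names, and the fallback of leaving unknown words unchanged are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical lexicon: surface form -> citation form.
LEXICON = {
    "mouse": "mouse",
    "mice": "mouse",
    "run": "run",
    "ran": "run",
    "running": "run",
}


def canonicalize(word, lexicon):
    """Return the citation form of a word; unknown words pass through (an assumption)."""
    word = word.lower()
    return lexicon.get(word, word)


def build_index(articles, lexicon):
    """Index each article on the canonical form of its words."""
    index = {}
    for article_id, text in articles.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(canonicalize(word, lexicon), set()).add(article_id)
    return index


def search(index, lexicon, query):
    """Canonicalize the query words, then return articles containing all of them."""
    forms = [canonicalize(w, lexicon) for w in re.findall(r"[a-z0-9]+", query.lower())]
    if not forms:
        return set()
    result = set(index.get(forms[0], set()))
    for form in forms[1:]:
        result &= index.get(form, set())
    return result


# Hypothetical articles.
articles = {1: "Mice ran", 2: "A mouse is running", 3: "Keyboard"}
index = build_index(articles, LEXICON)
```

Because "mice" and "mouse" share one citation form, a query for either word finds both articles 1 and 2, which a purely word-for-word index would not do.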
The lexicons used in prior information retrieval systems have been "static" in that they cannot be modified after article loading begins. The reason is that articles are indexed according to the lexicon, so the index becomes inaccurate whenever the lexicon changes. Therefore, an information retrieval system with a static lexicon must rely upon a full database reload to keep the article index consistent with lexicon changes.
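The inconsistency can be illustrated with a small sketch in which the lexicon is changed after the articles have been indexed. The articles, the lexicon entry, and the function names are hypothetical; the sketch only shows why an index built under an older lexicon misses articles until everything is reloaded.

```python
def canonicalize(word, lexicon):
    """Return the canonical form of a word; unknown words pass through unchanged."""
    return lexicon.get(word, word)


def build_index(articles, lexicon):
    """Index every article on the canonical form of its words under the given lexicon."""
    index = {}
    for article_id, text in articles.items():
        for word in text.lower().split():
            index.setdefault(canonicalize(word, lexicon), set()).add(article_id)
    return index


def search(index, lexicon, word):
    """Look up a single query word through the current lexicon."""
    return index.get(canonicalize(word.lower(), lexicon), set())


# Hypothetical articles, loaded while the lexicon is still empty.
articles = {1: "new modems shipped", 2: "modem prices fall"}
lexicon = {}
index = build_index(articles, lexicon)

# The lexicon changes after loading: "modems" now canonicalizes to "modem".
lexicon["modems"] = "modem"

# The old index still files article 1 under "modems", so the query misses it.
stale_result = search(index, lexicon, "modems")

# A full reload rebuilds the index under the changed lexicon and restores consistency.
reloaded_result = search(build_index(articles, lexicon), lexicon, "modems")
```

In this sketch the stale index returns only article 2, whereas the reloaded index returns articles 1 and 2.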
The use of a static lexicon is acceptable for retrieval systems having a static database. However, a static lexicon presents a problem in a retrieval system to which new articles are constantly being added, particularly in areas such as computers, where new words and phrases are always being developed. A user query that incorporates words or phrases not in the lexicon, when issued to a retrieval system with a static lexicon, will not locate articles containing those words or phrases.
The foregoing problems of prior art full-text information retrieval systems manifest the need for improvement. Specifically, there is a need for a full-text information retrieval system that is capable of supporting a dynamic lexicon while providing for the reindexing of articles as the lexicon changes. Furthermore, reindexing of the articles must not prevent users from accessing the associated database and lexicon concurrently.