This invention relates to accessing information and categorizing users, and more particularly to an adaptive and scalable indexing scheme.
Document retrieval often involves accessing a large information space with many dimensions, in which each document occupies a single point. The organization of documents in this space is complex, and this complexity is a product of the dimensionality of the space. Documents that share certain properties share coordinates along the corresponding subset of dimensions, but they differ along the dimensions that represent their other properties. As a result, the information space as a whole is only sparsely populated with documents. This sparse distribution makes intelligent searching of the space difficult. The relationship between any two documents is only poorly described in the space, because the documents typically differ in more ways than they are alike. Across a collection of documents, there is therefore little structure with which to organize a search for relevant documents.
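To make the sparsity argument concrete, the following Python sketch assumes (this is not specified in the text) that each dimension corresponds to one vocabulary term and that a document's coordinate along that dimension is 1 if it contains the term and 0 otherwise. The vocabulary and documents are hypothetical. The sketch measures how densely the space is populated and, for a pair of documents, how many properties they share versus how many distinguish them.

```python
# Hypothetical toy data: one dimension per vocabulary term (an assumption).
vocabulary = ["neural", "network", "retrieval", "index", "query",
              "vector", "patent", "corpus", "user", "feedback"]
docs = {
    "d1": {"neural", "network", "retrieval"},
    "d2": {"index", "query", "retrieval"},
    "d3": {"patent", "corpus"},
}


def to_vector(terms, vocab):
    """Map a set of terms to a binary point in the information space."""
    return [1 if t in terms else 0 for t in vocab]


def density(vectors):
    """Fraction of all coordinates that are nonzero (low value = sparse space)."""
    total = sum(len(v) for v in vectors)
    nonzero = sum(sum(v) for v in vectors)
    return nonzero / total


def shared_and_differing(a, b):
    """Count terms present in both documents vs. terms present in only one."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    differing = sum(1 for x, y in zip(a, b) if x != y)
    return shared, differing


if __name__ == "__main__":
    vectors = [to_vector(terms, vocabulary) for terms in docs.values()]
    print("density:", density(vectors))
    v1 = to_vector(docs["d1"], vocabulary)
    v2 = to_vector(docs["d2"], vocabulary)
    print("d1 vs d2 (shared, differing):", shared_and_differing(v1, v2))
```

With this toy data, 8 of 30 coordinates are nonzero, and d1 and d2 share one term while differing on four, so their relationship is dominated by their differences. With a realistic vocabulary of many thousands of terms, the density would typically be much lower still.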
Artificial neural networks (ANNs) are used to generate statistical relationships among input and output elements, and they do so through self-organization or, at least, through an automated abstraction or learning process. Several efforts have employed ANNs to a limited extent for information retrieval. In these systems, the ANN contains a set of constraints that, given an input pattern encoding a query, directs the user to similar documents or pieces of information. The initial set of constraints is generally determined by applying a training corpus of records to the ANN. These constraints are incrementally modifiable, which allows the ANN to adapt to user feedback. However, although several research efforts have demonstrated the utility of adaptive information retrieval with ANNs, scalable implementations have not appeared. For reviews, see Doszkocs, 1990, and Chen, 1995, incorporated herein by reference.
On the other hand, some large-scale systems that lack mechanisms for adaptation have successfully exploited the statistical relationships among documents and the terms found in those documents to store and retrieve documents and other information items. For example, U.S. Pat. No. 5,619,709 to Caid et al. describes the generation of context vectors that represent conceptual relationships among information items. In Caid et al., the context vectors are developed based on word proximity in a static training corpus. The context vectors do not adapt to user profile information, new information sources, or user feedback regarding the relevance of documents retrieved by the system. Thus, the system of Caid et al. does not evolve over time to provide more relevant document retrieval.
Accordingly, a need remains for a scalable information representation and indexing scheme that adapts document retrieval to continuously changing user feedback, user profiles, and new sources of information.