Every day, millions of electronic documents are created, edited, communicated, and stored. These electronic documents may range in complexity and format from simple text documents, web pages, and news articles to complex and lengthy scholarly papers, technical literature, and electronic presentations. Most of these electronic documents are compiled in various electronic document repositories or databases. As will be appreciated, with the multitude of existing electronic documents, and with the constant creation of countless new and edited electronic documents, it is exceedingly difficult for a user to locate and access only those electronic documents that are relevant to the user's interests. As such, various mechanisms or systems have been devised to recognize, select, and deliver electronic documents that a user may find relevant.
One common system that is used for document selection and delivery is referred to as a text or document filtering system. In a document filtering system, each document coming into the system (“new document”) is compared to a user profile that specifies an area or areas of interest of a user. If the new document compares favorably with the user profile, notice of the new document, or the document itself, is sent to the user. In this way, only those new documents that the user is likely to find relevant are delivered to the user.
The manner in which document filtering systems compare and match new documents and user profiles may vary. However, in a typical document filtering system, a new document is first parsed into a number of document terms. Each of these document terms is then assigned a weight based on information derived from the new document and information related to documents stored in a document database maintained or accessed by the document filtering system (the “document database”). These document terms and weights are then compared to profile terms and profile term weights contained in, or derived from, user profiles. In a typical system, the profile term weights indicate the relative importance of the terms in the profile in indicating the area or areas of interest of the user. Based on the comparison of the document terms and weights and the profile terms and weights, a document score is calculated that indicates how well the document terms match the terms of a user profile. If the calculated document score meets or exceeds a predetermined value associated with the user profile, the new document is then sent to the user (“sent document”).
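The parse, weight, compare, and threshold steps described above can be illustrated with a minimal sketch. The description does not specify a weighting or scoring formula, so everything concrete here is an assumption made for illustration: the function names, the TF-IDF style weighting (term count in the new document multiplied by a smoothed inverse document frequency drawn from the document database), and the weighted-sum score.

```python
import math
import re
from collections import Counter


def parse_terms(text):
    """Parse a document into lowercase alphanumeric terms (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weight_terms(text, doc_freq, num_docs):
    """Assign each document term a weight.

    Assumption: weight = term count in the new document times a smoothed
    inverse document frequency taken from the document database statistics
    (doc_freq maps term -> number of stored documents containing it).
    """
    counts = Counter(parse_terms(text))
    weights = {}
    for term, tf in counts.items():
        df = doc_freq.get(term, 0)
        idf = math.log((num_docs + 1) / (df + 1)) + 1.0
        weights[term] = tf * idf
    return weights


def score_document(doc_weights, profile_weights):
    """Assumed score: sum of document weight times profile weight over shared terms."""
    return sum(
        weight * profile_weights[term]
        for term, weight in doc_weights.items()
        if term in profile_weights
    )


def should_deliver(text, profile_weights, threshold, doc_freq, num_docs):
    """Deliver the document when its score meets or exceeds the profile threshold."""
    doc_weights = weight_terms(text, doc_freq, num_docs)
    return score_document(doc_weights, profile_weights) >= threshold
```

In this sketch, terms that are rare in the document database contribute more to the score than common ones, and the threshold plays the role of the predetermined value associated with the user profile.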
One variation of the typical document filtering system is what is commonly referred to as an adaptive document filtering system. In an adaptive document filtering system, a user profile may be changed or adapted automatically based on feedback from the user concerning previously received documents. For example, the user may provide feedback indicating that the user found a document to be particularly relevant. The adaptive document filtering system then uses that feedback, in conjunction with data related to documents stored in the document database, to change or update the user profile in some manner that will improve the adaptive document filtering system's ability to select and deliver relevant documents to the user.
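One way such a feedback-driven profile change could work is sketched below. The description does not name an update rule, so this example substitutes a simple Rocchio-style adjustment as an assumption: the weights of terms in a document the user marked relevant are added to the profile, and those of a document marked not relevant are subtracted. The function name `update_profile` and the `rate` parameter are hypothetical.

```python
def update_profile(profile_weights, doc_weights, relevant, rate=0.5):
    """Return a new profile adjusted by feedback on one delivered document.

    Assumption: a Rocchio-style rule. Each term weight in the judged document
    is scaled by `rate` and added to the profile (relevant feedback) or
    subtracted from it (non-relevant feedback). Terms whose weight falls to
    zero or below are dropped from the profile.
    """
    sign = 1.0 if relevant else -1.0
    updated = dict(profile_weights)  # leave the caller's profile untouched
    for term, weight in doc_weights.items():
        new_weight = updated.get(term, 0.0) + sign * rate * weight
        if new_weight > 0.0:
            updated[term] = new_weight
        else:
            updated.pop(term, None)
    return updated
```

Under this rule, a term that appears in a relevant document but not yet in the profile enters the profile with a positive weight, so later documents containing that term can score higher.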
As will be appreciated, the accuracy or effectiveness of an adaptive document filtering system is directly related to the accuracy and/or timeliness of the data used by the system in the profile updating process. As mentioned, adaptive filtering systems typically use information related to documents stored in the document database in the process of updating user profiles. In a typical adaptive document filtering system, the data related to documents stored in the document database is obtained from a document index structure. The document index structure typically provides a term-based index into documents stored in the document database. Unfortunately, the document index structure is updated only infrequently, at predetermined times. For example, an adaptive document filtering system may update the document index structure only every week or two. Since the profile updating process relies on data obtained from the document index structure, the process of updating individual user profiles is typically carried out in batch processes following the updating of the document index structure. As such, the user profiles in typical adaptive document filtering systems are often out-of-date.
The primary reason the updating of the document index structure, and thus the profile updating process, occurs so infrequently is the time and computational resources involved in the document index structure updating process. In a typical adaptive document filtering system, the document index structure is stored in a mass storage device, such as one or more disk drives, due to its large size. As will be appreciated, mass storage devices typically have relatively slow data access and transfer times compared to faster memory devices, such as system main memory or RAM. Due to these access time constraints, it is simply impractical to update the document index structure every time a new document is received by the system. Furthermore, due to the large size of the document index structure, it is likewise impractical to store the document index structure in relatively fast main memory, where it may be accessed more quickly.
One drawback associated with the infrequent updating of user profiles is that one or more documents that are deemed relevant by a user may not be accounted for in a user profile for some time. For example, a new document may be delivered to a user that includes terms that are relevant to the user, but which are not contained in the user's profile (“new terms”). This may occur, for example, when the new document includes terms that have not been previously seen by the user, or when a term has only recently become relevant to the user. If the user provides feedback on a document that includes new terms just after the profile updating process has occurred, the new terms will not be reflected in the user's profile until the next profile updating process. As such, documents including the new terms that are received by the filtering system before the next profile updating process occurs may not be selected for delivery to the user.