The present exemplary embodiments relate generally to the management of knowledge bases. They find particular application in conjunction with the removal of expired and/or duplicate items within knowledge bases, and will be described with particular reference thereto. However, it is to be appreciated that the present exemplary embodiments are also amenable to other like applications.
Communities often construct shared knowledge bases pertaining to one or more broad topics, where members of the communities contribute items to the knowledge bases. An item corresponds to knowledge on a specific issue and/or topic relevant to any one of the one or more broad topics covered by a knowledge base. When members contribute items to a knowledge base, other members of the corresponding community are able to search out and utilize the collective knowledge of the community. One example of a knowledge base is the EUREKA system from XEROX, which contains a searchable database of repair tips pertaining to copiers.
Shared knowledge bases, such as the EUREKA system, improve efficiency of associated communities by saving members' time and resources when diagnosing and/or solving problems. However, knowledge bases require the constant oversight of curators, who review items within a knowledge base to validate, edit, and combine similar items. Without oversight from curators, the usefulness of knowledge bases suffers over time as the knowledge bases become cluttered with duplicate and/or expired items; members must sift through many items to find the most relevant and useful items.
While curators try to find and remove duplicate and/or expired items, many still remain. It may be that a community lacks sufficient resources (e.g., curators) to properly monitor the items within its knowledge base or that the curators simply missed the duplicate and/or expired items. Naturally, curators, whether human or machine, are prone to making mistakes.
To address these problems, systems have been developed to help curators seek out and remove duplicate and/or expired items. Such systems generally work by reviewing items within a knowledge base to determine those items having terms similar to those of other items within the knowledge base; for example, using term frequency with an inverse document frequency factor.
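By way of illustration only, the term-based comparison described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular known system: whitespace tokenization, the smoothed inverse document frequency formula, cosine similarity as the comparison measure, the 0.5 threshold, and the function names `tfidf_vectors`, `cosine`, and `likely_duplicates` are all assumptions introduced for the example.

```python
import math
from collections import Counter


def tfidf_vectors(items):
    """Build one sparse TF-IDF vector (term -> weight) per item."""
    tokenized = [item.lower().split() for item in items]
    n_items = len(tokenized)
    # Document frequency: number of items in which each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty items
        vectors.append({
            # Term frequency times a smoothed inverse document frequency
            # (the smoothing is an assumption made for this sketch).
            term: (count / total)
            * (math.log((1 + n_items) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def likely_duplicates(items, threshold=0.5):
    """Return index pairs whose term similarity meets the threshold."""
    vectors = tfidf_vectors(items)
    return [
        (i, j)
        for i in range(len(items))
        for j in range(i + 1, len(items))
        if cosine(vectors[i], vectors[j]) >= threshold
    ]
```

In such a sketch, two items are flagged only when they share enough weighted terms; nothing in the comparison depends on how the knowledge base itself ranks or retrieves items.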
A problem with these systems, however, is that they fail to account for the particular search algorithm used by a knowledge base. Put another way, they fail to consider how members of a community interact with the knowledge base. Thus, as the search algorithm employed by a knowledge base changes, the set of duplicate and/or expired items encountered by community members may change, but known systems continue detecting duplicates in the same manner.
To illustrate, it may be that two items within a knowledge base are duplicates of one another, but use different vocabularies. Under the systems noted above, the two items would not be considered duplicates since they share few terms in common. However, the search algorithm employed by the knowledge base might include a synonym database equating the different vocabularies of the two items, whereby said items would generally co-occur in search results.
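The scenario above can be made concrete with a toy example. The following is a hypothetical sketch only: the synonym table, the sample items, and the functions `expand`, `search`, and `term_overlap` are invented for illustration and do not represent the search algorithm of any actual knowledge base.

```python
# Hypothetical synonym table equating two vocabularies.
SYNONYMS = {
    "paper": {"paper", "sheet"},
    "sheet": {"paper", "sheet"},
    "jam": {"jam", "stuck"},
    "stuck": {"jam", "stuck"},
}


def expand(terms):
    """Expand query terms with their synonyms, if any."""
    expanded = set()
    for term in terms:
        expanded |= SYNONYMS.get(term, {term})
    return expanded


def search(query, items):
    """Return indices of items containing any expanded query term."""
    query_terms = expand(query.lower().split())
    return [
        index
        for index, text in enumerate(items)
        if query_terms & set(text.lower().split())
    ]


def term_overlap(a, b):
    """Fraction of distinct terms that two items share (Jaccard index)."""
    terms_a, terms_b = set(a.lower().split()), set(b.lower().split())
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


items = [
    "paper jam near fuser",         # one vocabulary
    "sheet stuck by heating unit",  # same issue, different vocabulary
    "toner low",
]

shared = term_overlap(items[0], items[1])  # no terms in common
hits = search("paper jam", items)          # both duplicates are returned
```

Here the first two items share no terms, so a purely term-based comparison would not pair them, yet the synonym-aware search returns both for the same query.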
In view of the deficiencies noted above, there exists a need for an improved system of detecting expired and/or duplicate items within a knowledge base. The present application contemplates new and improved systems and/or methods which may be employed to mitigate the above-referenced problems and others.