As the amount of electronically stored data (e.g., computer files such as documents) in data repositories continues to grow, so too does the importance of tools that allow people to find and access such data. Traditionally, two distinct types of tools have been utilized in content retrieval systems. The first is commonly referred to as a keyword search tool or search engine. The second is commonly known as a browser or browsing tool.
To perform a search using a keyword search engine, a user (e.g., a searcher) generally inputs one or more keywords that the search engine uses in a query executed against the content. Typically, the user is presented with the title of each document that contains at least one, or all, of the keywords input by the user. In a structured search engine, the user may be able to specify the particular part of the document to search, for example, the title or body of the document. However, because keyword searches often return a very large number of results, the search engine often displays only a list of those documents that are deemed most relevant, based on some predetermined relevance ranking scheme. For example, the relevance of documents may be ranked based on the number of keyword “hits” in a particular document.
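The hit-count ranking described above can be illustrated with a minimal Python sketch. It is not taken from any particular search engine; the function names (tokenize, keyword_search) and the parameters require_all and top_n are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_search(documents, keywords, require_all=False, top_n=10):
    """Return (title, hits) pairs ranked by total keyword hits.

    documents maps a title to its body text (an assumed layout).
    require_all=False matches documents containing any keyword;
    require_all=True matches only documents containing every keyword.
    """
    terms = [k.lower() for k in keywords]
    results = []
    for title, body in documents.items():
        counts = Counter(tokenize(body))
        hits_per_term = [counts[t] for t in terms]
        matched = all(hits_per_term) if require_all else any(hits_per_term)
        if matched:
            results.append((title, sum(hits_per_term)))
    # Most hits first; ties are broken alphabetically by title.
    results.sort(key=lambda r: (-r[1], r[0]))
    # Only the top-ranked documents are shown to the user.
    return results[:top_n]
```

Calling keyword_search with require_all=True corresponds to the case in which a document must contain all of the keywords, while the default corresponds to a match on any one of them.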
Content retrieval systems that utilize keyword searches are problematic for a variety of reasons. One problem is that keyword searches often return far too many results. For example, because documents are returned if a keyword is found anywhere in the document, a significant number of the documents returned to the user have no relevance to the topic or area of interest to the user. Another problem with keyword searches is that, in order to be effective, the user must be familiar with the content being searched. In particular, the user must be familiar with the particular vocabulary of the content and have a relatively high level of proficiency with the language of the content. This is particularly problematic when the user has a native language that is different from the language of the content and/or when the content being searched is highly technical in nature and has a relatively limited vocabulary that is specific to the technical area to which the content relates. Additionally, because many search engines do not employ linguistic databases, the user is required to input keywords exactly as they appear in the document. For example, if a document contains a form of a keyword that differs from the form input by the user, or if the user misspells a keyword, a document “hit” will not result. Finally, if there is a significant amount of content and the content is not well structured to facilitate searching, the search may take a significant amount of time to perform.
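The exact-match limitation can be shown with a short Python sketch. The function exact_keyword_hit and the sample text are hypothetical; the sketch assumes a search that compares whole words and uses no linguistic database.

```python
def exact_keyword_hit(text, keyword):
    """Report a hit only if the keyword appears as a whole word, exactly as typed.

    Case is ignored, but no stemming or spelling correction is applied,
    which is the assumption being illustrated here.
    """
    words = text.lower().split()
    return keyword.lower() in words


sample = "Replace the toner cartridges in all printers"

same_form = exact_keyword_hit(sample, "printers")   # same form as in the document
other_form = exact_keyword_hit(sample, "printer")   # singular form of the word
misspelled = exact_keyword_hit(sample, "pritners")  # misspelled keyword
```

Under these assumptions only the first query produces a hit; the singular form and the misspelling both miss a document that is plainly about printers.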
Content repositories that offer a browsing capability typically allow the user to browse content that has been categorized or organized into a tree-like, hierarchical structure, similar to the directory structure on a typical personal computer. The user is generally presented with one or more top-level categories, from which the user selects the category that seems most relevant to the topic or area of interest to the user. As the tree is traversed from top to bottom, the categories typically increase in their level of specificity or detail.
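A tree-like category structure of this kind might be modeled as in the following Python sketch. The Category class, the browse function, and the category names are assumptions made for illustration, not a description of any specific browsing tool.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node of the hierarchy: a named category with subcategories and documents."""
    name: str
    children: dict = field(default_factory=dict)
    documents: list = field(default_factory=list)

    def add_child(self, name):
        """Create a more specific subcategory under this category and return it."""
        child = Category(name)
        self.children[name] = child
        return child


def browse(root, selections):
    """Follow the user's selections from the top-level categories downward."""
    node = root
    for choice in selections:
        node = node.children[choice]  # raises KeyError for an unknown category
    return node


# Hypothetical repository: categories become more specific with depth.
root = Category("All content")
hardware = root.add_child("Hardware")
printers = hardware.add_child("Printers")
printers.documents.append("Clearing a paper jam")
```

Here browse(root, ["Hardware", "Printers"]) returns the node holding the paper-jam document, and each selection narrows the user to a single branch of the tree.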
While browser tools solve some of the aforementioned problems associated with search engine tools, browser tools are also problematic for a variety of reasons. One problem with browser tools is that the user is generally forced to take a linear path down a single branch of the tree that leads the user deeper and deeper into the hierarchical structure. If the user cannot find relevant content after traversing a particular path, the user is forced to traverse backwards, up the tree-like structure, often resulting in a frustrating and time-consuming search experience for the user.
Additionally, content repositories that utilize browser tools generally require significantly more work to set up and maintain because the content must be categorized or organized into the tree-like structure. In particular, content managers face the often-difficult challenge of determining where to place each document within the hierarchical structure. Because many documents contain content of interest to different people for different reasons, documents often must be placed within more than one category of the hierarchical structure. Consequently, when a document requires updating, the content manager must locate and update multiple copies of the document, an inefficient and potentially error-prone process.
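The maintenance burden of duplicated documents can be sketched in Python as follows. The layout (a dictionary mapping a category path to the copies stored there) and the names make_hierarchy, find_copies, and update_everywhere are hypothetical and serve only to illustrate the problem.

```python
def make_hierarchy():
    """Build a small hypothetical repository with one document filed twice."""
    return {
        "Hardware/Printers": {"Paper jam guide": "version 1"},
        "Troubleshooting/Common issues": {"Paper jam guide": "version 1"},
        "Software/Drivers": {"Driver install guide": "version 1"},
    }


def find_copies(hierarchy, title):
    """List every category path that holds a copy of the document."""
    return [path for path, docs in hierarchy.items() if title in docs]


def update_everywhere(hierarchy, title, new_text):
    """Update every copy of the document and return the paths that were touched."""
    paths = find_copies(hierarchy, title)
    for path in paths:
        hierarchy[path][title] = new_text
    return paths


# Updating only one copy leaves the other copy stale.
stale = make_hierarchy()
stale["Hardware/Printers"]["Paper jam guide"] = "version 2"
copies_differ = (
    stale["Hardware/Printers"]["Paper jam guide"]
    != stale["Troubleshooting/Common issues"]["Paper jam guide"]
)
```

In this sketch a correct update depends on first locating every copy, which is the step a content manager working by hand may miss.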