In today's world of electronic communication, nearly every organization (e.g., a corporation, a company, a commercial or non-profit enterprise, and so on) has an Internet-accessible Web site through which it can make its information available to the customer. A Web site has become such an integral part of an organization's resources that nearly all organizations are constantly trying to improve the customer's experience during interaction with their Web site. This desire to provide a pleasant experience is especially prevalent during a customer's search for specific information within that organization's Web site. Because many organizations have literally millions of documents available to their customers, it is important for an organization to facilitate an efficient search, navigation, and information retrieval process for the customer.
For example, when a customer has a software problem, the customer can either submit a case to have a support engineer find a solution for the problem or try to self-solve it by searching the organization's document collection in the hope of finding a solution document for the problem. If, according to the customer's experience, self-solving can be done efficiently, it is likely that the customer will return to the site to self-solve problems when they arise. However, if the customer finds it difficult to find the right solution document, there is a high probability that the customer will exit the self-solve area, open a case with the company's customer service department, and, in all likelihood, refrain from further attempts at self-solving problems.
In another example, if a customer comes to an organization's Web site and spends an inordinate amount of time trying to find particular information, the customer can become frustrated at the lack of success with the search. This can create an unsatisfied customer who may opt to utilize a competitor's Web site, and perhaps the competitor's product. If, on the other hand, the navigation within the organization's Web site is simple and a customer can easily search for and retrieve the desired information, that customer is more likely to remain a loyal customer.
Accordingly, it is important for an organization to identify the customer's most important information needs, referred to as the hot information needs. It is also important to provide the customer with the documents that contain that information, and to organize the information into hot topics.
One way to facilitate the information search and retrieval process is to identify the most frequently accessed documents (FADs) and to provide a straightforward means to access those documents by, for example, listing the FADs in a hot documents menu option within a primary web page (e.g., the home page) of an organization's Web site. The underlying assumption is that a large number of users will open the documents listed in the hot documents menu option, thus making the typical search and retrieval process unnecessary. However, it is common for the list of FADs to be quite large, up to hundreds or thousands of documents. It has been determined that some customers would prefer to have the FADs grouped into categories or topics. When there is no predefined categorization into which the FADs can be placed, the need arises to discover categories or topics into which those FADs can be placed.
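As a purely illustrative sketch of the idea of identifying FADs, the following Python fragment counts document accesses in a small, made-up access log and returns the most frequently accessed documents. The log format, the document identifiers, and the function name `top_fads` are assumptions introduced for this example only; they are not taken from any particular system.

```python
from collections import Counter


def top_fads(access_log, n):
    """Return the n most frequently accessed document IDs, most accessed first.

    access_log is assumed to be an iterable of (user, document ID) pairs.
    """
    counts = Counter(doc_id for _user, doc_id in access_log)
    return [doc_id for doc_id, _count in counts.most_common(n)]


# Hypothetical access log: one (user, document ID) pair per document opened.
access_log = [
    ("u1", "doc-install"),
    ("u2", "doc-install"),
    ("u2", "doc-printer"),
    ("u3", "doc-install"),
    ("u3", "doc-printer"),
    ("u3", "doc-license"),
]

# Candidate entries for a "hot documents" menu option.
hot_documents = top_fads(access_log, 2)
print(hot_documents)  # ['doc-install', 'doc-printer']
```

A simple count of this kind yields only a flat list; it does not group the documents into topics, which is the difficulty noted above when the list grows to hundreds or thousands of entries.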
Categorizing documents into a topic hierarchy has conventionally been accomplished manually or automatically according to the contents of the documents. It is appreciated that the manual approach has been the most successful. For example, in Yahoo, the topics are obtained manually. However, even though the documents in Yahoo are correctly categorized, and the categories (topics) are natural and intuitive, the manual effort to determine the categories is monumental and is conventionally accomplished by domain experts. In response to the monumental task of manually categorizing documents, many efforts have been dedicated to creating methods that can accomplish document categorization automatically. It is appreciated that, of the automatic categorization methods created, very few, if any, have had results comparable to manual categorization; for example, the resulting categories are not always intuitive or natural. It is further appreciated that the topic hierarchy described above is predicated upon document-content categorization methods, which commonly produce results (e.g., topics) that are quite different from the customer's information needs and perspective.
Organizations need to be cognizant of their customers' interests to better serve them. By knowing which topics and corresponding documents are of most interest to their customers, organizations can organize their Web sites to better serve their clientele. Discovering hot topics according to the user's perspective is useful when the quality of the hot topics is high (meaning that the documents in a hot topic are really related to that topic), because users can then rely on them to satisfy their information needs. However, due to user browsing tendencies, often driven by curiosity or ignorance, the clicking may be noisy (e.g., going to and/or opening unnecessary, uninformative, or unrelated sites and/or documents), which can lead to hot topics contaminated with extraneous documents that do not belong in the categories in which they are disposed.
Therefore, conventional means of presenting relevant information, e.g., hot topics, FADs, and the like, have disadvantages because of the vast number of documents that are available as well as the fact that those documents may not be placed in categories that match a user's perspective. Furthermore, the prior art is limited in the manner in which the documents are categorized, conventionally requiring numerous experts to commit a great deal of time to attempt to properly categorize the documents. In addition, the prior art suffers from an inability to filter out the extraneous documents from those documents that contain the desired information related to the customers' needs.