This invention relates generally to providing hierarchical clusters of data and, more particularly, to network-based methods and systems for constructing a taxonomy of product issues based on hierarchical clustering.
Organizations and businesses can receive a large number of messages from customers, potential customers, users and/or other people. For example, a business and/or organization can receive email messages, messages posted to online forums (e.g., support forums or message boards) and other types of messages from its customers and potential customers. These messages can relate to a variety of different topics or issues. For example, a message can describe a problem experienced by a user and include a request for assistance in solving that problem. Such request messages are often directed to a support center within the organization or business.
In addition, the Internet provides these organizations and businesses with access to a wide variety of resources, including web pages on particular topics, reviews of products and/or services, news articles, editorials and blogs. The authors of these resources can express their opinions and/or views on a myriad of topics, such as a product and/or service, politics, political candidates, fashion, design, etc. For example, an author can create a blog entry supporting a political candidate and express praise for the candidate's position on fiscal matters or social issues. As another example, an author can write a restaurant review on a blog or an online review website and indicate their satisfaction with the restaurant using a numerical rating (e.g., three out of five stars), a letter grade (e.g., A+) and/or a description of their dining experience.
Such a large volume of documents (i.e., electronic documents of different types, including text files, e-mails, images, metadata files, audio files, video files, presentations, etc.) can be very difficult for organizations and/or businesses to manage. Entities may try to manage such a volume of documents using clustering techniques. Various algorithms can be applied to a corpus of documents to produce clusters such that the documents within a given cluster share a common characteristic. However, these known clustering algorithms can be very time-consuming to execute and often provide poor results, such as clusters that contain many unrelated documents.
Accordingly, it would be desirable to provide a computer system that organizes large volumes of electronic documents into hierarchical clusters, wherein the documents within each cluster relate to a particular topic. It would also be desirable for such a computer system to determine, for each cluster, a label that identifies the particular topic to which the documents in that cluster are directed.