This invention relates generally to labeling documents within a set of documents, and more particularly, to network-based methods and systems for labeling documents organized within a cluster.
Automated classification, or “labeling,” of data may be used to efficiently organize, route, and/or process large quantities of data. For example, support centers receive large numbers of documents related to support requests. Document labeling techniques, such as clustering, may be used to group similar documents together.
Various algorithms can be used to organize documents into clusters such that the documents within a given cluster share a common characteristic. Documents can include different types of electronic files, such as text files, e-mails, images, metadata files, audio files, and presentations. A cluster can be labeled based on a common characteristic shared by the documents organized into that cluster. A label can identify various types of information, such as the subject or theme of a given cluster, and therefore facilitates classification. In some cases, document clusters are labeled by manual inspection, in which an operator retrieves sample documents from different clusters and labels each cluster based on information from the samples.
Unfortunately, manually inspecting documents for labeling purposes can be very time-consuming and expensive, especially when organizing large quantities of documents. Accordingly, it is desirable to have a system and method for automatically labeling documents, including documents organized within a cluster.