The development of information systems, such as the Internet, and various online services for accessing the information systems, has led to the availability of increasing amounts of information. As computers become more powerful and versatile, users are increasingly employing their computers for a broad variety of tasks. Accompanying the increasing use and versatility of computers is a growing desire on the part of users to rely on their computing devices to perform their daily activities. For example, anyone with access to a suitable Internet connection may go “online” and navigate to the information pages (i.e., the web pages) to gather information that is relevant to the user's current activity.
Many search engine services, such as Google and Yahoo!, provide for searching for information that is accessible via the Internet. These search engine services allow users to search for display pages, such as web pages, that may be of interest to users. After a user submits a search request (i.e., a query) that includes search terms, the search engine service identifies web pages that may be related to those search terms. To quickly identify related web pages, the search engine services may maintain a mapping of keywords to web pages. This mapping may be generated by “crawling” the web (i.e., the World Wide Web) to identify the keywords of each web page. To crawl the web, a search engine service may use a list of root web pages to identify all web pages that are accessible through those root web pages. The keywords of any particular web page can be identified using various well-known information retrieval techniques, such as identifying the words of a headline, the words supplied in the metadata of the web page, the words that are highlighted, and so on. The search engine service may generate a relevance score to indicate how relevant the information of the web page may be to the search request based on the closeness of each match, web page importance or popularity (e.g., Google's PageRank), and so on. The search engine service then displays to the user links to those web pages in an order that is based on a ranking determined by their relevance.
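The keyword-to-page mapping described above can be illustrated with a small inverted index. This is a minimal sketch, not how any particular search engine works: the page URLs, the whitespace tokenizer, and the "count of matching terms" relevance score are all assumptions made for illustration.

```python
from collections import defaultdict

def build_keyword_index(pages):
    """Map each keyword to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Rank pages by the number of query terms they match (a toy relevance score)."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for url in index.get(term, ()):
            scores[url] += 1
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda u: (-scores[u], u))

# Hypothetical crawled pages (URL -> extracted keywords).
pages = {
    "a.example/spectral": "spectral clustering graph",
    "b.example/kmeans": "kmeans clustering",
    "c.example/recipes": "soup recipes",
}
index = build_keyword_index(pages)
results = search(index, "spectral clustering")
```

A real service would combine such term matches with signals like page importance; the sketch only shows why a precomputed mapping makes lookup fast.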
Unfortunately, users of the information systems may encounter an information overload problem. For example, the search engine services often provide users a large number of search results, thus forcing the users to sift through a long list of web pages in order to find the relevant web pages.
Clustering techniques have been used to help organize objects that are similar or in some way related. These objects can include people, documents, web sites, events, news stories, and so on. For example, if the web pages of a search result are clustered based on similarity to one another, then the user can be presented with a list of the clusters, rather than a list of individual documents. As a result, the user will be presented with clusters of documents covering diverse topics on the first web page of the search result, rather than a listing of individual documents that may all be very similar. Because of the large numbers of web-based objects (e.g., web pages, blocks of web pages, images of web pages, and web sites), it can be very computationally expensive to cluster such large numbers of objects.
Spectral clustering techniques have proved effective at clustering objects. The use of spectral clustering has, however, been mainly restricted to small-scale problems because of the high computational complexity of spectral clustering. Spectral clustering represents the objects to be clustered and the relationship between the objects as a graph. A graph may be represented as $G = \langle V, E \rangle$, where $V = \{1, 2, \ldots, n\}$ is the set of vertices and $E = \{\langle i, j \rangle \mid i, j \in V\}$ is the set of edges. The relationship between objects can be represented by a relationship or adjacency matrix $W$, where $w_{ij}$ is set to one when there is a relationship from a source object $i$ to a target object $j$. For example, the relationship matrix can represent a directed web graph in which the objects are web pages and the relationships represent links from a source web page to a target web page. As another example, the relationship matrix can represent an undirected document graph of a collection of documents in which the objects are documents and the relationships represent the similarity (e.g., cosine similarity) between the documents. The goal of spectral clustering is to identify clusters of related objects.
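As a rough illustration of the relationship matrix for a directed web graph, the sketch below fills W from a list of links. The three-page graph and the helper name `relationship_matrix` are hypothetical, and vertices are numbered from 0 rather than 1 to match Python indexing.

```python
def relationship_matrix(n, links):
    """Build the n-by-n matrix W with W[i][j] = 1 for each directed link i -> j."""
    W = [[0] * n for _ in range(n)]
    for source, target in links:
        W[source][target] = 1
    return W

# Hypothetical web graph: page 0 links to pages 1 and 2,
# page 1 links to page 2, and page 2 links back to page 0.
links = [(0, 1), (0, 2), (1, 2), (2, 0)]
W = relationship_matrix(3, links)

# Row sums count outgoing links per page.
out_degree = [sum(row) for row in W]

# A directed graph generally yields an asymmetric W; an undirected
# similarity graph would instead satisfy W[i][j] == W[j][i].
is_symmetric = all(W[i][j] == W[j][i] for i in range(3) for j in range(3))
```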
Spectral clustering can be described as partitioning a graph into two clusters and recursively applying the two-way partitioning to partition the graph into more clusters. The goal of spectral clustering is to partition the graph so that an objective function is minimized. One objective function may be to minimize the cut, that is, ensure that the relationships represented by the edges that are cut are minimized. Another objective function, referred to as “ratio cut,” balances cluster sizes, and another objective function, referred to as “normalized cut,” balances the cluster weights. The membership of the objects in the two clusters A and B can be represented by the following:
$$
q_i = \begin{cases} 1 & \text{if } i \in A \\ -1 & \text{if } i \in B \end{cases}
$$
where $q_i$ represents the cluster that contains object $i$. If $q_i$ is 1, then the object is in cluster A; and if $q_i$ is −1, then the object is in cluster B. The objective function can be represented by the following:
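$$
\begin{aligned}
J = \text{CutSize} &= \frac{1}{4} \sum_{i,j} w_{ij} \left[ q_i - q_j \right]^2 \\
&= \frac{1}{4} \sum_{i,j} w_{ij} \left[ q_i^2 + q_j^2 - 2 q_i q_j \right] \\
&= \frac{1}{2} \sum_{i,j} q_i \left[ d_i \delta_{ij} - w_{ij} \right] q_j \\
&= \frac{1}{2} q^T (D - W) q
\end{aligned}
$$
where $J$ represents the objective function to be minimized. Here $d_i = \sum_j w_{ij}$ is the degree of vertex $i$, $D$ is the diagonal matrix with entries $d_i$, and $\delta_{ij}$ is the Kronecker delta; the third line follows because $q_i^2 = 1$ for every $i$. If the $q_i$ are allowed to take continuous values, rather than discrete values, then the solution to the objective function is given by the eigenvectors of the following:
$$
(D - W) q = \lambda q
$$
where $q$ represents an eigenvector and $\lambda$ represents an eigenvalue.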
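The equality between the pairwise form of the objective, (1/4) Σ w_ij (q_i − q_j)², and the quadratic form (1/2) qᵀ(D − W)q can be checked numerically. The sketch below does so for a small symmetric graph; the four-vertex graph and the function names are assumptions chosen for illustration.

```python
def cut_objective(W, q):
    """Pairwise form: (1/4) * sum over all ordered pairs of w_ij * (q_i - q_j)^2."""
    n = len(W)
    return sum(W[i][j] * (q[i] - q[j]) ** 2
               for i in range(n) for j in range(n)) / 4

def laplacian_form(W, q):
    """Quadratic form: (1/2) * q^T (D - W) q, with d_i the i-th row sum of W."""
    n = len(W)
    d = [sum(row) for row in W]
    return sum(q[i] * ((d[i] if i == j else 0) - W[i][j]) * q[j]
               for i in range(n) for j in range(n)) / 2

# Hypothetical undirected graph: a triangle 0-1-2 with vertex 3 attached to vertex 2.
W = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

# Cluster A = {0, 1, 2}, cluster B = {3}: only the edge 2-3 is cut.
q_one_cut = [1, 1, 1, -1]
# Cluster A = {0, 1}, cluster B = {2, 3}: the edges 0-2 and 1-2 are cut.
q_two_cuts = [1, 1, -1, -1]

j_one = cut_objective(W, q_one_cut)
j_two = cut_objective(W, q_two_cuts)
```

Because the sum runs over ordered pairs (i, j), each undirected cut edge is counted twice in this sketch, so J is 2 for one cut edge and 4 for two. The partition with fewer cut edges has the smaller objective either way.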
The Laplacian matrix of the graph is represented as follows:
$$
L = D - W
$$
Since the Laplacian is positive semidefinite, the following holds for any vector $x$:
$$
x^T L x \ge 0
$$
The first eigenvector and the first eigenvalue are represented as follows:
$$
q_1 = (1, \ldots, 1)^T = e^T, \qquad \lambda_1 = 0
$$
The second eigenvector $q_2$ represents the solution, and the smaller the second eigenvalue $\lambda_2$, the more accurate the solution. Thus, $q_2$ contains an element for each object, and object $i$ is in cluster A if the $i$-th element of $q_2$ is greater than zero, and in cluster B otherwise.
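The two-way partition by the sign of each element of the second eigenvector can be sketched in plain Python. The example assumes a small, connected, undirected graph and approximates the eigenvector with a simple power iteration. That solver, the starting vector, and the six-vertex graph are illustrative assumptions; a production system would use a dedicated eigensolver.

```python
import math

def laplacian(W):
    """Return L = D - W for a symmetric weight matrix W."""
    n = len(W)
    degrees = [sum(row) for row in W]
    return [[(degrees[i] if i == j else 0) - W[i][j] for j in range(n)]
            for i in range(n)]

def second_eigenvector(W, iterations=200):
    """Approximate the eigenvector of L with the second-smallest eigenvalue.

    Runs power iteration on M = c*I - L, where c = 2 * (max degree) is an
    upper bound on the largest eigenvalue of L. Removing the mean at every
    step projects out the constant eigenvector q_1, so the iterate converges
    toward q_2 when the second eigenvalue is distinct from the third.
    """
    n = len(W)
    L = laplacian(W)
    c = 2 * max(sum(row) for row in W)
    x = [float(i) for i in range(n)]  # arbitrary deterministic start (assumption)
    for _ in range(iterations):
        mean = sum(x) / n
        x = [v - mean for v in x]  # remove the component along q_1
        x = [c * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in x))
        x = [v / norm for v in x]
    return x

def bipartition(W):
    """Split vertices by the sign of the corresponding element of q_2."""
    q2 = second_eigenvector(W)
    cluster_a = [i for i, v in enumerate(q2) if v > 0]
    cluster_b = [i for i, v in enumerate(q2) if v <= 0]
    return cluster_a, cluster_b

# Hypothetical graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
W = [[0] * n for _ in range(n)]
for i, j in edges:
    W[i][j] = W[j][i] = 1

cluster_a, cluster_b = bipartition(W)
```

The partition separates the two triangles by cutting only the bridging edge. Which triangle is labeled A depends on the sign of the computed eigenvector, which is arbitrary.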
Traditionally, spectral clustering first performs an eigenvalue decomposition (“EVD”), and then a heuristic such as k-means is applied to the eigenvectors to obtain the discrete clusters. Unfortunately, eigenvalue decomposition is computationally expensive. For example, the Lanczos algorithm is $O(mn^2k)$ and the preconditioned conjugate gradient (“CG-based”) algorithm is $O(n^2k)$, where $k$ is the number of eigenvectors used, $n$ is the number of data points, and $m$ is the number of iteration steps. (See Sorensen, D. C., “Implicitly Restarted Arnoldi/Lanczos Methods for Large-Scale Eigenvalue Computations,” Technical Report, TR-96-40, 1996, and Knyazev, A. V., “Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method,” SIAM Journal on Scientific Computing, vol. 23, no. 2, pp. 517-541, 2001.)
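The discretization step can be illustrated in its simplest form: k-means with k = 2 applied to the elements of a single eigenvector. This is a minimal sketch of that special case only. The eigenvector values below are made-up numbers, and the initialization at the minimum and maximum values is an assumption; with k eigenvectors, each object would instead be a k-dimensional point.

```python
def two_means_1d(values, iterations=100):
    """Lloyd's algorithm with k = 2 on scalar values; returns a 0/1 label per value."""
    centers = [min(values), max(values)]  # simple deterministic initialization
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        new_labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                      for v in values]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels

# Hypothetical elements of a second eigenvector, one per object.
q2 = [-0.46, -0.46, -0.26, 0.26, 0.46, 0.46]
labels = two_means_1d(q2)
```

Here the first three objects receive one label and the last three the other, matching the split obtained by thresholding at zero.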