A wide variety of applications can make use of a system that classifies the web pages in a web page collection. For instance, web pages can be classified according to subject matter. Spam detection can also be regarded as web page classification, in which any given page is classified either as spam or as a legitimate content page.
There are currently a number of different approaches to analyzing web pages in this way. A collection of web pages can be described either by the hyperlinks between web pages or by the words occurring in web pages. In the former description, the collection of web pages can be represented as a directed graph, where each vertex (or node) in the graph represents a web page, and each directed edge in the graph represents a hyperlink between two web pages in the collection. In the latter description, each web page is represented as a vector in Euclidean space, and each element in the vector indicates the frequency of some word; links between pages can then, for example, encode the similarity between the corresponding documents, based on the two vectors.
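The two descriptions above can be illustrated with a minimal sketch. The toy `pages` collection, the function names, and the choice of cosine similarity as the similarity measure are all assumptions made for illustration; the text does not prescribe any of them.

```python
import math
from collections import Counter

# Hypothetical toy collection: page id -> (page text, outgoing hyperlinks).
pages = {
    "a": ("cheap pills cheap offers", ["b"]),
    "b": ("cheap pills online", ["a", "c"]),
    "c": ("graph theory lecture notes", []),
}

def link_graph(pages):
    """Directed graph as adjacency lists: an edge u -> v means page u links to page v."""
    return {u: [v for v in links if v in pages] for u, (_, links) in pages.items()}

def word_vectors(pages):
    """Represent each page as a word-count vector over a shared, sorted vocabulary."""
    vocab = sorted({w for text, _ in pages.values() for w in text.split()})
    vectors = {}
    for u, (text, _) in pages.items():
        counts = Counter(text.split())
        vectors[u] = [counts[w] for w in vocab]
    return vocab, vectors

def cosine(x, y):
    """Cosine similarity of two count vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0

graph = link_graph(pages)
vocab, vectors = word_vectors(pages)
similarity_ab = cosine(vectors["a"], vectors["b"])  # pages sharing words
similarity_ac = cosine(vectors["a"], vectors["c"])  # pages sharing no words
```

Here `graph` holds the hyperlink description, while the pairwise `cosine` values could serve as edge weights of a second, similarity-based graph over the same pages.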
To date, these two different types of web categorization have been viewed as separate systems. In fact, in the general machine learning problem setting, the data are often assumed to be represented in a single vector space or by a single graph. Yet, in many real-life situations (such as in web page analysis), the same instances (the same web pages) may be represented in several different vector spaces, or by several different graphs, or even as a mixture of vector spaces and graphs.
One technology worth mentioning is referred to as multiview learning. One approach to multiview learning is to define a kernel for each type of data representation, and then convexly combine those kernels. The underlying principle of this methodology is unclear, however. For instance, it is known that a spectral clustering approach for a single graph is derived from a real-valued relaxation of the combinatorial normalized cut, which naturally leads to graph Laplacians. However, it has not yet been considered which combinatorial cut can lead to the convex combination of graph Laplacians in a scenario in which multiple graphs are used.
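As a rough illustration of what a convex combination of graph Laplacians looks like, the sketch below builds one Laplacian per graph and mixes them with nonnegative weights that sum to one. Several things here are assumptions rather than anything stated in the text: the graphs are treated as undirected with symmetric weight matrices, the symmetric normalized Laplacian I - D^(-1/2) W D^(-1/2) is used, and the matrices `W_links` and `W_words`, the equal weights, and the function names are hypothetical.

```python
import math

def normalized_laplacian(W):
    """Symmetric normalized Laplacian I - D^(-1/2) W D^(-1/2) of a symmetric weight matrix.

    Assumption: undirected graph; entries touching a zero-degree vertex are treated as 0.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            scale = math.sqrt(deg[i] * deg[j])
            normalized = W[i][j] / scale if scale > 0 else 0.0
            L[i][j] = (1.0 if i == j else 0.0) - normalized
    return L

def convex_combination(matrices, weights):
    """Weighted sum of equally sized matrices; weights must be nonnegative and sum to 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    n = len(matrices[0])
    return [
        [sum(w * M[i][j] for w, M in zip(weights, matrices)) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical views of the same three pages: a link-based graph and a word-similarity graph.
W_links = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
W_words = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]

L_mix = convex_combination(
    [normalized_laplacian(W_links), normalized_laplacian(W_words)],
    [0.5, 0.5],
)
```

The combined matrix `L_mix` could then be handed to a standard spectral method in place of a single-graph Laplacian. The sketch shows only the mechanics of the combination and leaves open the question raised above, namely which combinatorial cut such a combination corresponds to.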
The discussion above is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.