A keyword or phrase is a word or set of terms that a Web surfer submits to a search engine when searching the World Wide Web (WWW) for a related Web page or site. Search engines determine the relevance of a Web site based on the keywords and keyword phrases that appear on its pages. Because a significant percentage of Web site traffic comes from search engines, Web site promoters know that proper keyword and phrase selection is vital to increasing traffic and obtaining the desired site exposure. One technique for identifying keywords relevant to a Web site for search engine result optimization is, for example, to have a human evaluate the site's content and purpose. This evaluation may include the use of a keyword popularity tool, which determines how many people submitted a particular keyword, or a phrase containing that keyword, to a search engine. Keywords that are relevant to the Web site and that appear more often in search queries are generally selected for search engine result optimization of that site.
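The popularity-based selection described above can be sketched in a few lines. This is a minimal illustration, not how any particular keyword popularity tool works: it assumes a hypothetical query log given as a list of raw query strings and a list of candidate keywords that a human has already judged relevant to the site. The names `keyword_popularity` and `select_keywords` are hypothetical, and matching is a simplified whole-word, case-insensitive substring test.

```python
from collections import Counter


def keyword_popularity(query_log, candidates):
    """Count how many logged queries contain each candidate keyword or phrase.

    Assumption: query_log is a hypothetical list of query strings. A candidate
    matches a query if it appears as a whole-word sequence (case-insensitive).
    """
    counts = Counter()
    for query in query_log:
        # Normalize whitespace and pad with spaces so matches align on word boundaries.
        padded = " " + " ".join(query.lower().split()) + " "
        for term in candidates:
            if " " + term.lower() + " " in padded:
                counts[term] += 1
    return counts


def select_keywords(query_log, candidates, top_n):
    """Return the top_n site-relevant candidates, most frequently queried first."""
    counts = keyword_popularity(query_log, candidates)
    # Sort by descending count, then alphabetically as a deterministic tie-break.
    ranked = sorted(candidates, key=lambda t: (-counts[t], t))
    return ranked[:top_n]
```

For example, with the log `["cheap digital camera", "digital camera reviews", "camera lens", "used car"]` and candidates `["digital camera", "camera lens", "tripod"]`, "digital camera" is counted twice, "camera lens" once, and "tripod" not at all, so `select_keywords(log, candidates, 2)` keeps the first two.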
After identifying a set of keywords for search engine result optimization of the Web site, a promoter may want to move the site to a higher position in the search engine's results relative to the results for other Web sites. To this end, the promoter bids on the keyword(s), indicating how much the promoter will pay each time a Web surfer clicks on the promoter's listings associated with those keyword(s). In other words, keyword bids are pay-per-click bids. The larger a keyword bid is relative to other bids for the same keyword, the higher (that is, the more prominently) the search engine displays the associated Web site in search results for that keyword.
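The bid-ordering rule in the paragraph above can be illustrated with a small sketch. It assumes, as a deliberate simplification, that position depends only on the bid amount for a single keyword; real search engines may weigh additional factors. `Listing`, `rank_listings`, and `pay_per_click_cost` are hypothetical names, and bids are expressed in integer cents to avoid floating-point rounding.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    site: str
    bid_cents: int  # amount the promoter offers to pay per click, in cents


def rank_listings(listings):
    """Order listings for one keyword so that higher per-click bids come first.

    Simplification: only the bid decides position. Python's sort is stable,
    so listings with equal bids keep their input order.
    """
    return sorted(listings, key=lambda listing: listing.bid_cents, reverse=True)


def pay_per_click_cost(bid_cents, clicks):
    """Total charged to the promoter: the bid is paid once for every click."""
    return bid_cents * clicks
```

With listings bidding 30, 55, and 30 cents for the same keyword, the 55-cent listing is shown first, and 40 clicks on it would cost the promoter 2,200 cents.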
Conventional systems and techniques for identifying bid term(s) relevant to Web site content typically use clustering algorithms, which partition a set of objects into groups, or clusters, such that objects in the same cluster are similar and objects in different clusters are dissimilar. Such clustering approaches assume that the data objects to be clustered are independent and of the same class, and they often model each object as a fixed-length vector of feature/attribute values. The recent surge in data mining research has re-examined this classical problem in the context of large databases. However, homogeneity of the data objects to be clustered still appears to be the basic assumption, even though some emerging applications, such as Web mining and collaborative filtering, pose challenges to it. In such applications, data objects are of different types and are highly interrelated. Yet even when objects of heterogeneous types are highly interrelated, conventional clustering operations typically cluster each object type individually, without considering the relationships between objects of different types.
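To make the conventional approach concrete, the sketch below runs plain k-means separately on two object types, each described by its own fixed-length feature vector. Everything here is an illustrative assumption: the data are hypothetical "terms" (2 features) and "pages" (3 features), the function names are hypothetical, and seeding uses a deterministic farthest-first rule rather than random initialization. The point it shows is that the term-page links exist in the data but are never consulted.

```python
import math


def farthest_first_init(vectors, k):
    """Deterministic seeding: start at the first vector, then repeatedly add
    the vector farthest from all centroids chosen so far."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(vectors, k, iterations=20):
    """Plain k-means over fixed-length feature vectors of ONE object type."""
    centroids = farthest_first_init(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each object joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical data: two object types living in different feature spaces.
terms = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]                       # 2 features
pages = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]   # 3 features
term_page_links = [(0, 2), (1, 3), (2, 0), (3, 1)]  # (term, page) pairs; unused below

# Each type is clustered on its own; term_page_links plays no part.
term_labels = kmeans(terms, 2)
page_labels = kmeans(pages, 2)
```

Both calls return `[0, 0, 1, 1]` on this toy data, but neither result could change no matter how the terms and pages are linked, which is the limitation the paragraph above describes.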
One reason is that relationships between data objects of different types are often sparse and difficult to identify. Another is that representing such relationships with a static, fixed-length vector attached to each object, where the vector encodes both the object's own attributes and the attributes of related objects of a different type, would produce object attribute/feature vectors of very high dimensionality (feature space). Such high dimensionality is undesirable because the data points lie far apart from one another in the feature space, so any small region contains too little data to train an effective model.
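The dimensionality argument can be checked with a small worked example. As a hypothetical setup, assume each term has 3 attributes of its own and that its relationship to Web pages is folded into its vector as one 0/1 column per possible page, out of 10,000 pages. The names `augmented_vector` and `nonzero_fraction` are illustrative only.

```python
def augmented_vector(own_features, related_ids, all_related_ids):
    """Append one 0/1 column per possible related object of the other type."""
    index = {rid: i for i, rid in enumerate(all_related_ids)}
    links = [0.0] * len(all_related_ids)
    for rid in related_ids:
        links[index[rid]] = 1.0  # mark each object this one is related to
    return list(own_features) + links


def nonzero_fraction(vector):
    """Share of entries that carry any information (are non-zero)."""
    return sum(1 for x in vector if x != 0) / len(vector)


# Hypothetical: 10,000 candidate pages; each term links to only one or two.
all_pages = [f"page{i}" for i in range(10_000)]
term_a = augmented_vector([0.4, 1.0, 0.2], ["page17", "page4242"], all_pages)
term_b = augmented_vector([0.5, 0.9, 0.3], ["page99"], all_pages)

# Overlap of the relationship columns (everything after the 3 own attributes).
shared = sum(x * y for x, y in zip(term_a[3:], term_b[3:]))
```

Each vector grows from 3 to 10,003 dimensions, yet `term_a` has only 5 non-zero entries (a non-zero fraction of about 0.05%), and `shared` is 0 because the two terms link to different pages. The added columns therefore say almost nothing about how similar the terms are, while greatly enlarging the space a model must cover.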
Accordingly, better clustering techniques that identify and group related objects (e.g., terms) while taking into account relationships across heterogeneous data objects would be useful. Such techniques could be used, for example, in systems and methods that identify term(s) for search engine optimization and term bidding, substantially increasing the probability that the identified term(s) are relevant in both settings.