The rise of the Internet has occasioned two disparate phenomena: the increase in the presence of social networks, with their corresponding member profiles visible to large numbers of people, and the increase in the use of these social networks to perform searches for people and companies. It is common for various attributes of member (e.g., person or company) profiles to be standardized based on entities in various taxonomies. For example, an industry may be listed for a company, with the industry being selected from among a number of entries in an industry taxonomy, namely a data structure maintained by the social networking service. This industry taxonomy may include a hierarchical organization of possible industries. For example, an industry category of “Information Technology” in the industry taxonomy may have sub-categories of “Computer Software”, “Computer Hardware”, and “Computer Networking”. The industry taxonomy may organize the sub-categories as children of a parent node corresponding to “Information Technology.” There may be many layers of categories and subcategories in the industry taxonomy.
Industry, of course, is only one example of a member attribute that can be assigned to an entity in a taxonomy. Other examples include job title, school, skills, and so on.
While pure standardization of attributes into taxonomy entities works well when attributes are being considered in a vacuum, this type of organizational technique creates a technical challenge when it comes to relationships between the attributes. Taxonomies are typically created and managed by people who make subjective decisions as to where entities reside in the taxonomy. While the hierarchical nature of taxonomies do allow for some relationships to be established between entities, this relationship is static and standardized. For example, in a location taxonomy, “Los Angeles”, “Silicon Valley”, and “Seattle” may be considered to be part of a “West Coast” category, such that a “Los Angeles” node, “Silicon Valley” node, and “Seattle” node are all child nodes of a “West Coast” node. However, in this case, each of the child nodes are equidistant from each other, thus failing to capture the relationships between the child nodes. These relationships can be dynamic and may vary depending upon a desired analysis. For example, in some analyses, the physical distances between the locations might be relevant, whereas in some analyses, other, more subtle relationships between the cities may be relevant. For example, Los Angeles is closer in distance to Silicon Valley than Seattle is, yet if one is attempting to predict a likely city that a computer programmer living in Silicon Valley would move to, Seattle may be far more likely (and thus may have a much closer relationship for this analysis) than Los Angeles. The bare taxonomy architecture does not have a mechanism to capture these types of dynamic and subtle relationships among entities.
Indeed, standardized data in taxonomies is often categorical and co-occur sparsely, which limits their usefulness for predictive tasks.