In a content retrieval system, a user makes a request for content and receives content matching that request. The user can be a human user interacting with a user interface of a computer that processes the requests and/or forwards the requests to other computer systems. The user could also be another computer process or system that generates the request programmatically. In the latter instance, it is likely that the requesting computer user will also programmatically process the results of the request, but it might instead be the case that a computer user makes a request and a human user is the ultimate recipient of the response, or even the opposite, where a human user makes a request and a computer user is the ultimate recipient of the response.
Content retrieval systems are in common use. One common system in use today uses the network referred to as the Internet, a global internetwork of networks, wherein nodes of the network send requests to other nodes that might respond with content. One protocol usable for content requesting is the Hypertext Transfer Protocol (HTTP), wherein an HTTP client (such as a browser) makes a request for content referenced by a Uniform Resource Locator (URL) and an HTTP server responds to the request by sending content specified by the URL. Of course, while this is a very common example, content retrieval is not so limited.
For example, networks other than the Internet might be used, such as token ring, WAP, overlay, point-to-point, proprietary networks, etc. Protocols other than HTTP might be used to request and transport content, such as SMTP, FTP, etc., and content might be specified by means other than URLs. Portions of the present invention are described with reference to the Internet, a global internetwork of networks in common usage today for a variety of applications, but it should be understood that references to the Internet can be substituted with references to variations of the basic concept of the Internet (e.g., intranets, virtual private networks, enclosed TCP/IP networks, etc.) as well as other forms of networks. It should also be understood that the present invention might operate entirely within one computer or one collection of computers, thus obviating the need for a network.
The content itself could be in many forms. For example, some content might be text, images, video, audio, animation, program code, data structures, formatted text, etc. A user might, for instance, request content that is a page having a news story (text) and an accompanying image, with links to other content (such as by formatting the content according to the HyperText Markup Language (HTML) in use at the time).
HTML is a common format used for pages or other content that is supplied from an HTTP server. HTML-formatted content might include links to other HTML content and a collection of content that references other content might be thought of as a document web, hence the name “World Wide Web” or “WWW” given to one example of a collection of HTML-formatted content. As that is a well-known construct, it is used in many examples herein, but it should be understood that unless otherwise specified, the concepts described by these examples are not limited to the WWW, HTML, HTTP, the Internet, etc.
A supplier of content might determine the interests of its users and provide relevant content, such as current news, sports, weather, search services, calendaring, messaging, information retrieval and the like. Content might be in the form of pages that are static (i.e., existing prior to a request for the page), dynamic (i.e., generated in response to a request) or partially static, partially dynamic. Thus, a news report about an event in a particular city might exist as a static page, but that same content might also be generated dynamically in response to a request, taking into account the context of the content and/or demographics of the user making the request.
As an example of a dynamically generated page, if the news report is being viewed by a user known to live in the city in which the event is to occur, the resulting page might include information about how to drive to the location of the event or how to purchase tickets. If, however, the user is known to live far from that city, the resulting page might include information about the weather in that remote city and how to purchase an airline ticket to that city.
In the above example, host content (the news report) and guest content (the weather, purchase links, directions, etc.) are associated such that a request for the host content returns a page (for HTTP systems, or other content unit for other types of systems) containing the host content and related guest content.
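The news-report example above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the names HostContent, select_guest_content and build_page, the use of plain city strings for user demographics, and the wording of the guest items are all hypothetical and are not taken from any particular system.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HostContent:
    """Hypothetical host content record (e.g., a news report)."""
    title: str
    event_city: Optional[str]  # None when the story is not tied to a city


def select_guest_content(host: HostContent, user_city: Optional[str]) -> List[str]:
    """Choose guest content from the host's city and the user's home city."""
    # Without a known event city or user city, no guest content is attached.
    if host.event_city is None or user_city is None:
        return []
    if user_city == host.event_city:
        # Local user: directions and tickets are likely to be relevant.
        return [
            f"Driving directions to the event in {host.event_city}",
            "Purchase event tickets",
        ]
    # Remote user: weather at the destination and air travel.
    return [
        f"Weather in {host.event_city}",
        f"Purchase an airline ticket to {host.event_city}",
    ]


def build_page(host: HostContent, user_city: Optional[str]) -> Dict[str, object]:
    """Assemble a page (content unit) holding host and related guest content."""
    return {"host": host.title, "guest": select_guest_content(host, user_city)}


report = HostContent(title="Festival opens this weekend", event_city="Springfield")
local_page = build_page(report, "Springfield")
remote_page = build_page(report, "Shelbyville")
```

In this sketch a request from a user in the event city yields directions and ticket links, while a request from elsewhere yields weather and airline links; a story with no associated city yields no guest content at all.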
It is a continuing problem to correctly determine relevant guest content. If the city to which the news story relates, the user demographics and the city of the guest content are all correctly determined, the presentation works well. However, if the news story is not actually related to a particular city or event, then the associated guest content will look out of place and confuse the user.
One approach to host content and guest content association is to create predefined associations between host content and guest content. In such systems, a page containing host content H1 would always be presented with its associated guest content G1 alongside. This approach might work well with systems having a small amount of host content, but is typically unworkable at larger scales, such as a news feed, where the host content could comprise thousands of new news reports per hour.
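A predefined association of this kind amounts to a fixed lookup table. The following minimal sketch assumes string identifiers such as "H1" and "G1" and a hypothetical function name guests_for; it is meant only to show why every new item of host content requires a manually created entry.

```python
from typing import Dict, List

# Hand-maintained table: each host content ID maps to its fixed guest content IDs.
PREDEFINED_GUESTS: Dict[str, List[str]] = {
    "H1": ["G1"],
    "H2": ["G2", "G3"],
}


def guests_for(host_id: str) -> List[str]:
    """Return the guest content always shown alongside the given host content."""
    # Host content with no table entry simply gets no guest content.
    return list(PREDEFINED_GUESTS.get(host_id, []))


known = guests_for("H1")          # host content with a predefined association
unknown = guests_for("H-new")     # newly arrived host content, not yet associated
```

The lookup itself is trivial; the cost lies in keeping the table current, since a newly arrived report such as "H-new" has no guest content until someone adds an entry for it.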
Another approach is the taxonomy-taxonomy approach, wherein all, or almost all, of the host content is assigned a node in a content taxonomy. The guest content is also assigned nodes in a corresponding context taxonomy or the same content taxonomy. Then, when host content is to be presented, the server reads the taxonomy node ID of the host content and then retrieves guest content that has a matching taxonomy node ID or IDs. This might work well when host content and guest content are well definable, but this approach does not scale well for large bodies of host content and guest content without much effort.
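The node-matching step can be illustrated with a small index keyed by taxonomy node ID. This is a sketch under assumptions: the class name TaxonomyIndex, its methods, and the node IDs such as "sports/baseball" are hypothetical, and exact-match lookup is used here even though a real system might also match ancestor or related nodes.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set


class TaxonomyIndex:
    """Hypothetical index from taxonomy node ID to guest content IDs."""

    def __init__(self) -> None:
        self._guests_by_node: DefaultDict[str, List[str]] = defaultdict(list)

    def add_guest(self, guest_id: str, node_ids: Iterable[str]) -> None:
        """Register guest content under each taxonomy node assigned to it."""
        for node_id in node_ids:
            self._guests_by_node[node_id].append(guest_id)

    def guests_for_host(self, host_node_ids: Iterable[str]) -> List[str]:
        """Return guest content whose node ID matches any host node ID."""
        seen: Set[str] = set()
        matches: List[str] = []
        for node_id in host_node_ids:
            # .get avoids creating empty entries for unknown nodes.
            for guest_id in self._guests_by_node.get(node_id, []):
                if guest_id not in seen:
                    seen.add(guest_id)
                    matches.append(guest_id)
        return matches


index = TaxonomyIndex()
index.add_guest("G-tickets", ["sports/baseball"])
index.add_guest("G-weather", ["sports/baseball", "travel"])
index.add_guest("G-recipes", ["food"])

matched = index.guests_for_host(["sports/baseball", "travel"])
unmatched = index.guests_for_host(["politics"])
```

Retrieval is inexpensive once nodes are assigned; the effort noted above lies in assigning a node to every item of host content and guest content, which this sketch takes as already done.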