The present invention relates to content management in general and more particularly to methods and apparatus for associating base content with relevant additional content to be presented with the base content.
In a typical content-management system, a user makes a request for base content and receives the base content with additional content that may or may not be relevant to the base content or to the user. The user can be a human user interacting with a user interface of a computer that processes the request for base content and/or forwards the request to other computer systems. The user could also be another computer process or system that generates the request for base content programmatically. In the latter instance, it is likely that the requesting computer user will also programmatically process the results of the request for base content, but it might instead be the case that a computer user makes a request and a human user is the ultimate recipient of the response.
Base content might include a variety of content provided to a user and presented, for example, on a published web page. For example, base content might include published information, such as articles, about politics, business, sports, movies, weather, finance, health, consumer goods, etc. Relevant additional content might include content that is relevant to the base content, a user, a system operator, a content provider, etc. For example, relevant additional content that is relevant to an article about consumer goods might include advertisements for sellers of the consumer goods.
Content-management systems are in common use and many are networked. One common network in use today is referred to as the Internet, a global internetwork of networks, wherein content-management system nodes might use the network to send requests to content-management system nodes elsewhere that might respond with the base content and the additional content. One protocol usable for content-management systems is the Hypertext Transfer Protocol (HTTP), wherein an HTTP client, such as a browser, might make a request for base content referenced by a Uniform Resource Locator (URL) and an HTTP server might respond to the request by sending content specified by the URL. Of course, while this is a very common example, content retrieval is not so limited.
For example, networks other than the Internet might be used, such as a token ring, a WAP (Wireless Application Protocol) network, an overlay network, a point-to-point network, proprietary networks, etc. Protocols other than HTTP might be used to request and transport content, such as SMTP (Simple Mail Transfer Protocol), FTP (File Transfer Protocol), etc., and content might be specified by means other than URLs. Portions of the present invention are described with reference to the Internet, a global internetwork of networks in common usage today for a variety of applications, but it should be understood that references to the Internet can be substituted with references to variations of the basic concept of the Internet (e.g., intranets, virtual private networks, enclosed TCP/IP networks, etc.) as well as other forms of networks. It should also be understood that the present invention might operate entirely within one computer or one collection of computers, thus obviating the need for a network.
As briefly described above, the requested base content itself could be in one or more of many forms. For example, some base content might be text, images, video, audio, animation, program code, data structures, formatted text, etc. A user might request base content that is a page having a news story (text) and an accompanying image. The base content may be formatted according to the Hypertext Markup Language (HTML), the Extensible Markup Language (XML), Standard Generalized Markup Language (SGML) or other language in use at the time.
HTML is a common format used for pages or other content that is supplied from an HTTP server. HTML-formatted content might include links to other HTML content and a collection of content that references other content might be thought of as a document web, hence the name “World Wide Web” or “WWW” given to one example of a collection of HTML-formatted content. As that is a well-known construct, it is used in many examples herein, but it should be understood that unless otherwise specified, the concepts described by these examples are not limited to the WWW, HTML, HTTP, the Internet, etc.
A supplier of base content might determine the subject of the base content and/or a user's interests, and provide additional content that is relevant to the base content and/or the user's interests. In determining relevant content, the base content provider may maximize profit, for example, by supplying advertisements in which the user may have an interest and collecting fees from the advertiser for displaying the advertiser's ads. It is a continuing problem to correctly determine content that is relevant to base content, users, system operators, content providers, etc. Relevant content, as referred to herein, might include content that is relevant to base content, users, system operators, content providers, etc.
One approach to providing base content and additional content that is relevant to the base content is to manually create predefined associations between the base content and the relevant content, possibly resulting in HTML links in the base content to the additional content. Typically, predetermined associations are manually generated by a person who reads through the base content and additional content to determine relevant associations. Such an approach is generally labor-intensive and static in nature. For example, a page containing base content H1 would always be presented with its associated relevant content G1. This approach might work well with systems having a small amount of content, but is typically unworkable at larger scales, such as news feeds, wherein the base content could comprise thousands of new news reports per hour.
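The static character of the manual approach can be illustrated with a minimal Python sketch. The table name `STATIC_ASSOCIATIONS`, the function `additional_content_for`, and the identifiers H2, G2, and G3 are hypothetical and introduced only for illustration; only the H1/G1 pairing comes from the example above.

```python
# Hypothetical, hand-maintained association table. Every entry must be
# created by a person who has read both pieces of content.
STATIC_ASSOCIATIONS = {
    "H1": ["G1"],          # pairing taken from the example in the text
    "H2": ["G2", "G3"],    # assumed entries, for illustration only
}


def additional_content_for(base_id):
    """Return the manually associated additional content IDs.

    Base content that nobody has reviewed yet has no entry, so it is
    served without any additional content.
    """
    return list(STATIC_ASSOCIATIONS.get(base_id, []))


if __name__ == "__main__":
    print(additional_content_for("H1"))
    print(additional_content_for("H9"))
```

Under these assumptions, a request for H1 always yields G1 regardless of the user or of any newly available additional content, and any base content without a hand-made entry yields nothing, which is why the approach becomes impractical for high-volume sources such as news feeds.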
Another approach to associating base content with additional content that is relevant to the base content is the taxonomy-taxonomy approach, wherein all, or nearly all, of the base content is assigned a node in a content taxonomy. The additional relevant content is also assigned nodes in a corresponding context taxonomy or the same content taxonomy. Then, when base content is to be presented, the server reads the taxonomy node ID of the base content and then retrieves additional content that has a matching taxonomy node ID or IDs. This approach might work well when base content and additional content are well definable, but it does not scale to large bodies of base content and additional content without considerable effort.
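The lookup step of the taxonomy-taxonomy approach might be sketched as follows. This is a minimal illustration, not a description of any particular system: the node ID strings, the content identifiers, and the functions `build_index` and `match_by_taxonomy` are all assumptions, and the sketch assumes that both kinds of content share one set of node IDs.

```python
from collections import defaultdict


def build_index(additional_items):
    """Map each taxonomy node ID to the additional content assigned to it."""
    index = defaultdict(list)
    for content_id, node_ids in additional_items.items():
        for node_id in node_ids:
            index[node_id].append(content_id)
    return index


def match_by_taxonomy(base_node_ids, index):
    """Return additional content whose node ID matches any base node ID."""
    matches = []
    seen = set()
    for node_id in base_node_ids:
        for content_id in index.get(node_id, []):
            if content_id not in seen:  # avoid duplicates across nodes
                seen.add(content_id)
                matches.append(content_id)
    return matches


# Hypothetical node assignments, which in practice must be made for every item.
BASE_NODES = {"article-42": ["sports/baseball"]}
ADDITIONAL_NODES = {
    "ad-glove": ["sports/baseball"],
    "ad-mortgage": ["finance/loans"],
    "ad-tickets": ["sports/baseball", "sports/football"],
}

if __name__ == "__main__":
    index = build_index(ADDITIONAL_NODES)
    print(match_by_taxonomy(BASE_NODES["article-42"], index))
```

The lookup itself is inexpensive; the effort lies in assigning a node to every item of base content and additional content beforehand, which is the step that grows with the size of the content body.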
What is needed is an improved content-management system for base content that automatically associates relevant content with the base content for presentation to a user, such as a human user or a computer user.