This invention relates generally to managing website content, and more particularly to the integration of third party content such that it is accessible to, and optimized for, search engines.
Many businesses and other entities rely upon their websites to attract and provide information to users. E-commerce businesses, for instance, market and sell to connected consumers primarily by using their websites. When connected consumers make purchase decisions, they are heavily influenced by online sources such as search results, reviews by prior purchasers, comments on social networks, etc. Many such businesses cannot conveniently or seamlessly integrate consumer reviews, comments, discussion threads, or other third party non-original content into their websites, particularly in a way that makes the content accessible to user agents such as web browsers, search engine crawlers, bots, and the like. Known approaches that enable incorporation of non-original content into an existing website are problematic. They are generally complex, costly, or otherwise do not afford the desired seamless integration and access. As such, they are unsuitable for many businesses. Thus, such businesses do not have an easy way of making consumer comments or other third party content accessible to prospective purchasers.
One known approach for incorporating third party content into a website is for the third party content provider to gain read/write access to the original content repository of the origin website, and to use an authoring protocol that allows modification of the original content. Special markers may be manually placed within the content files so that a third party content provider can recognize what portion of a page needs to be altered, and what content to place there. Since these markers need to persist across multiple edits, they are generally made with tags that have no visual rendering, e.g., HTML comments. This approach has several drawbacks. First, it requires a standard mechanism for accessing and modifying original content stored in the content repository. Although standardized protocols suitable for this purpose exist, they are not widely used or widely available. Second, this approach requires an understanding of how the original content is laid out, e.g., headers, footers, layouts, pages, sidebars, etc., which varies from one service provider to another with no standardization. Third, as with any distributed authoring system, editing conflicts are quite common and are fairly hard to resolve.
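By way of illustration only, the marker-based approach described above may be sketched as follows. The marker syntax (`begin:` and `end:` inside HTML comments) and the function name `fill_markers` are assumptions made for this sketch; the approach as described requires only that the markers have no visual rendering and persist across edits.

```python
import re

# Hypothetical marker format: <!-- begin:NAME --> ... <!-- end:NAME -->.
# HTML comments are used because they have no visual rendering.
MARKER = re.compile(
    r"(?P<open><!--\s*begin:(?P<name>[\w-]+)\s*-->)"
    r"(?P<body>.*?)"
    r"(?P<close><!--\s*end:(?P=name)\s*-->)",
    re.DOTALL,
)


def fill_markers(page: str, fragments: dict[str, str]) -> str:
    """Replace the text between each marker pair, keeping the markers in place."""

    def substitute(match: re.Match) -> str:
        name = match.group("name")
        # Leave the region unchanged when no fragment is supplied for it.
        body = fragments.get(name, match.group("body"))
        # The markers are written back so that later edits can find them again.
        return f"{match.group('open')}{body}{match.group('close')}"

    return MARKER.sub(substitute, page)
```

Because the markers are preserved, the same region can be rewritten repeatedly. The sketch does not address the drawbacks noted above, such as obtaining write access to the repository or resolving editing conflicts.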
Another approach is to use server-side composition, where a web application gathers content from various sources, both local and remote, integrates the content into a single HTML document, and serves the resulting composite document to the user. This approach is characteristic of large on-line retailers having a product catalog comprising an HTML document composed by hundreds of services that collect data and construct the page. Product details may come from one source, reviews from another, the shopping cart from a third, etc. This approach is complex and expensive to establish and maintain, and is also not suitable for use by many websites.
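Server-side composition of the kind described above may be sketched, in greatly simplified form, as follows. The three source functions are hypothetical stand-ins for separate services and return fixed markup; a real deployment would retrieve the fragments from local or remote systems.

```python
from html import escape
from typing import Callable, Dict


# Hypothetical stand-ins for independent content sources.
def product_details(product_id: str) -> str:
    return f"<h1>Product {escape(product_id)}</h1>"


def reviews(product_id: str) -> str:
    return "<ul><li>Works as described.</li></ul>"


def cart_summary(product_id: str) -> str:
    return "<p>Cart: 0 items</p>"


# Section name -> function that produces the HTML fragment for that section.
SOURCES: Dict[str, Callable[[str], str]] = {
    "details": product_details,
    "reviews": reviews,
    "cart": cart_summary,
}


def compose_page(product_id: str) -> str:
    """Gather every fragment on the server and return one complete HTML document."""
    sections = [
        f'<section id="{name}">{fetch(product_id)}</section>'
        for name, fetch in SOURCES.items()
    ]
    return "<!DOCTYPE html><html><body>" + "".join(sections) + "</body></html>"
```

The document returned by `compose_page` already contains all of the content, so any user agent that retrieves it receives the reviews without executing scripts. The cost lies in building and operating every source and the composing application.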
Another server-side composition approach is to use a web application that includes an API (e.g., a plug-in) and deploy a software component that can use the API to execute operations to compose an HTML document and serve it to a user. However, since each product will have its own API, deployment requirements and market dynamics, this is economical only for a few products that have a large base. A further approach is client-side composition. The content of a website typically includes script tags which load a sequence of instructions that, when executed by a web browser, retrieve content from a content server and insert it into the document tree of the currently viewed page. JavaScript that executes in a web browser (client-side) is commonly used for a variety of applications for integrating third party content into existing websites, such as display ads, social buttons, rich content embedding, etc. However, automated user agents (web crawlers, social networks, etc.) typically do not have the same ability to execute JavaScript as does a browser. Thus, while such user agents may retrieve the HTML document with the script tags intact, the document will be without any of the third party content. Accordingly, search engines cannot index the third party content or the meta-data, so it cannot be used to affect search results, and social networks cannot access either the content or the meta-data, so it cannot be used to control what is shared. Moreover, even when a client-side approach is used to add content, it frequently creates formatting compatibility problems that render the added content incompatible with the original content.
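The limitation of client-side insertion for automated user agents may be illustrated by the following sketch, which reads a page the way a simple agent without a JavaScript engine might. The class name `TextOnlyAgent`, the sample page, and the script URL `https://reviews.example/embed.js` are hypothetical and serve only as an example.

```python
from html.parser import HTMLParser


class TextOnlyAgent(HTMLParser):
    """Collects page text without running scripts, as a simple crawler might."""

    def __init__(self) -> None:
        super().__init__()
        self.in_script = False
        self.text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script bodies are skipped and never executed.
        if not self.in_script and data.strip():
            self.text.append(data.strip())


def visible_text(html_doc: str) -> list[str]:
    agent = TextOnlyAgent()
    agent.feed(html_doc)
    agent.close()
    return agent.text


# Hypothetical page as served: the reviews container is empty until a browser
# runs the referenced script and inserts content into the document tree.
PAGE = """<html><body>
<h1>Widget</h1>
<div id="reviews"></div>
<script src="https://reviews.example/embed.js"></script>
</body></html>"""
```

Calling `visible_text(PAGE)` yields only the heading text, because the script tag is retrieved intact but never executed. The third party content therefore never reaches an agent of this kind.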
It is desirable to provide methods and systems that address the foregoing and other problems with known approaches by enabling easy, cost-effective and seamless integration of third party content into existing websites such that it is compatible with the original content, accessible to user agents, and optimized for search engine use in indexing and retrieving content. It is to these ends that the invention is directed.