The World Wide Web originally developed primarily as a large collection of static web pages. Static web pages are pages that a server delivers to a user's web browser exactly as stored. Search engines are software systems that allow a user to search for information on the Internet. Search engines periodically update their indexes of web pages stored across the Internet and/or their stored copies of web content (e.g., for faster retrieval and delivery to a user's web browser). They often do this using web crawlers, also called spiders. Search engine crawlers are software applications that systematically browse Internet sites in an automated fashion to update the search engine's index or database of web content. For example, a search engine crawler may download a copy of a static web page, which the search engine then processes to update its search index. Search engine crawlers were originally designed to crawl the static HTML content of web pages.
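For illustration only, the indexing step described above can be sketched with Python's standard library. This is a minimal sketch under stated assumptions, not any particular search engine's method: the page is assumed to have already been downloaded as a static HTML string, and the names `StaticPageParser` and `index_page` are hypothetical. The parser collects the page's visible text and outgoing links, and every word is added to an inverted index that maps words to the URLs containing them.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class StaticPageParser(HTMLParser):
    """Hypothetical parser: collects visible text and outgoing links from static HTML."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.text_parts.append(data)


def index_page(url, html, index):
    """Adds each word on a static page to an inverted index (word -> set of URLs).

    Returns the page's links, which a crawler could add to its queue of URLs to visit.
    """
    parser = StaticPageParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        index[word].add(url)
    return parser.links


index = defaultdict(set)
links = index_page(
    "https://example.com/",
    "<html><body><h1>Hello Crawler</h1><a href='/about'>About us</a></body></html>",
    index,
)
```

Because the page arrives with its content already in the HTML, a single download is enough for the words "hello" and "crawler" to become searchable and for the link `/about` to be discovered.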
The inventor here has recognized several technical problems with such conventional systems, as explained below. As the web has evolved, so have the technologies powering websites and the mechanisms for delivering and presenting web content to a user's web browser. For example, AJAX, short for "Asynchronous JavaScript+XML," allows for asynchronous operations on the web (some implementations use the JSON data format instead of XML). With AJAX, a web application running in a user's web browser can send data to and retrieve data from a web server asynchronously, without interfering with the display and behavior of the existing web page. Thus, unlike with static web pages, where changing any of the content displayed to a user requires loading an entirely new static web page into the web browser, AJAX-based web applications can change the content (or view) displayed to a user without reloading an entirely new page. Single-page web applications take this concept further: only one HTML page is loaded into the web browser, and this page is fed partial views asynchronously. For example, when using the AngularJS front-end web application framework, a single-page web application may run within the user's web browser, with all data binding, routing, and application logic performed on the client side.
But because search engine crawlers are designed to operate on static HTML content, not on dynamic single-page web applications, they are unable to accurately crawl content from websites that deliver web content through such single-page applications. For example, in such an instance, the search engine crawler may simply download a blank start page, which the search engine cannot use to index the dynamically served web content available at the website.
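The blank-start-page problem can be illustrated with a minimal sketch, offered as an assumption-laden example rather than a description of any real crawler. It models a crawler that reads only the HTML it downloads and does not execute JavaScript. Both pages are hypothetical; the single-page application's start page uses the AngularJS `ng-app` and `ng-view` directives, with the actual content assumed to be fetched and rendered later by a hypothetical `app.js`.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects the text a static-HTML crawler would see (script bodies excluded)."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if not self._in_script and data.strip():
            self.parts.append(data.strip())


def crawlable_text(html):
    """Returns the indexable text in the HTML as downloaded, without running scripts."""
    extractor = VisibleTextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.parts)


# Hypothetical single-page application start page: the body holds only an empty
# mount point; the product listing would be fetched and rendered by app.js.
SPA_START_PAGE = """<!doctype html>
<html><head><title></title></head>
<body ng-app="shop"><div ng-view></div>
<script src="app.js"></script></body></html>"""

# Hypothetical static page carrying the same kind of content directly in its HTML.
STATIC_PAGE = """<html><head><title>Shop</title></head>
<body><h1>Blue widget</h1><p>In stock: 12</p></body></html>"""
```

Here `crawlable_text(STATIC_PAGE)` yields indexable words, while `crawlable_text(SPA_START_PAGE)` yields an empty string: the content the user would eventually see never appears in the HTML the crawler downloads.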
A rudimentary solution to this problem involves running an application (e.g., Prerender.io) on the web server that takes the server's dynamically served web content and converts it en masse into individual cached static pages, which are then delivered in response to search engine crawler requests. This approach creates at least two problems of its own. First, the rendering application consumes a large amount of time and server resources to convert the server's dynamically served web content en masse into individual cached static web pages. Second, if the server's content itself changes dynamically, for example, faster than the rendering application can convert it into cached static web pages, then the cached static web pages do not accurately represent the dynamically served web content of the server. Other problems include the delay in caching a large amount of dynamically served web content and the costs associated with such rendering services.
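The two drawbacks of the en-masse prerendering approach can be modeled with a minimal sketch. It does not reflect how Prerender.io or any other product is implemented; `Origin`, `PrerenderCache`, `render_all`, and `serve_crawler` are hypothetical names. Each route on the origin carries a version number that increases whenever its content changes. The cache renders every route up front (the first drawback, counted in `renders`) and records the version it rendered. If the origin changes afterward, the snapshot served to a crawler no longer matches the live content (the second drawback).

```python
from dataclasses import dataclass, field


@dataclass
class Origin:
    """Hypothetical dynamically served site: route -> (content, version)."""
    pages: dict = field(default_factory=dict)

    def update(self, route, content):
        # Each content change bumps the route's version number.
        _, version = self.pages.get(route, ("", 0))
        self.pages[route] = (content, version + 1)


@dataclass
class PrerenderCache:
    """Hypothetical en-masse snapshot cache served in response to crawler requests."""
    snapshots: dict = field(default_factory=dict)  # route -> (html, version rendered)
    renders: int = 0

    def render_all(self, origin):
        # Every route is rendered up front, whether or not a crawler ever requests it.
        for route, (content, version) in origin.pages.items():
            self.snapshots[route] = (f"<html><body>{content}</body></html>", version)
            self.renders += 1

    def serve_crawler(self, route, origin):
        # Returns the cached snapshot and whether it lags behind the live content.
        html, snapshot_version = self.snapshots[route]
        _, live_version = origin.pages[route]
        return html, snapshot_version != live_version


origin = Origin()
for i in range(1000):
    origin.update(f"/item/{i}", f"Item {i}: price 10")

cache = PrerenderCache()
cache.render_all(origin)                    # 1000 renders before any crawler arrives
origin.update("/item/7", "Item 7: price 12")
html, stale = cache.serve_crawler("/item/7", origin)
```

In this model the crawler receives the old price for `/item/7`, and `stale` is `True` until the whole rendering pass is repeated.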