Loading and processing a web page is becoming increasingly complicated, unpredictable and even never-ending. In the conventional approach, a single HTML-document often constitutes a complete web page, and browsers only need to load, parse and render this single document. When a browser reaches the concluding </HTML>-tag, processing is complete, and the browser may then complete or remove its progress bar indicator, indicating to a user that the page has finished loading and is ready for interaction. Even when conventional approaches employ sub resources, such as images referred to in the main resource (e.g. the HTML-document), the typical browser loads, parses and renders these upon discovering them. Moreover, in conventional systems this loading is performed entirely within the single-string, sequential, synchronous processing of the web page source information (e.g. the HTML-document).
Since then, several other kinds of sub resources have emerged, including scripts and cascading style sheets (CSS). These and other modern web page elements enable a web designer to make use of asynchronous or deferred processing, meaning that even when the browser has reached the end of the main resource, and parsed and rendered all loaded sub resources, it may continue processing the web page. Deferred or timed scripts and CSS transitions may still be queued for processing, timers may not have fired yet or may be in the midst of a repeating loop, etc. It has accordingly become very common for conventional websites to keep scripts or other asynchronous tasks running as long as the web page is shown in a browser window, e.g. for updating content on the website dynamically.
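The following minimal Python sketch illustrates the situation described above; it is not a browser implementation. It models timers as simple (first fire time, repeat interval) pairs and reports which of them fire after the main resource has been parsed. The function name `timers_after_parse`, the time values and the observation horizon are hypothetical and chosen only for illustration.

```python
import heapq

def timers_after_parse(timers, parse_end, horizon):
    """Simulate timed tasks and report activity after parsing has ended.

    timers    -- list of (first_fire_time, interval); interval is None
                 for a one-shot timer (hypothetical model, in seconds)
    parse_end -- time at which the main resource was fully parsed
    horizon   -- time at which the simulation stops observing
    """
    # The index breaks ties so that the heap never compares intervals.
    queue = [(first, idx, interval)
             for idx, (first, interval) in enumerate(timers)]
    heapq.heapify(queue)

    fired_after_parse = []
    while queue and queue[0][0] <= horizon:
        fire_time, idx, interval = heapq.heappop(queue)
        if fire_time > parse_end:
            fired_after_parse.append(fire_time)
        if interval is not None:
            # A repeating timer re-queues itself indefinitely.
            heapq.heappush(queue, (fire_time + interval, idx, interval))

    # Timers still queued at the horizon would keep the page "busy".
    return fired_after_parse, len(queue)

# Assumed example: two one-shot timers and one timer repeating every 2 s.
fired, still_queued = timers_after_parse(
    [(0.2, None), (1.5, None), (1.0, 2.0)], parse_end=1.0, horizon=6.0)
print(fired, still_queued)
```

In this model the repeating timer never leaves the queue, so an observer waiting for an empty queue would wait indefinitely, which mirrors the never-ending processing described above.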
Some deferred or asynchronous tasks may change the semantics of the web page, i.e. the content, e.g. news updates, or the presentation of content, e.g. order, visibility, emphasis, etc., thus triggering a new rendering of the web page and precipitating a change perceivable by the user. Also, some deferred or asynchronous tasks may be intended by the web designer and/or perceived by a human user as part of the loading and/or initialization (e.g. during a first processing iteration) of the web page, whereas other deferred tasks would be experienced as operational processes for enhancing the experience of exploring or using the web page.
These circumstances make determining when a browser has finished processing a web page practically impossible according to conventional techniques. Even the progress bars or other “busy”-indications of typical browsers often do not correspond to a user's perception or idea of when a web page rendering is in such a state of completeness that the user can assume the expected information to be present or begin interacting with the web page.
From a programmatic perspective, for example that of a web crawler or other robot, it is very important to be able to determine when to consider the semantics of the page complete or when to proceed with automatic interaction. Web robots or automated browsing scripts have far less sense or feeling than a human user of when the web page is ready and of which operational tasks to disregard.
The common solution to the problem has been to make robots, scripts, etc., simply wait a predefined amount of time before proceeding. This is not a very efficient way of determining when a browser is done, as there is very large variation in how long it takes for the browser to become ready. Thus, there is often unnecessary waiting, or alternatively, insufficient waiting. Furthermore, it is tedious for the programmer writing the automation script or robot to have to guess the necessary or relevant amount of time for each task.
Using pauses of predetermined length also disregards the fact that the time for performing a given action on a specific website is not constant, but depends on the user client software and hardware, network latencies, etc. Depending on the acceptable fault rate, the web robot may be held waiting for periods on the order of e.g. 5-10 seconds, or even longer, e.g. 1 minute, if a high probability that the browser has actually finished processing during the wait is desired.
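The trade-off of the fixed-wait approach can be made concrete with a small Python sketch. It compares one predetermined pause against a set of times at which a page actually became ready. The ready times below are hypothetical values chosen for illustration, not measurements, and the names `FixedWaitOutcome` and `evaluate_fixed_wait` are likewise assumptions of this sketch.

```python
from dataclasses import dataclass

@dataclass
class FixedWaitOutcome:
    failures: int          # pages not yet ready when the pause expired
    wasted_seconds: float  # idle time spent on pages that were already ready

def evaluate_fixed_wait(ready_times, wait_seconds):
    """Evaluate one predetermined pause against actual ready times (seconds)."""
    failures = sum(1 for t in ready_times if t > wait_seconds)
    wasted = sum(wait_seconds - t for t in ready_times if t <= wait_seconds)
    return FixedWaitOutcome(failures, wasted)

# Hypothetical ready times for four page loads, in seconds.
ready_times = [0.5, 1.0, 2.0, 8.0]

short_wait = evaluate_fixed_wait(ready_times, 5.0)   # insufficient for one page
long_wait = evaluate_fixed_wait(ready_times, 60.0)   # safe but mostly idle
print(short_wait)
print(long_wait)
```

Under these assumed values, the 5-second pause proceeds too early on one page, while the 60-second pause avoids that failure only by spending nearly all of its time idle.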
Accordingly, it would be beneficial to provide systems, techniques, and computer program products configured to perform improved, efficient, and adaptive ways of determining a web page processing state.