A website on the World Wide Web can provide a variety of content to a user in various media formats. The challenge of providing such content to an end user increases as screen size is reduced (e.g., on a mobile device) because there is no reliable way to determine, from automated analysis of the HTML code, which parts of a web page must be retained and displayed to a user, and which parts of the web page need not be rendered.
One current approach utilizes the Document Object Model (DOM), a platform- and language-neutral programming interface for HTML and XML specified by the W3C, to generate a tree of nodes representing the structural components of the web page. Based on this DOM tree, the web page is split into smaller blocks, which are subsequently selected for display or discarded, depending on automatically assigned degrees of importance. One problem with this approach is its sole reliance on the analysis of the document structure, which may result in rankings of the various page portions that do not properly reflect content relevancy.
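The limitation of a purely structural ranking can be illustrated with a minimal Python sketch. It is not the method of any particular system: the set of block-level tags, the `BlockCollector` class, and the rule that shallower blocks rank higher are all hypothetical choices made only for illustration.

```python
from html.parser import HTMLParser

# Hypothetical choice of tags treated as page blocks.
BLOCK_TAGS = {"div", "p", "table", "ul"}


class BlockCollector(HTMLParser):
    """Collects text-bearing blocks together with their nesting depth."""

    def __init__(self):
        super().__init__()
        self.stack = []   # blocks that are currently open
        self.blocks = []  # closed blocks that directly contain text

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.stack.append({"tag": tag, "depth": len(self.stack), "text": []})

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1]["tag"] == tag:
            block = self.stack.pop()
            block["text"] = " ".join(block["text"])
            if block["text"]:
                self.blocks.append(block)

    def handle_data(self, data):
        # Text is attributed to the innermost open block.
        if self.stack and data.strip():
            self.stack[-1]["text"].append(data.strip())


def rank_blocks(html):
    """Rank blocks by structure alone: shallower blocks are assumed more important."""
    parser = BlockCollector()
    parser.feed(html)
    return sorted(parser.blocks, key=lambda block: block["depth"])


page = "<div>Site navigation</div><div><div><p>Main article text</p></div></div>"
for block in rank_blocks(page):
    print(block["depth"], block["text"])
```

In this sketch the navigation block is ranked ahead of the article text merely because it is nested less deeply, which is the kind of mismatch between structural importance and content relevancy described above.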
An additional challenge lies in the need to reflect updated web page content in the corresponding reduced-screen version of the page. Rather than re-analyzing the entire web page, it would be desirable to retrieve only updates of the components selected for display. However, localizing these portions is not trivial. One approach to this problem involves loading the document into a byte array and grabbing content at the desired array positions. This solution is impeded by updates preceding the desired content in the web page, which can shift the respective start and end points of the desired portion. Another method involves parsing the HTML source code for specific keywords, IDs, or other patterns (e.g., regular expressions) in comments or tags. This approach is of limited applicability since not all web sites mark the desired regions clearly enough to be identified in this manner.
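Both localization approaches and their weaknesses can be sketched in a few lines of Python. The sample pages, the element IDs, and the helper names `grab_by_offsets` and `grab_by_id` are hypothetical and serve only to illustrate the problem.

```python
import re

# Two hypothetical versions of the same page. The update enlarges the
# advertisement that precedes the region of interest.
OLD_PAGE = (b'<html><body><div id="ad">Ad</div>'
            b'<div id="score">Score: 1-0</div></body></html>')
NEW_PAGE = (b'<html><body><div id="ad">A much longer ad</div>'
            b'<div id="score">Score: 2-0</div></body></html>')


def grab_by_offsets(page, start, end):
    """Byte-array approach: return whatever lies at fixed positions."""
    return page[start:end]


def grab_by_id(page, element_id):
    """Keyword/ID approach: find a div by its id attribute using a regular expression."""
    pattern = re.compile(
        rb'<div id="' + re.escape(element_id.encode()) + rb'">(.*?)</div>',
        re.DOTALL,
    )
    match = pattern.search(page)
    return match.group(1) if match else None


# Offsets recorded against the old version of the page.
start = OLD_PAGE.index(b"Score: 1-0")
end = start + len(b"Score: 1-0")

print(grab_by_offsets(OLD_PAGE, start, end))  # the desired region
print(grab_by_offsets(NEW_PAGE, start, end))  # shifted: no longer the desired region
print(grab_by_id(NEW_PAGE, "score"))          # works only because the page carries an id
print(grab_by_id(NEW_PAGE, "headline"))       # region not marked: nothing is found
```

The fixed offsets fail as soon as earlier content changes length, while the ID lookup succeeds only for regions that the page author happened to label.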
Accordingly, there is a need for systems and methods that can improve the processing of web pages for display on devices with reduced screen size.