The field of computer coding covers many different coding and programming schemes, all of which incorporate certain rules that define the particular scheme and which generally must be adhered to by users of such schemes in order for results to be derived therefrom. For example, a person programming in the language Pascal must ensure that their code corresponds to the appropriate Pascal syntax in order for the Pascal program to operate. Where code does not comply, a “syntax error” will occur upon compilation of the program.
As opposed to programming in a highly structured language, more recently it has become common to alter or create documents using so-called “mark-up languages” to provide a mechanism by which content in a document is presented in a particular environment, usually upon a display screen or printing device. One example of this is the Hypertext Mark-up Language (HTML) and another is the Extensible Mark-up Language (XML). The purpose of such mark-up languages is to add notation to the content to be displayed or presented, so as to cause that content to be displayed or presented in the fashion desired by the author.
The HTML document format pervades the Internet and the World Wide Web. In practice, documents structured with HTML mark-up are often in error in that they do not comply with the particular internationally recognised HTML standard operating at the time of document creation. The current HTML standard at the time of drafting this patent specification may be found at http://www.w3c.org and http://www.w3c.org/TR/REC-htm140/.
The current standard for HTML documents insists that such documents be expressed as trees. Such a structure requires that each element of the document must be wholly contained by another element and, as a consequence, elements may not overlap. The experience of many indicates that it is easy to produce a document that superficially looks like HTML, but which in fact violates the tree-like hierarchical structures established by the HTML standard. Further, whilst human interpretation of such erroneous documents can often resolve ambiguities, there is often a mismatch between what makes sense according to the current standard, and that which the author of the HTML document actually intends.
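By way of illustration only, the tree requirement described above can be checked with a simple stack of open elements: an end tag is acceptable only if it closes the innermost element still open. The following Python sketch relies on the standard library `html.parser` module. The names `NestingChecker`, `find_nesting_errors` and `VOID_ELEMENTS` are hypothetical and chosen for this example. The sketch also assumes, more strictly than HTML itself, that every element outside the short `VOID_ELEMENTS` list requires an explicit end tag. It reports violations only and makes no attempt to reproduce how any particular browser resolves them.

```python
from html.parser import HTMLParser

# Assumption: a partial, illustrative list of elements that take no end tag.
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}


class NestingChecker(HTMLParser):
    """Hypothetical helper that records end tags breaking the tree structure."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of tag names that are currently open
        self.errors = []         # descriptions of each violation found

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return  # e.g. <br/> is reported as a start tag followed by an end tag
        if self.open_elements and self.open_elements[-1] == tag:
            # The end tag closes the innermost open element: properly nested.
            self.open_elements.pop()
        elif tag in self.open_elements:
            # The element is open, but another element opened inside it is
            # still open, so the two elements overlap.
            self.errors.append(f"</{tag}> overlaps open <{self.open_elements[-1]}>")
            # Discard everything down to and including the matching start tag
            # so that checking can continue.
            while self.open_elements.pop() != tag:
                pass
        else:
            self.errors.append(f"</{tag}> has no matching start tag")


def find_nesting_errors(markup):
    """Return a list of nesting violations found in the given markup."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    errors = list(checker.errors)
    errors.extend(f"<{tag}> is never closed" for tag in checker.open_elements)
    return errors


if __name__ == "__main__":
    # Properly nested: each element is wholly contained by its parent.
    print(find_nesting_errors("<p><b>bold <i>both</i></b> italic</p>"))
    # Overlapping: <b> is closed while <i>, opened inside it, is still open.
    print(find_nesting_errors("<p><b>bold <i>both</b> italic</i></p>"))
```

In the second input a human reader can readily guess the intended bold and italic ranges, yet no tree can represent them as written, which is the mismatch between author intent and the standard noted above.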
Computer applications which read HTML approach such problems in a number of different ways. Some applications reject the bad HTML structure, thereby omitting the content or rendering the content in non-intuitive ways. Examples of these include “OPERA” and a number of smaller-distribution Internet browsers which are preferred by some users for their stricter behaviour. Other applications try to match the user's likely intention despite the strict errors contained in the HTML source. Examples of these include “Internet Explorer” (trade mark) manufactured by Microsoft Corporation, “Netscape Navigator” (trade mark) manufactured by Netscape Corporation, and “WebRecord” (trade mark) marketed by Canon Inc. In spite of reasonable efforts to determine how Internet Explorer and Netscape Navigator handle variations away from strict HTML, the present inventors have not been able to determine how those products operate so as to apparently resolve ambiguous or erroneous HTML.
A significant problem that arises from such non-compliance with the HTML standard is that there exist other languages and tools which interact with HTML documents, for example scripting languages like JavaScript and styling languages like CSS2 (Cascading Style Sheets, level 2). Such tools expect a strict tree structure in an HTML document and, as a consequence, often have no defined behaviour when interpreting a poorly formed HTML document. The result intended by the author therefore cannot be guaranteed.