The present invention generally relates to hyper-text markup language (HTML), and more particularly, to tags utilized in HTML documents.
The Internet is a collection of interconnected computers that share common communication protocols and languages. These protocols include the hyper-text transfer protocol (HTTP), which is used for communications between clients and servers, and the transmission control protocol/Internet protocol (TCP/IP), the TCP portion of which is the transport-specific protocol for communication between computers or applications. In addition, the language in which these computers communicate is called hyper-text markup language (HTML). The explosive growth and popularity of the Internet, and more particularly the World Wide Web (hereafter the “Web”), in recent years may be due, at least in part, to the standardization of these communication protocols and languages. Moreover, the growth of the Internet has been aided by the fact that these communication protocols and languages are machine independent and thereby allow virtually any conventional computer (e.g., PC, Macintosh® (Apple Computer, Inc.), or UNIX®-based (American Telephone and Telegraph Company)) to be connected to the Internet.
HTML is a markup language that is used to describe the content and format of web pages. A web page (also referred to as an HTML document) is typically an ASCII text document comprising text and embedded HTML formatting commands referred to as tags. A web browser application, such as WebExplorer® (IBM Corporation) or Netscape Navigator® (Netscape Communications Corporation), parses the HTML tags in order to generate an integrated visual display of the web page. In addition to the tags which format the text, a web page can also include reference tags that identify, by means of a uniform resource locator (URL), a piece of multimedia data, for example, an image, video segment, animation, or audio file. The web browser responds to such a reference tag by retrieving and then displaying or playing the data as an integrated part of the web page. A tag can also create a hyperlink, which is a segment of text or an image that refers to another document (e.g., a web page, image, video segment, animation, or audio file) elsewhere on the Web. When the hyperlink is selected, the referenced document is downloaded by the web browser.
An HTML tag typically begins with a left-angle bracket (“<”) and ends with a right-angle bracket (“>”). Inside the brackets are a tag name (or identifier) and attributes, if appropriate. The tag name usually defines the tag by function and the attributes provide any necessary parameters for the operation of the tag function. For example, the tag “<P>”, which defines a paragraph break, merely includes the tag name P and no attributes. As another example, a header tag which defines the size and indentation of a heading on the web page can include a start tag such as “<h2 align=center>” and an end tag such as “</h2>”. The start tag and the end tag are placed on either side of a text element, i.e., the text comprising the heading. Therefore, the heading of this section of the present patent application would appear as follows in HTML: <h2 align=center>BACKGROUND OF THE INVENTION</h2>. It is noted that the start tag includes the attribute “align=” to designate the alignment of the text element. As these two examples illustrate, some tags require an end tag while other tags do not.
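The tag structure described above can be illustrated with a short sketch. It uses the HTMLParser class from Python's standard library; the names TagLister and list_tags are hypothetical and are introduced here only for illustration.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Records each start tag, end tag, and text element the parser meets."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag, {}))

    def handle_data(self, data):
        # Skip runs of pure whitespace between tags.
        if data.strip():
            self.events.append(("text", data.strip(), {}))


def list_tags(html):
    """Return the recorded events for an HTML fragment."""
    parser = TagLister()
    parser.feed(html)
    parser.close()
    return parser.events


heading_events = list_tags("<h2 align=center>BACKGROUND OF THE INVENTION</h2>")
# heading_events[0] is ("start", "h2", {"align": "center"})
```

The start tag yields the tag name h2 together with its align attribute, while the end tag carries a name only.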
While the tags described above are embedded into a web page to impart some type of functionality, other tags are declarative in nature. They merely convey information and are not visible when the web page is displayed by a web browser. One example of such a tag is a META tag, which is used in the head area of an HTML document to specify information about the HTML document for use in identifying, indexing, and cataloging the web page. Another example of a tag that is declarative in nature is the comment tag. The comment tag is identified by an exclamation point and appears as “<! comment>”. Another example includes applet tags (“<applet>”) for embedding Java applets.
Important to the success of HTML is the World Wide Web Consortium (W3C), which issues standardized versions of HTML in an effort to facilitate its continued evolution. Accordingly, HTML has been able to meet the demands of the ever changing and expanding Web. However, the utilization of HTML documents has generally been limited to the Web, or at least to web browsers. Even though more application programs are beginning to incorporate HTML parsers so that they can operate on HTML documents, the features of HTML are still generally being utilized for formatting text and creating hyperlinks, as in the context of a web browser. Thus, present applications which operate on HTML documents follow the Web paradigm in which an HTML document is self-contained, in that the information needed to create the document is either transmitted with the document or provided from the source of the document. The present applications generally do not address the need for working or interacting with an HTML document outside of an embedded browser environment.
In view of the above discussion, it is an object of the present invention to relate stored information to an HTML document.
Another object of the present invention is to reduce the processing overhead of an application program operating on an HTML document by relating stored information that defines an aspect of the HTML document to the HTML document for use by the application program in operating on the HTML document.
Yet another object of the present invention is to provide reduced transmission overhead by storing information locally that is related to an HTML document so that the information does not have to be transmitted with the HTML document over an external communication link.
These and other objects are accomplished, according to the present invention, by systems, methods and computer program products configured for relating an HTML document to stored information that is associated with the HTML document and with an application program, wherein the application program retrieves the stored information for use in performing an operation on the HTML document. One advantageous implementation of the present invention stores information locally that is document and application specific so that the information does not have to be recomputed for each invocation of the application program. Another advantageous implementation of the present invention stores information locally so that the information does not have to be transmitted with the HTML document between computer systems over external communication links. Yet another advantageous implementation of the present invention stores the state of a document (i.e., whether it has been altered) without revealing that the state has been stored.
An embodiment of the present invention for relating an HTML document to stored information that is associated with the HTML document and with an application program includes searching the HTML document for an association tag, wherein the association tag includes an index reference that can be utilized to locate the stored information. In addition, the index reference may be read and the stored information retrieved utilizing the index reference. The retrieved information can then be utilized by the application program.
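The search, read, and retrieve steps described above might be sketched as follows. The tag syntax `<!ASSOC app="..." ref="...">`, the function names, and the use of a dictionary as the store are all assumptions made for illustration; this summary does not fix the form of the association tag or of the stored information.

```python
import re

# Hypothetical association tag syntax: <!ASSOC app="NAME" ref="INDEX">
ASSOC_TAG = re.compile(r'<!ASSOC\s+app="([^"]+)"\s+ref="([^"]+)"\s*>')


def find_index_reference(html, app_name):
    """Search the document for an association tag belonging to app_name
    and return its index reference, or None if no such tag is present."""
    for match in ASSOC_TAG.finditer(html):
        if match.group(1) == app_name:
            return match.group(2)
    return None


def retrieve_stored_info(html, app_name, store):
    """Read the index reference and use it to look up the stored information.
    `store` stands in for whatever storage the application program uses."""
    ref = find_index_reference(html, app_name)
    if ref is None:
        return None
    return store.get(ref)


document = '<!ASSOC app="editor" ref="doc-0042"><p>Some text</p>'
local_store = {"doc-0042": {"checksum": 12345}}
info = retrieve_stored_info(document, "editor", local_store)
# info is {"checksum": 12345}
```

An application that finds no tag for its own name receives None and can fall back to computing the information itself.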
In a further embodiment of the present invention, the stored information is maintained on a local device accessible by the application program. This eliminates the need to transmit the stored information with the HTML document, thereby reducing transmission overhead. This also enables information that is particular to a version of the HTML document as it existed when it was last operated on by the application program to remain locally stored and associated with the HTML document.
In a particular embodiment of the present invention, the search for the association tag includes searching for a marker identifying the association tag as being associated with the application program. This unique marker identifying the association tag may ensure faster and more reliable identification of the association tag. In addition, the search may include searching for delimiters and then searching for a marker within the delimiters in order to reduce the likelihood of a false identification of an association tag based upon text not within a tag.
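A minimal sketch of the two-stage search described above follows: the angle-bracket delimiters are located first, and the marker is then tested only inside them. The marker string "!ASSOC:editor" and the function name are hypothetical. The scan is deliberately simplified and does not handle a “>” character appearing inside a quoted attribute value.

```python
def find_association_tags(html, marker):
    """Return the contents of every tag that starts with `marker`.
    Text outside the < > delimiters is never tested against the marker."""
    found = []
    pos = 0
    while True:
        start = html.find("<", pos)
        if start == -1:
            break
        end = html.find(">", start + 1)
        if end == -1:
            break
        inner = html[start + 1:end]
        # Require the marker to be the whole tag name, not merely a prefix.
        if inner == marker or inner.startswith(marker + " "):
            found.append(inner)
        pos = end + 1
    return found


page = "The text !ASSOC:editor is not a tag. <!ASSOC:editor ref=doc-0042><p>Body"
tags = find_association_tags(page, "!ASSOC:editor")
# tags is ["!ASSOC:editor ref=doc-0042"]
```

The first occurrence of the marker in the sample lies in ordinary text and is ignored, which is the false identification the delimiter search is meant to avoid.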
In another aspect of the present invention, the index reference may be modified. The index reference may be modified in order to relocate the stored information locally or, if desired, remotely. In addition, the stored information may also be modified. The stored information may be modified to account for operations performed by the application program. The modifications of the index reference and stored information are preferably performed by the application program.
In still another embodiment of the present invention, reading the index reference includes reading a filename of a file, an identifier of a memory segment of the file, and an offset into the memory segment. Furthermore, the retrieval of the stored information may include the retrieval of a checksum value, a canonical form of the HTML document, or a document validation key.
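By way of illustration only, an index reference carrying the three parts named above could be parsed as sketched below. The textual encoding "filename#segment+offset" is an assumption, as are all of the names. CRC-32 from Python's zlib module is used as a stand-in checksum, since the summary does not say which checksum is computed.

```python
import zlib
from typing import NamedTuple


class IndexReference(NamedTuple):
    filename: str  # file holding the stored information
    segment: int   # identifier of a memory segment of the file
    offset: int    # offset into that memory segment


def parse_index_reference(text):
    """Parse the hypothetical encoding 'filename#segment+offset'."""
    filename, sep, rest = text.rpartition("#")
    segment, plus, offset = rest.partition("+")
    if not sep or not plus or not filename:
        raise ValueError(f"malformed index reference: {text!r}")
    return IndexReference(filename, int(segment), int(offset))


def checksum(html):
    """CRC-32 of the document text (an assumed checksum algorithm)."""
    return zlib.crc32(html.encode("utf-8"))


def is_unaltered(html, stored_checksum):
    """Compare the current checksum with the one retrieved from storage."""
    return checksum(html) == stored_checksum


ref = parse_index_reference("assoc.dat#3+128")
# ref is IndexReference(filename="assoc.dat", segment=3, offset=128)
```

Comparing a freshly computed checksum with the stored value is one way an application could tell whether the document has been altered since it last operated on it.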
In a further embodiment of the present invention, an association tag may also be embedded in an HTML document. This is typically performed during the first invocation of the application program when operating on the HTML document, wherein the application program initially generates or computes the information. In addition, if the HTML document is transmitted by a first user to a second user that is remotely located, one or more additional association tags specific to the first user may be embedded into the HTML document.
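The first-invocation behavior described above might look like the following sketch. The tag syntax, the choice to place the tag at the start of the document, the reference naming scheme, and the names ensure_association and compute are all hypothetical.

```python
import re


def ensure_association(html, app_name, store, compute):
    """Return (html, info). On the first invocation the information is
    computed, stored, and an association tag is embedded in the document;
    on later invocations the stored information is reused."""
    pattern = re.compile(r'<!ASSOC app="%s" ref="([^"]+)">' % re.escape(app_name))
    match = pattern.search(html)
    if match:
        ref = match.group(1)
        if ref not in store:
            # Tag present but information missing locally: recompute it.
            store[ref] = compute(html)
        return html, store[ref]

    # First invocation: pick an unused reference (assumed naming scheme).
    n = len(store) + 1
    while f"{app_name}-{n}" in store:
        n += 1
    ref = f"{app_name}-{n}"
    store[ref] = compute(html)
    tag = f'<!ASSOC app="{app_name}" ref="{ref}">'
    # Assumption: the tag is placed at the very start of the document.
    return tag + "\n" + html, store[ref]


store = {}
tagged, info = ensure_association("<p>hello world</p>", "editor", store, len)
# tagged now begins with <!ASSOC app="editor" ref="editor-1">
```

Calling the function again on the tagged document returns the stored value without invoking compute, which reflects the reduced processing overhead that the summary attributes to the association.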
Other features and advantages of the present invention will become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional features and advantages be included herein within the scope of the present invention, as defined in the appended claims. Further, as will be appreciated by those of skill in the art, the above described methods of the invention may be provided as apparatus or computer readable program means.