Web pages, which are typically written in hypertext markup language (HTML), frequently include URLs that refer to the location of objects referenced in the Web pages. For example, an image in a Web page might be referenced by a URL or hyperlink that provides the address or path to the image where it is stored on a server accessible over the Internet or other network. When the Web page is retrieved by a client from a server for loading and display with a browser program running on the client, the image is retrieved over the network from the storage location to which the URL points.
Referencing a URL in a tag included in the HTML or other markup language used to define a Web page document is a straightforward process. For example, an image of a rose included in a file “rose.jpg” that is stored in a folder “flowers” on a server “myserver.com” can be referenced by including the following tag in the HTML defining the Web page:
<p><img border="0" src="http://www.myserver.com/flowers/rose.jpg"></p>
A URL that references an object within a tag in an HTML document is readily identifiable, so a browser program can use it to retrieve the object from the indicated storage address. The URL may also need to be updated to fix a broken link if the storage location of the referenced object has changed. A commonly assigned, copending patent application, U.S. Ser. No. 09/285,530, entitled “METHOD FOR PRESERVING REFERENTIAL INTEGRITY WITHIN WEB SITES,” which was filed Apr. 2, 1999, discloses a method for automatically updating or fixing URLs (or hyperlinks) referencing Web pages or objects that have been moved to a different storage location so that the links are correct; the disclosure and drawings of this commonly assigned, copending application are hereby specifically incorporated herein by reference.
The invention disclosed in the above-referenced application is currently included as part of the server extensions associated with Microsoft Corporation's FRONTPAGE™ Web site creation and maintenance program. However, the technique for fixing links that have been broken due to changes in the storage address of an object will fail if the URL or hyperlink to a page or object that is referenced in an HTML document cannot be determined. HTML documents often include “event handlers,” which are attributes on an HTML object or element that contain script called during a specific event, such as when a mouse or other pointing device is clicked on the object. Such script frequently includes URLs that cannot be fixed using the hyperlink fix-up mechanism noted above.
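The contrast between a URL in a tag attribute and a URL buried in an event handler can be illustrated with a minimal sketch. It is not the fix-up mechanism of the above-referenced application; the class name TagUrlCollector, the function scan, the choice of "src" and "href" as URL-bearing attributes, and the sample page (which attaches a hypothetical onclick handler to the rose image) are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Attributes assumed, for this sketch only, to hold a URL directly.
URL_ATTRIBUTES = {"src", "href"}


class TagUrlCollector(HTMLParser):
    """Collects URLs from URL attributes and records event-handler script."""

    def __init__(self):
        super().__init__()
        self.urls = []
        self.handler_scripts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes attribute names in lowercase.
        for name, value in attrs:
            if value is None:
                continue
            if name in URL_ATTRIBUTES:
                self.urls.append(value)
            elif name.startswith("on"):
                # Event-handler attribute: the value is script text,
                # so any URL inside it is not visible to this scan.
                self.handler_scripts.append(value)


def scan(html):
    """Return (urls found in tag attributes, raw event-handler scripts)."""
    collector = TagUrlCollector()
    collector.feed(html)
    collector.close()
    return collector.urls, collector.handler_scripts


# Hypothetical page: the image tag from the text plus an assumed onclick handler.
page = r'''<p><img border="0" src="http://www.myserver.com/flowers/rose.jpg"
onclick="myfunc('images\image1.gif', 1, 'one')"></p>'''

urls, scripts = scan(page)
print(urls)     # the src URL is found
print(scripts)  # the handler is captured only as opaque script text
```

In this sketch the "src" URL is collected directly, while the path to "image1.gif" remains hidden inside the handler's script text and would need separate analysis to be recognized as a URL.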
The most popular scripting language used for event handlers on a Web page is ECMAScript, the language defined by the European Computer Manufacturers Association (ECMA) specification for script. JavaScript and JScript are implementations of ECMAScript. Because URLs in the ECMAScript portion of an HTML document cannot be recognized using conventional techniques, means must be provided to facilitate the identification of such URLs and addresses. It might seem sufficient to apply a simple heuristic to this problem, such as treating a string as a URL only if it contains “http://”, but such a heuristic misses many URLs. For example, a script might include a function that refers to a URL for an image file in the following manner:
myfunc("images\image1.gif", 1, "one")
Note that this script function does not include “http://”, and therefore, the trivial heuristic approach mentioned above would be unable to recognize the path or address to “image1.gif” as a URL. Consequently, if the file “image1.gif” is moved to a different folder, the address or URL provided in the script for “myfunc” will be broken (unless manually corrected) and cannot be fixed using the automated fix-up capability of the FRONTPAGE program server extensions. A different approach is therefore required to enable a URL referenced in an ECMAScript portion of a Web page to be detected, so that it can be fixed or put to other uses.
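The failure of the trivial heuristic can be shown with a short sketch. The helper names string_literals and urls_by_naive_heuristic are hypothetical, the regular expression ignores escape sequences inside string literals, and the second script line (a call to a made-up function show) is an assumed example added for contrast.

```python
import re

# Matches single- or double-quoted string literals.
# Escaped quotes inside a literal are not handled; this is a sketch only.
STRING_LITERAL = re.compile(r'"([^"]*)"|\'([^\']*)\'')


def string_literals(script):
    """Return the contents of each quoted string literal in the script text."""
    return [
        m.group(1) if m.group(1) is not None else m.group(2)
        for m in STRING_LITERAL.finditer(script)
    ]


def urls_by_naive_heuristic(script):
    """Treat a string literal as a URL only if it contains 'http://'."""
    return [s for s in string_literals(script) if "http://" in s]


# The script call from the text: a relative path with no "http://".
relative_call = r'myfunc("images\image1.gif", 1, "one")'

# Assumed contrasting call that uses an absolute URL.
absolute_call = 'show("http://www.myserver.com/flowers/rose.jpg")'

missed = urls_by_naive_heuristic(relative_call)
found = urls_by_naive_heuristic(absolute_call)
print(missed)  # empty: the path to image1.gif is not recognized
print(found)   # the absolute URL is recognized
```

The heuristic finds the absolute URL but returns nothing for the "myfunc" call, even though its first argument is a path that would break if "image1.gif" were moved. Nothing in the string itself distinguishes that argument from the ordinary string "one".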