The World Wide Web is a collection of servers connected to the Internet that utilize the Hypertext Transfer Protocol ("HTTP"). HTTP is a known application protocol that provides users with access to documents (e.g., web pages) written in a standard mark-up page description language known as Hypertext Markup Language ("HTML"). HTTP is used to transmit HTML web pages between a remote computer (e.g., a server) and a local computer in a form that is understandable to browser software (e.g., Netscape Navigator™, available from Netscape Communications Corporation of Mountain View, Calif.) executing on the local computer.
Among a number of basic document formatting functions, HTML enables software developers to specify graphical pointers (commonly referred to as "hyperlinks") on displayed web pages ("base web pages") that point to other web pages ("remote web pages") typically resident on remote servers. Once the remote web page is displayed, a user of a local computer system may freely review its contents and perform any functions that it provides. One such function, for example, may be obtaining specified data ("data") from the remote site. After the data is retrieved, it may be displayed by the local computer system in a selected format specified by the remote web page. Problems may arise, however, when utilizing such a web page function. Primarily, access to the data through the remote web page interface may be cumbersome and thus not intuitive to the user. Accordingly, the user may not be able to retrieve the desired data from the remote site. Similarly, even if the data is retrieved from the remote site, its display in the selected format also may be cumbersome and thus not in a form that is easily understood by the user.
The art has responded to these and similar problems by enabling a base web site to automatically extract data from a remote web page, and then display the retrieved data in a format specified by the base web site. Accordingly, the base web site, and not the user, accesses the remote page to retrieve the data. A typical process that may be used for retrieving and displaying such data may begin when a user requests the data while accessing a base web page. In response, the base web site directs a data request to the remote site requesting the data. After receiving the request, the remote site typically generates a response web page having the data. The response web page then is directed to the base web site for processing.
Instead of displaying the response web page which, undesirably, is in a form specified by the remote site, the base site executes a specially designed scanning procedure that scans the response web page for the data. Once the data is retrieved from the response by the scanning procedure, it may be displayed, via the base web page, in a format that is designed specially by the base web page.
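The request, scan, and redisplay flow described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the remote site is replaced by a local stand-in function, and every name in it (remote_site_respond, PriceScanner, scan_response, render_base_page, handle_user_request, the "price" element id, and the sample values) is hypothetical and does not come from any actual prior art system.

```python
from html.parser import HTMLParser


def remote_site_respond(symbol: str) -> str:
    """Hypothetical stand-in for the remote site: returns a response web
    page in the remote site's own format (no network access is made)."""
    return (
        "<html><body><h1>Remote Quotes</h1>"
        f"<p>Symbol: {symbol}</p>"
        '<p>Last price: <span id="price">42.50</span></p>'
        "</body></html>"
    )


class PriceScanner(HTMLParser):
    """Scanning procedure: finds the text inside the element whose id is
    "price" (an assumed location within the response page)."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.price = None

    def handle_starttag(self, tag, attrs):
        # Start capturing when the assumed target element opens.
        if dict(attrs).get("id") == "price":
            self._inside = True

    def handle_data(self, data):
        # Keep only the first text chunk found inside the target element.
        if self._inside and self.price is None:
            self.price = data.strip()

    def handle_endtag(self, tag):
        # Any closing tag ends the capture.
        self._inside = False


def scan_response(response_page: str):
    """Scan the response web page and return the data, or None if absent."""
    scanner = PriceScanner()
    scanner.feed(response_page)
    return scanner.price


def render_base_page(symbol: str, price: str) -> str:
    """Display the data in the format chosen by the base web site."""
    return f"{symbol}: ${price}"


def handle_user_request(symbol: str) -> str:
    response_page = remote_site_respond(symbol)  # request to the remote site
    price = scan_response(response_page)         # scan instead of displaying
    return render_base_page(symbol, price)       # redisplay in base format


print(handle_user_request("ACME"))
```

The user never sees the remote site's page layout; only the scanned value reaches the base site's display routine.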
As noted above, the scanning procedure is specially designed to retrieve the data from the remote web page. Such a scanning procedure is implemented by writing an application program that utilizes either conventional procedural or object-oriented programming techniques. To be effective, such a program must be preconfigured with the location of the data to be retrieved within the remote web page. Accordingly, a new scanning application program must be written each time the format of a response web page is modified. Developing such new scanning application programs is very time-consuming, however, thus adding to the overall cost of developing and maintaining the base web site.
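The maintenance burden noted above can be illustrated with a short Python sketch. It assumes, purely for illustration, that the data location is preconfigured as a table cell index. The constant DATA_CELL_INDEX, the function scan_for_data, and both sample pages are hypothetical.

```python
import re

# Hypothetical preconfigured location: the data is assumed to sit in the
# second table cell of the response web page.
DATA_CELL_INDEX = 1


def scan_for_data(response_page: str) -> str:
    """Return the contents of the preconfigured table cell."""
    cells = re.findall(r"<td>(.*?)</td>", response_page, flags=re.S)
    return cells[DATA_CELL_INDEX].strip()


# Response page in the format the scanner was written for.
old_format = "<table><tr><td>ACME</td><td>42.50</td></tr></table>"

# The remote site later inserts a new column ahead of the data.
new_format = "<table><tr><td>ACME</td><td>NYSE</td><td>42.50</td></tr></table>"

print(scan_for_data(old_format))  # 42.50
print(scan_for_data(new_format))  # NYSE (wrong cell; the scanner must be rewritten)
```

In this sketch the scanner does not raise an error after the format change. It silently returns the wrong cell, so a modified response page may go unnoticed until the scanning program is revised.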
It therefore would be desirable to have a method and apparatus that enables a base web page to efficiently retrieve information from a remotely linked web site without requiring that a scanning program be developed.