1. Field of the Invention
The present invention relates to a system and method for establishing and implementing user-defined virtual tags. The virtual tags can be used to mark items of an original electronic document that the user is interested in displaying. The system and method also create a customized document that can be updated using the virtual tags and the extraction rules that implement them.
2. Description of the Related Art
The World Wide Web (WWW) is a collection of documents, referred to as Web pages, that reside on computers distributed over the Internet. Web pages are typically defined in Hypertext Markup Language (HTML). Multiple Web pages are sometimes linked together to form a Web site, which can be a collection of Web pages directed to a particular topic or theme.
Web pages often contain far more information than a user needs. However, access to the data on individual Web pages is hindered by the fact that there is no defined structure for organizing information on a Web page. It is also difficult to determine the scheme of a Web page, because the scheme is buried in the underlying HTML code. A further difficulty is that the same visual effect defined by the Web page scheme can be achieved with different HTML features, such as HTML tables, ordered lists, or other HTML tagging.
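The last difficulty can be illustrated with a short sketch. The two HTML fragments below are hypothetical examples (not taken from any cited reference): one presents a list of items with an HTML table and the other with an ordered list. Using Python's standard `html.parser` module, the sketch shows that the visible text items are the same while the underlying tag structure differs, so a rule keyed to one set of tags would not find the same items in the other page.

```python
from html.parser import HTMLParser

# Two hypothetical fragments that a browser renders as similar lists of items:
# one built from an HTML table, the other from an ordered list.
TABLE_VERSION = """
<table>
  <tr><td>Apple</td></tr>
  <tr><td>Banana</td></tr>
</table>
"""

LIST_VERSION = """
<ol>
  <li>Apple</li>
  <li>Banana</li>
</ol>
"""


class TagAndTextCollector(HTMLParser):
    """Records the start tags and the non-whitespace text of a fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []
        self.texts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Keep only the tag name; attributes are irrelevant for this comparison.
        self.tags.append(tag)

    def handle_data(self, data):
        # Skip the whitespace-only text between tags.
        if data.strip():
            self.texts.append(data.strip())


def analyze(fragment: str) -> tuple[list[str], list[str]]:
    """Return (start tags, visible text items) for an HTML fragment."""
    collector = TagAndTextCollector()
    collector.feed(fragment)
    collector.close()
    return collector.tags, collector.texts


table_tags, table_texts = analyze(TABLE_VERSION)
list_tags, list_texts = analyze(LIST_VERSION)

print(table_texts == list_texts)  # True: the same visible items
print(table_tags)  # ['table', 'tr', 'td', 'tr', 'td']
print(list_tags)   # ['ol', 'li', 'li']
```

The rendered results are similar rather than identical (the ordered list adds item numbers), but an extraction rule written against `td` elements would find nothing in the second fragment.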
Conventional proxy servers retrieve Web pages and syntactically transform them to better present their content on devices other than those for which the pages were designed. U.S. Pat. No. 5,918,013 describes a method of transcoding Web documents in a network environment. A proxy server includes a persistent document database that stores various attributes of all Web documents previously retrieved in response to requests from the client. When a Web document is retrieved from a remote server in response to a request from the client, the proxy server consults the database and uses the stored information related to the requested document to transcode the document. The document is transcoded to circumvent bugs found in the Web document, to size the document for display on a television set, to improve the transmission efficiency of the document, and to reduce latency. However, these proxy servers work purely by translating the page content into a more appropriate form. Accordingly, such systems are device-driven rather than user-driven.
Style sheets are used to set a style for a Web page or for multiple Web pages. Style sheets provide presentation information that is kept separate from the content of the pages they reference. Accordingly, style sheets add functional display information to the conventional tags that are physically present in a Web page.
Techniques have been described for extracting content from Web pages. U.S. Pat. No. 5,913,214 describes a system for extracting data from Web pages to augment a traditional structured database. A user query is converted into a set of commands for interacting with the content of a Web page. A data retriever receives content from the Web page and translates the data content of the Web page into the data content associated with the initial request.
U.S. Pat. No. 6,128,655 describes a method for recasting Web content on a hosting site. That patent provides an automated system for replicating published Web content and associated advertisements in the context of a hosting Web site. At the hosting Web site, the process includes brokering a client browser's request for a Web page, analyzing the returned content and splitting it into component elements, extracting the desired component elements, recasting the desired elements in the look and feel of the hosting site, and sending the recast content to the requesting client as a Web page. Once the reformatted file is received at the client, the client browser interprets the HTML in the Web page and presents the content in the context of the hosting Web site. The original page is parsed into the desired content elements using a filter definition. A filter designer determines which items are to be used in a recast page. The filter definition is used to break the content into component parts such as the title area, the primary and secondary advertisements, and the content itself. The filter definitions can be created by the filter designer through analysis of the HTML source code, of embedded comments or delineators, and through comparisons with similar documents. This method would be difficult to use with custom user modifications and with dynamic Web pages, because a filter designer separate from the user is required to develop a filter for each modification the user wants.
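A filter definition that relies on embedded comments or delineators might look roughly like the following minimal sketch. The `<!-- begin:NAME -->` / `<!-- end:NAME -->` comment convention, the component names, and the functions `split_components` and `apply_filter` are all hypothetical assumptions made for illustration; this is not the format or method of U.S. Pat. No. 6,128,655. The sketch breaks a page into named component parts (title area, primary and secondary advertisements, content) and keeps only the parts listed in a filter definition.

```python
import re

# Hypothetical delineator convention (an assumption for this sketch only):
# each component is bracketed by <!-- begin:NAME --> and <!-- end:NAME -->.
COMPONENT_PATTERN = re.compile(
    r"<!--\s*begin:(?P<name>[\w-]+)\s*-->(?P<body>.*?)<!--\s*end:(?P=name)\s*-->",
    re.DOTALL,
)


def split_components(html: str) -> dict[str, str]:
    """Break a page into named component parts using the comment delineators."""
    return {m.group("name"): m.group("body").strip()
            for m in COMPONENT_PATTERN.finditer(html)}


def apply_filter(html: str, wanted: list[str]) -> dict[str, str]:
    """Keep only the components named in a (hypothetical) filter definition."""
    components = split_components(html)
    return {name: components[name] for name in wanted if name in components}


PAGE = """
<html><body>
<!-- begin:title --><h1>Quarterly Report</h1><!-- end:title -->
<!-- begin:ad-primary --><div>Buy now!</div><!-- end:ad-primary -->
<!-- begin:content --><p>Revenue rose.</p><!-- end:content -->
<!-- begin:ad-secondary --><div>Sale!</div><!-- end:ad-secondary -->
</body></html>
"""

# Hypothetical filter definition: keep the title area and the content itself.
FILTER_DEFINITION = ["title", "content"]

recast = apply_filter(PAGE, FILTER_DEFINITION)
print(recast)
# {'title': '<h1>Quarterly Report</h1>', 'content': '<p>Revenue rose.</p>'}
```

In this sketch the filter definition is a fixed list written ahead of time by whoever knows the page's delineators, which mirrors the limitation noted above: each new user modification would require someone to edit the filter definition.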
It is therefore desirable to allow a user to delimit and annotate information in a Web page interactively, so that portions of Web pages can be identified for dynamic, independent retrieval to provide a customized Web page layout.