The present invention relates to a method for extracting digests, reformatting, and automatic monitoring of structured online documents based on visual programming of document tree navigation and transformation. More particularly, the invention relates to a system and method whereby a user selects a fragment of an online document shown in a source window and copies this fragment to a target window, and the system creates a sequence of commands that can reproduce this behavior when applied to new versions of the source document downloaded from the information source, such as a web site.
Structured online documents, especially HTML and XML documents available on the World Wide Web (WWW), have become very important in the past few years. Such documents contain data which may be periodically updated, wherein such updating does not substantially change the format of presentation of such data.
These online documents usually are dynamically generated by the web servers and they present data stored in online databases. This data periodically changes, but since these documents are automatically generated by computers, the presentation document structure remains substantially the same for relatively long periods of time. Additionally, even when the web page is updated manually, the presentation document structure may remain substantially the same for relatively long periods of time.
Examples of such frequently updated online documents include: stock quotes from brokerage web sites; prices of specific items from online commercial vendor sites and from online auction sites; local weather information from weather web sites; airline ticket information provided by airline or travel sites; shipment tracking information from the mail delivery companies; current news headlines from the news organizations web sites; latest press releases of a specific company issued on their web site; bank account balances for an individual or corporation from the bank web site.
While all this data may be of great interest to the user, it is often accompanied by data that is unimportant or even irrelevant to a particular user. This irrelevant data unnecessarily complicates comprehension and interpretation of the relevant data and often leads to the user missing important changes in the relevant data.
Examples of the data that may be unimportant to the user are:
1. Stock quotes for a stock of interest to the user are often accompanied by other data such as number of shares outstanding, opening and closing prices, earnings in the last quarter and so on. While the user may need to check this data once every 2 or 3 months, the user is not likely to want to see this data every time a current stock quote is sought.
2. A fluctuating price for an item in an online store that interests the user may be accompanied by advertising for other items in which the user has no interest, or by product photographs that the user has already seen many times.
3. Balances of the user's bank accounts may appear in separate online documents (web pages) and be accompanied by the last 10 transactions. The user, however, wants to monitor only the balances of all his or her accounts in the bank, so that every balance appears in a small window unaccompanied by any other information.
In addition to this, if the user wants to monitor important data, he or she will find it necessary to push the browser "Reload" button to obtain the latest data from the remote database. This requires considerable manual effort and can be fatiguing even when monitoring one online document. The manual effort required for monitoring several online documents simultaneously is so great that it makes such monitoring very difficult, if not impossible, to do on a regular basis.
Summary. Online documents generated by online databases provide valuable data that a user may want to monitor. However, this essential information is often accompanied by large quantities of non-essential and even irrelevant information, or information that rarely changes and does not need to be monitored.
Therefore, a method is needed that allows a user to automate monitoring of essential data extracted from online documents while ignoring non-essential or irrelevant data.
In the remainder of this Section we present the state of the art in the technical area of this invention and show how this invention differs from the state of the art.
HTML, browsers, and DOM
Structured online documents in HTML and XML are displayed using web browsers such as Navigator by Netscape(copyright) Communications and Internet Explorer by Microsoft(copyright) Corporation.
A web browser is used in the preferred embodiment of the present invention.
However, none of the browsers known to us can display a document fragment in a separate window with no window treatments, so that irrelevant information is not seen by the user and the window takes little space on the user's screen. Also, none of the browsers known to us implements automatic refresh.
The present invention augments the browser behavior, and it uses the ability of the more advanced browsers to be controlled by other applications. The present invention also uses the Document Object Model (DOM) to navigate the content of an online document represented as a tree of nodes.
Web site server-side customizations
Most major websites allow limited server-side customization of their content. Examples are MyYahoo!(copyright) on the Yahoo!(copyright) website, My Netscape(copyright) on the Netscape(copyright) website, and the like. These customizations are nothing more than accounts created for users on these web sites. Users see the customized content when they log in to their accounts on the web site.
Web site customizations provide a limited choice of what can be customized. For example, the user usually can select a portfolio of stocks to be displayed, but he or she usually cannot select what parameters are presented for a particular stock. Such customizations are also usually limited to very few online data categories. For instance, a user can monitor all U.S. stocks using such customization, but he or she cannot monitor, say, Brazilian stocks, even though quotes for Brazilian stocks may be available online.
Furthermore, creating user-customized web site content requires complicated and therefore expensive programming from the web site maintainers, so this option is not practical for smaller web sites because of its price and complexity.
Finally, server-customized web pages are still shown in a regular web browser window that has many unnecessary window treatments, and the user is still required to push the "Reload" button every time he or she wants an update.
Using the present invention, the user can arbitrarily customize and monitor any web page content and select any presentation format for the customized content, and no programming is required on either the web server side or the user side.
Online data providers
Several online services exist that can push certain online data, such as stock quotes, to the user's wired or wireless device, such as a pager or computer.
These services compare to the present invention in the same way as server-side web site customizations, because they have the same problems: a limited choice of content that can be monitored, no way to arbitrarily customize the presentation of such content or the parameters that are included, and the need for expensive server-side programming.
XML and XSLT
Several techniques exist that transform a higher level abstract document presentation to the lower level document presentation used for rendering the document. The most notable effort in this area is the XSLT language, which is used to write programs that transform XML documents to HTML documents that are rendered in a web browser. More information about the XSLT language and XML documents can be found on the World Wide Web Consortium (W3C) website.
These techniques do not cover the present invention because they are used to synthesize a lower level document presentation from a higher level document presentation, but they do not change the content of the document. The present invention is primarily used to change the content of the document without changing the level of abstraction used in the document presentation.
Related Patents
U.S. Pat. No. 5,530,852 to Meske, Jr., teaches how to build web sites that store news articles and serve them to users through the Internet, providing categorization and search services. A typical news article is a structured document that has a title, summary (profile), and body. However, U.S. Pat. No. 5,530,852 teaches processing news articles in the web server space, and not in the client space. Also, U.S. Pat. No. 5,530,852 teaches programming of reformatting by a highly skilled computer programmer, while the present invention teaches creation of a reformatting script by a non-programmer user.
U.S. Pat. No. 5,737,592 to Nguyen et al. teaches how to build server-side programs that receive queries from a web browser, automatically convert them to SQL queries, run these queries on a database, convert records returned by the database to HTML and send this HTML back to the requester. The present invention is different from this patent because it applies on the client side and not on the server side and we are not concerned with generation of SQL queries.
U.S. Pat. No. 5,745,754 to Lagarde et al. and U.S. Pat. No. 5,752,246 to Rogers et al. teach how to build server-side programs that use Distributed Integration Solution servers to perform extraction of data requested by a user from databases, and presentation of this data in HTML. These teachings would be of use to a highly skilled programmer who programs web applications for extracting and reformatting data in a database. They are different from the present invention, however, because we teach how a non-programmer user can create reformatting scripts on the client side.
U.S. Pat. No. 5,774,123 to Matson teaches how to record a sequence of navigation commands performed by a user on the web browser and how to later replay these commands causing the browser to repeat the navigation session. The record-and-replay feature of this patent does not teach extracting digests of online documents, nor does this patent teach extracting document digests using document trees and displaying the digests in a separate window.
U.S. Pat. No. 5,799,304 to Miller teaches how a user agent can filter, i.e. wholly display or wholly reject, a news article based on criteria provided by the user. That is, it teaches how to make search engines more intelligent by using agent technologies. This patent does not relate to extraction of document digests.
U.S. Pat. No. 5,890,152 to Rapaport teaches how to build a web search engine that takes into account user characteristics such as IQ, etc., all stored in a personal profile database. This patent does not relate to the present invention, because we are not concerned with user characteristics at all.
U.S. Pat. Nos. 5,895,476 and 5,903,902 to Orr et al. are concerned with server-side generation of online documents from specialized higher level representations of documents. This is different from the present invention because the present invention applies on the client side and does not change the transformed document's level of abstraction.
Accordingly, it is a problem in the art to automatically monitor user-selected fragments of online documents and to create scripts that perform such monitoring, where such scripts are to be created visually by a user without requiring the user to write a program of any kind.
From the foregoing, it is seen that it is a problem in the art to provide a device meeting the above requirements. According to the present invention, a device is provided which meets the aforementioned requirements and needs in the prior art.
Specifically, the device according to the present invention provides a method for extracting digests of structured online documents, and automatic monitoring of the said digests. A digest of an online document is a collection of fragments of this document which are of interest to a user. Creation of the scripts that perform the said digest extraction and monitoring employs visual programming of the online document tree navigation and transformation. The disclosed method can be applied to structured online documents such as HTML, XML, SGML documents, or to any other online document that has internal structure that can be represented by a tree.
More specifically, the system according to the present invention is based on visual programming whereby a user selects a fragment of an online document shown in the source window and copies this fragment to the target window that contains the reformatted digest. The system according to the present invention generates a sequence of web site navigation commands, online document tree navigation commands, and "Copy Fragment" commands that cause the assembly of the reformatted digest in the target window. The user can later ask the system to replay the sequence of generated commands, thus causing automatic creation of the reformatted digest of the changed version of the online document.
Therefore, according to the present invention, when the content of the original document changes and the script that creates the digest is run, the change is automatically propagated to the digest document. This allows implementation of simple automatic monitoring of digests of online documents, which occurs entirely in the user space, that is, in the application that controls the user's browser.
The digest document is typically much smaller than the original document, and it usually does not contain computationally intensive and bandwidth-intensive multimedia elements such as graphics, sounds, scripts, and controls. This considerably lowers the screen size, bandwidth, and processing power requirements for user agents that display document digests. Therefore, document digests can be displayed by user agents that run on wireless and portable computing devices. Such devices have small screens, and their bandwidth and computational power resources are limited.
The preferred embodiment of the present invention is a computer program that is called WebTransformer(trademark). It runs on Microsoft(copyright) Windows(copyright) 32-bit operating systems and, as of the filing date, it controls Microsoft Internet Explorer.
Vocabulary.
Source Document and Source Window. The source window typically contains a regular browser such as Microsoft Internet Explorer. The source online document is shown in this window. The source window is used to navigate to the web page of interest and to select a fragment of this page to be monitored.
Target Document and Target Window. The target window is where the digest of the source document is displayed. The digest of the source document that the user monitors is also called the target document. The target window is typically much smaller than the source window and it does not have window treatments such as menu bars and scroll bars, so that it is possible to have many such windows on one screen.
Command. An elementary instruction to perform an operation on a document tree that can be recorded.
Script. A recorded or otherwise created sequence of commands.
How It Works
The user typically performs the following actions in order to use the present invention.
First, the user browses documents in the source window and, upon seeing a document of interest, selects a fragment of the document that constitutes a digest. Selection is performed by clicking the desired element of the web page. This click is translated by the browser into the address of the node in the document tree that represents the minimal HTML element that covers the clicked area.
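By way of illustration only, the notion of a node address in a document tree can be sketched as a list of child indices leading from the tree root to the selected node. The sketch below uses Python's standard `xml.etree.ElementTree` on a small well-formed sample page rather than a browser DOM; the function names `node_address` and `resolve`, the sample markup, and the choice of child indices as the address format are assumptions made for this example and are not taken from the preferred embodiment.

```python
import xml.etree.ElementTree as ET


def node_address(root, target):
    """Return the child indices leading from root to target, or None if absent."""
    if root is target:
        return []
    for index, child in enumerate(root):
        sub_address = node_address(child, target)
        if sub_address is not None:
            return [index] + sub_address
    return None


def resolve(root, address):
    """Follow an address (list of child indices) from root down to a node."""
    node = root
    for index in address:
        node = node[index]
    return node


# Hypothetical sample page standing in for a downloaded source document.
doc = ET.fromstring(
    "<html><body><h1>Quotes</h1>"
    "<table><tr><td>ACME</td><td>12.50</td></tr></table>"
    "</body></html>"
)

# Stand-in for the element the user clicked: the second cell of the table row.
price_cell = doc.find("./body/table/tr/td[2]")
address = node_address(doc, price_cell)
print(address)                     # [0, 1, 0, 1]
print(resolve(doc, address).text)  # 12.50
```

An address of this kind does not depend on the text inside the node, which is why it can still locate the same fragment after the data in the document has been updated.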
The user can then use the arrow keys of a computer keyboard to extend, contract, or move sideways the selection. Other selection mouse clicks and keyboard keys may be used depending on the web browser.
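The selection adjustments just described can be sketched, under stated assumptions, as operations on a node address expressed as a list of child indices. Here "extend" is taken to mean moving to the parent node, "contract" to mean moving to the first child, and "move sideways" to mean moving to a neighboring sibling; this interpretation, the function names, and the sample markup are assumptions for illustration, and no particular assignment of arrow keys is implied.

```python
import xml.etree.ElementTree as ET


def resolve(root, address):
    """Follow a list of child indices from root down to a node."""
    node = root
    for index in address:
        node = node[index]
    return node


def extend(address):
    """Widen the selection to the parent node (the root stays selected)."""
    return address[:-1] if address else address


def contract(root, address):
    """Narrow the selection to the first child, if the node has children."""
    node = resolve(root, address)
    return address + [0] if len(node) else address


def sideways(root, address, step):
    """Move to a sibling 'step' positions away; stay put at either edge."""
    if not address:
        return address
    parent = resolve(root, address[:-1])
    new_index = address[-1] + step
    if 0 <= new_index < len(parent):
        return address[:-1] + [new_index]
    return address


# Hypothetical sample page; [0, 1, 0, 1] addresses the cell containing 12.50.
doc = ET.fromstring(
    "<html><body><h1>Quotes</h1>"
    "<table><tr><td>ACME</td><td>12.50</td></tr></table>"
    "</body></html>"
)
selection = [0, 1, 0, 1]
print(extend(selection))             # [0, 1, 0]  (the whole row)
print(sideways(doc, selection, -1))  # [0, 1, 0, 0]  (the ACME cell)
print(contract(doc, [0, 1]))         # [0, 1, 0]  (table -> its first row)
```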
When the user finishes selecting the fragment, the user invokes the user interface "Copy" command, which copies the contents of the selected fragment from the source window to the target window. Please note that the target window does not have to be visible when the source document fragment is selected. The target window may become visible upon creation of the script. Similarly, the source window may not be visible when the script is replayed.
In addition, according to the present invention, the WebTransformer creates a script that records the source document location, the sequence of document tree navigation commands that leads from the tree root to the node that corresponds to the selected fragment, and the "Copy Fragment" command.
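A minimal sketch of such a script, and of replaying it against a changed version of the source document, is given below. It is not the WebTransformer's actual script format: the command names `open`, `child`, and `copy_fragment`, the tuple representation, the example URL, and the `load_document` callable (a stand-in for downloading the page) are all assumptions introduced for illustration.

```python
import xml.etree.ElementTree as ET


def record_script(location, address):
    """Build a script: open the source, walk down the tree, copy the fragment."""
    script = [("open", location)]
    script += [("child", index) for index in address]
    script.append(("copy_fragment",))
    return script


def replay(script, load_document):
    """Run the script; load_document(location) must return the page markup."""
    node = None
    digest = []
    for command in script:
        if command[0] == "open":
            node = ET.fromstring(load_document(command[1]))
        elif command[0] == "child":
            node = node[command[1]]
        elif command[0] == "copy_fragment":
            digest.append(ET.tostring(node, encoding="unicode"))
    return digest


# Two hypothetical versions of the same page: same structure, updated data.
PAGE_V1 = ("<html><body><h1>Quotes</h1>"
           "<table><tr><td>ACME</td><td>12.50</td></tr></table></body></html>")
PAGE_V2 = ("<html><body><h1>Quotes</h1>"
           "<table><tr><td>ACME</td><td>13.10</td></tr></table></body></html>")

script = record_script("http://example.com/quotes", [0, 1, 0, 1])
digest_v1 = replay(script, lambda location: PAGE_V1)
digest_v2 = replay(script, lambda location: PAGE_V2)
print(digest_v1)  # ['<td>12.50</td>']
print(digest_v2)  # ['<td>13.10</td>']
```

Because the script stores only the location and the path through the tree, the same recorded commands yield the updated fragment as long as the structure of the document remains substantially the same.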
The system can record all elements of user navigation including entering User ID and Password or filling out and submitting other online forms that cause the desired navigation.
Finally, according to the present invention the user can ask the WebTransformer to run the script that has been created. The user can request a one-time execution of the script or automatic periodic execution of the script according to a user-specified time table. Script execution results in fresh (not from cache) download of the source document, navigating the source document tree to the selected tree node and copying the selected source document fragment to the target window.
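Periodic execution can be sketched as a simple loop that runs the script, shows the result in the target window, and waits for the user-specified interval. In the sketch below, `run_script` stands in for a fresh download followed by replay of the recorded commands, and `show` stands in for updating the target window; both names, the fixed repetition count, and the injectable `sleep` parameter are assumptions made so the example can run without a network or a browser. Bypassing the browser cache is not modeled here.

```python
import time


def run_periodically(run_script, show, interval_seconds, repetitions,
                     sleep=time.sleep):
    """Run the script 'repetitions' times, pausing between consecutive runs."""
    for iteration in range(repetitions):
        show(run_script())
        if iteration + 1 < repetitions:
            sleep(interval_seconds)


# Hypothetical stand-ins: each run of the script yields the latest digest text.
fresh_digests = iter(["12.50", "13.10", "12.95"])
target_window = []  # collects what would be displayed in the target window
pauses = []         # records requested pauses instead of really sleeping

run_periodically(
    run_script=lambda: next(fresh_digests),
    show=target_window.append,
    interval_seconds=60,
    repetitions=3,
    sleep=pauses.append,
)
print(target_window)  # ['12.50', '13.10', '12.95']
print(pauses)         # [60, 60]
```

A one-time execution corresponds to a single repetition; a time table with irregular intervals would need a scheduler rather than a fixed interval, which this sketch does not attempt.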
Summary of Benefits
The present invention brings the following benefits to its user:
1. The user views and monitors only the fragments of online documents that are of interest to him or her, not the whole documents.
2. The user does not have to push the "Reload" button; it is done for him or her automatically by the WebTransformer.
3. The combination of the typically small size of target windows and the auto-refresh feature allows the user to monitor many (10-50) online documents simultaneously without any manual effort.
4. Since the document digest is small and it typically does not contain large pictures or embedded programs (such as JavaScript, Java, ActiveX programs), the document digests download and execute much faster than the original documents.
5. Since document digests are small in size, and since they require less bandwidth and less computational power to display than the original documents, the document digests can be successfully displayed on small-screen user agents that have bandwidth and computational power limitations, specifically on user agents that run on wireless devices such as cellular phones, pagers, wireless personal digital assistants (PDA), and so on. These devices' primary limitation is screen size, so they would greatly benefit from the present invention.
Other objects and advantages of the present invention will be more readily apparent from the following detailed description when read in conjunction with the accompanying drawings.