Fast, inexpensive distribution of information has been promoted by the widespread acceptance of the Internet and especially the world-wide-web (www). This information can be easily updated or changed. However, users may not be aware of the changes. Unless the user frequently re-reads the information, many days or weeks may pass before users realize that the information has changed.
Documents on the web are known as web pages. These web pages are frequently changed. Users often wish to know when changes are made to certain web pages. The parent application disclosed a change-detection tool that allows users to register web pages. Each registered web page is periodically fetched, and a checksum or signature of the fetched page is compared to the one stored for the registered page to determine if a change has occurred. When a change is detected, the user is notified by e-mail. The change-detection tool of the parent application allows users to select portions of a web-page document for change detection while other portions are ignored.
The change-detection tool described in detail in the parent application is useful and has gained popularity with Internet users, as several hundred thousand web pages have been registered. For example, patent professionals can register the federal regulations and procedures (37 C.F.R. and the M.P.E.P.) posted at the PTO's web site and be notified when any changes are made. The change-detection tool is currently free for public use at the www.netmind.com web site.
FIG. 1 illustrates a web page registered for change detection. This web page contains a copy of part of the Code of Federal Regulations; specifically the patent office regulations at 37 C.F.R. § 1.x. A patent attorney registers this web page, which contains a copy of the patent rules at 37 C.F.R. § 1.8 to 1.136. The rules may be located on one large web page, or spread across many web pages that are each registered.
The user registers this page by using a user interface for the change-detection tool. The user enters his e-mail address and the URL for the web page. The change-detection tool fetches a copy of this page and generates a signature. The signature is a highly-condensed data word that is produced by using a cyclical-redundancy-check (CRC) or other algorithm that is very likely to produce a different output whenever the input changes. For the initial page of FIG. 1, the signature 5A7 (hex) is generated and stored in a database with the user's e-mail address and the web page's URL.
The change-detection tool periodically fetches this web page to see if a change has occurred. A new signature is generated for the re-fetched page, and the new signature is compared with the old signature stored in the database. A mismatch indicates that a change is detected.
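The signature comparison described above can be sketched as follows. This is a minimal illustration only: it assumes a 32-bit CRC from Python's standard zlib module, whereas the figures show a shorter three-digit hex signature, and the function names signature and has_changed are hypothetical.

```python
import zlib

def signature(page_bytes: bytes) -> str:
    """Condense a fetched page into a short hex signature (CRC-32 assumed)."""
    return format(zlib.crc32(page_bytes) & 0xFFFFFFFF, "08x")

def has_changed(stored_signature: str, refetched_page: bytes) -> bool:
    """A mismatch between the stored and new signatures indicates a change."""
    return signature(refetched_page) != stored_signature

# Register the page once, then compare on each later fetch.
stored = signature(b"patent rules, original text")
unchanged = has_changed(stored, b"patent rules, original text")
changed = has_changed(stored, b"patent rules, updated text")
```

Only the condensed signature needs to be stored for each registered page, not a full copy of the page.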
FIG. 2 shows an updated web page that has a different signature that triggers a change notification. Occasionally, the patent regulations are updated. Web pages containing a copy of these regulations are eventually updated to reflect the changed rules. For example, FIG. 2 shows that rule 37 C.F.R. § 1.62 has been deleted while rule 37 C.F.R. § 1.136 has been updated, as they were in late 1997.
The change-detection tool re-fetches each registered page every few hours or days. Once the rules on the web page are updated, a different signature is generated for the updated web page. In FIG. 2, the new signature of D6F is generated, which does not match the old signature of 5A7 stored in the change-detection tool's database. Thus a change is detected. The new signature is stored in the database and the patent attorney user is notified by e-mail.
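One check cycle for a single registration might look like the sketch below. The Registration record, the check_registration function, and the notify callback are hypothetical names introduced for illustration; the database is modeled as a plain in-memory object, and CRC-32 again stands in for whatever signature algorithm the tool actually uses.

```python
import zlib
from dataclasses import dataclass
from typing import Callable

@dataclass
class Registration:
    """One database row: who to notify, which page, and its last signature."""
    email: str
    url: str
    signature: int

def check_registration(reg: Registration, page_bytes: bytes,
                       notify: Callable[[str, str], None]) -> bool:
    """Compare signatures; on a mismatch, store the new one and notify."""
    new_signature = zlib.crc32(page_bytes)
    if new_signature == reg.signature:
        return False                      # no change detected
    reg.signature = new_signature         # store the new signature
    notify(reg.email, reg.url)            # e-mail the registered user
    return True

# Example: collect notifications in a list instead of sending e-mail.
sent = []
reg = Registration("attorney@example.com", "http://example.com/rules",
                   zlib.crc32(b"rules before update"))
first = check_registration(reg, b"rules before update",
                           lambda email, url: sent.append((email, url)))
second = check_registration(reg, b"rules after update",
                            lambda email, url: sent.append((email, url)))
```

Because the new signature replaces the old one, the user is notified once per change rather than on every subsequent fetch.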
The user is notified within a few days after the web page is updated, allowing the patent attorney to rest easy, not having to frequently surf over to the rules page to see if any changes have been made.
False Change Detections--FIG. 3
The change-detection tool is only useful when it saves time and effort for the user. One problem is that false notifications can be made, annoying the user with changes that are not relevant. The inventors have discovered that the world-wide-web itself can trigger false change detections. These false detections should be filtered out.
FIG. 3 shows a false change detection caused by a non-relevant change in an Internet server. Web pages are stored on computer servers. These servers are sometimes disconnected from the Internet for maintenance such as program or hardware updates, or security threats such as hacker attacks.
The web server containing the web page with the 37 C.F.R. patent rules is disconnected from the Internet for maintenance. Often such maintenance occurs during low-usage times such as weekend nights. Most users do not notice that the web pages are offline during these hours. Unfortunately, automated software programs such as the change-detection tool continue to operate during these times, and may perform more fetching during off hours since network response times decrease. The change-detection tool may find that the web page is not available.
When no connection can be made with the server, the change-detection tool can simply skip the web page until a later time. Since TCP/IP packets are not returned from the server, the change-detection tool can easily determine that the page is not available due to a network problem. The change-detection tool does not notify the user, but instead tries again later.
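The skip-and-retry behavior for an unreachable server can be sketched as shown below. The fetch_or_skip function and the fetch callable are hypothetical; the sketch assumes that connection failures surface as OSError, which is the base class of Python's socket and urllib connection errors.

```python
from typing import Callable, Optional

def fetch_or_skip(fetch: Callable[[str], bytes], url: str) -> Optional[bytes]:
    """Return the page bytes, or None when no connection could be made."""
    try:
        return fetch(url)
    except OSError:
        # Network-level failure: skip this page and try again later.
        # No signature is generated and no notice is sent.
        return None

def unreachable(url: str) -> bytes:
    """Stand-in fetcher that simulates a disconnected server."""
    raise ConnectionRefusedError("server is offline")

def reachable(url: str) -> bytes:
    """Stand-in fetcher that simulates a normal response."""
    return b"page content"

skipped = fetch_or_skip(unreachable, "http://example.com/rules")
fetched = fetch_or_skip(reachable, "http://example.com/rules")
```

This guard only covers failures signaled by the network layer. A server that answers with a substitute page raises no error and is not caught by it.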
Completely disconnecting servers from the Internet is frowned upon since users do not know what is causing the errors. Thus many web sites use another server to return a message page to the user when the server is down for maintenance. This message or error page lets the user know that the web page is only temporarily unavailable and the user should try back later.
The error page of FIG. 3 is returned when a user tries to retrieve the web page containing the 37 C.F.R. patent rules. This same error page is returned to change-detection software trying to fetch the web page. However, since no packet or network error is signaled, the change-detection tool assumes that the error page is the registered web page and generates a new signature. The new signature for the error page is EB9, which does not match the old signature (D6F) that was stored in the database after the last change was detected.
The change-detection tool then generates a change notice that is emailed to the user. The next day when the patent attorney reads the change notice, he browses over to the web page. By now the server is back up, showing the same web page as in FIG. 2. Although the user reads the web page carefully, he cannot find any changes.
A few days later, the change-detection tool again retrieves the web page and generates a new signature. Since this new signature does not match the error page's signature that was stored, another change notice is generated. The user again looks at the web page but finds no changes. At this point, after receiving two false change notices, the user cancels his change-detection service to avoid getting the false notifications.
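The sequence of FIG. 3 can be reproduced with a small simulation. The page contents and the count_notifications function below are invented for illustration, and CRC-32 is assumed as the signature algorithm. The registered page never changes, yet a temporary error page between two fetches produces two notices.

```python
import zlib
from typing import Sequence

def count_notifications(stored_page: bytes, fetches: Sequence[bytes]) -> int:
    """Count change notices sent by a signature-only change detector."""
    stored_signature = zlib.crc32(stored_page)
    notifications = 0
    for page in fetches:
        new_signature = zlib.crc32(page)
        if new_signature != stored_signature:
            notifications += 1                # user is e-mailed
            stored_signature = new_signature  # new signature is stored
    return notifications

# Hypothetical page contents standing in for FIG. 2 and FIG. 3.
RULES_PAGE = b"37 C.F.R. patent rules, as updated in FIG. 2"
ERROR_PAGE = b"This page is temporarily unavailable. Please try back later."

# Server down during the first fetch, back up for the second.
false_notices = count_notifications(RULES_PAGE, [ERROR_PAGE, RULES_PAGE])
# Server up for both fetches.
quiet_notices = count_notifications(RULES_PAGE, [RULES_PAGE, RULES_PAGE])
```

Both notices in the first case are false: the first reports the error page, and the second reports the return of the original page.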
HTML Headers--FIG. 4
FIG. 4 shows a dynamic web page with HTML headers. A content-length HTML header <CONTENT_LEN> specifies the length of the web-page document in bytes. A last-modified header <LAST_MODIFIED> contains a date and time of the last modification of the web page. Dynamic content 15 is frequently updated, often by a database or search-engine server. Stock quotes are an example of dynamic content that appears in a dynamic frame. Dynamic images or JAVA programs are often used as dynamic content.
Some change-detection software relies solely on the last-modified header in the HTTP response from a Web server. For example, Microsoft Internet Explorer 4.0 has a feature called "Subscriptions" under the "Favorites" menu, which detects changes in web pages. This feature relies on the last-modified header to determine when a web page has changed. Unfortunately, many web pages do not return a last-modified header, and Internet Explorer generates false change notifications each time it checks a web page lacking the last-modified header.
Not all documents contain a last-modified header. The last-modified header may or may not reflect changes in dynamic content 15. Some web servers update the last-modified header only when the static content changes. Thus change notifications are not generated when the dynamic content changes. This may be undesirable when the dynamic content is what the user desires to have checked. For example, when the user wants to search newsgroups for the appearance of a specific product or company name, the result of the search is dynamic content. If the web server does not return a Last-Modified header, the user is notified by an unsophisticated change-detection tool every time the search result is checked. If the web server returns a Last-Modified header based only on the static content, the user is not notified when the results of the search--the dynamic content--change.
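The two failure modes of a detector that relies solely on the Last-Modified header can be modeled as below. This is a simplified model of the behavior described above, not the implementation of any actual product; the header_only_changed function and the sample header values are hypothetical.

```python
from typing import Mapping, Optional

def header_only_changed(previous: Optional[str],
                        headers: Mapping[str, str]) -> bool:
    """Report a change based on the Last-Modified header alone."""
    current = headers.get("Last-Modified")
    if current is None:
        # No header to compare against: every check is reported as a change.
        return True
    return current != previous

# Case 1: the server omits the header, so a false change is reported.
no_header = header_only_changed("Mon, 01 Dec 1997 10:00:00 GMT", {})

# Case 2: the header tracks only static content. The dynamic search results
# may have changed, but the header is identical, so the change is missed.
stale_header = header_only_changed(
    "Mon, 01 Dec 1997 10:00:00 GMT",
    {"Last-Modified": "Mon, 01 Dec 1997 10:00:00 GMT"},
)

# Case 3: the header value differs, so a change is reported.
new_header = header_only_changed(
    "Mon, 01 Dec 1997 10:00:00 GMT",
    {"Last-Modified": "Tue, 02 Dec 1997 08:30:00 GMT"},
)
```

In neither of the first two cases does the detector examine the content the user actually cares about.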
The last-modified header may also be updated when the HTML headers are changed, but not the visible document. This can also cause false changes to be reported. Even if the change-detection tool is intelligent enough to analyze the content for changes, rather than relying solely on the Last-Modified header, false changes can be reported when the server returns only a portion of the web page due to some kind of error. The inventors, with the benefit of the experience involved in running a change-detection tool for hundreds of thousands of different documents on the Internet, have recognized these problems. Without this level of experience these problems are not easily recognized.
What is desired is an improved automated change-detection tool that detects when changes occur to a registered document on the Internet. It is desired that the user not have to check the web page to see if any changes have occurred. A change-detection tool adapted to filter out false change notifications is desired. A change-detection tool that does not report changes that are not relevant to the user is desirable. Identification of temporary error pages is desirable so that they are not reported to the user. A more sophisticated and more robust change-detection tool is desired.