1. Field of the Invention
The present invention relates generally to modifying documents sent over a communications network, and in particular to a system and method for determining the information contents of document portions and replacing undesired document portions with substitute document portions or inserting substitute document portions.
2. Description of Prior Art
To a large degree, the information age has been brought about by rapid advances in the field of communications and communications networks in particular. Increasingly, information which could formerly be presented in tangible, permanent media is reformatted and rendered for display on screens and monitors. Virtually any information presentable as text or text and graphics is being converted into suitable electronic messages or packets for shuttling across a communications network.
A communications network, e.g., the Internet, has an architecture in which information packets from resources or content providers are made available through service providers to users who subscribe to the service. The actual transmission takes place over the communication links of various bandwidths and types which make up the network. Content providers typically store this electronic data in a standard format on server machines connected directly to the Internet. The data is broken down into packets and these are then transmitted over the communication link. Among the diverse types of information which may be placed on the Internet in this way are articles, news briefs and updates, weather maps, books, summaries, files, software, catalogues, documents, pictorials, video files, public records, commercial literature and so forth.
Clearly, the packets which can be transmitted via a communications network are vast in number and varied in kind. To aid in sorting, routing and transmitting information on the Internet, the content of any given packet is usually identified by its origin (address of the content provider), a brief summary located in a conspicuous portion of the packet (e.g., in the header) or some other identification information. For example, NetWare routers, distributed by Novell, Inc., follow the Internetwork Packet Exchange (IPX) protocol and execute a so-called Routing Information Protocol (RIP) and Service Advertising Protocol (SAP). RIP involves periodic broadcast packets containing all routing information known to the router. These packets are used to keep the global network synchronized. In addition, the protocol provides for periodically sending SAP broadcast packets containing all server information known to the SAP agent. Thus, the network system keeps track of the contents of the various packets to facilitate transfer, mitigate traffic problems and perform other vital operations.
In U.S. Pat. No. 5,530,852, issued to Meske, Jr. et al., the inventors disclose a method and system for receiving information in a first file written in a first markup language and identifying the information contents. The method and system ensure that even complex packets of information are processed by generating a list of profiles and topics for each list of the profiles. Secondary and tertiary files are created with anchors referencing particular information in the first file. A parsing procedure is taught by Meske to ascertain whether any information in the first file (original packet) is relevant. If so, fourth and fifth files containing the desired information are created and sent to the user.
Meske's system and method can be adapted to block or filter entire packets or portions thereof on a content basis before performing the necessary steps to display the information, usually in the form of a page, on the user's screen. The document is later parsed to extract the profile and build additional pages to catalog and access the information. This method for building a knowledge base with content profiles embedded in a document is useful, but it is limited to processing the received information only.
The above-mentioned IPX protocol and similar methods which determine the information contents of packets and use them in the routing process can be employed to control the transfer of packets. For example, U.S. Pat. No. 5,541,911 issued to Nilakantan et al. discloses a remote smart filtering communication management system which uses the information contents data to alleviate network traffic problems.
In particular, Nilakantan controls the traffic across a communication link between a remote network and a central device by applying forwarding rules. The resources monitor the characteristics of the forwarded data packets received across the communication link to learn characteristics of the users of the remote network. In response to the learned characteristics, the resources generate link management messages and forward these to the remote interface. The remote link management resources in the remote interface are responsive to the link management messages and tailor the forwarding rules to the user characteristics. The packets can now be filtered or blocked based on user characteristics.
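The feedback loop described above can be illustrated with a toy sketch. This is not Nilakantan's implementation: the class names, the `min_uses` threshold, the notion of a packet "kind", and the message layout are all hypothetical and chosen only to show how observed traffic might be turned into a link management message that tailors forwarding rules.

```python
from collections import Counter


class CentralDevice:
    """Hypothetical central resource that learns from forwarded packets."""

    def __init__(self, min_uses=2):
        self.min_uses = min_uses  # assumed threshold for "in use"
        self.seen = Counter()     # packet kind -> number of observations

    def observe(self, packet_kind):
        """Record one forwarded packet of the given kind."""
        self.seen[packet_kind] += 1

    def link_management_message(self):
        """Build a message listing the kinds the remote users actually use."""
        kinds = sorted(k for k, n in self.seen.items() if n >= self.min_uses)
        return {"forward_only": kinds}


class RemoteInterface:
    """Hypothetical remote interface holding the forwarding rules."""

    def __init__(self):
        self.allowed = None  # None means no rule yet: forward everything

    def apply(self, message):
        """Tailor the forwarding rules to the received message."""
        self.allowed = set(message["forward_only"])

    def forwards(self, packet_kind):
        """Return True if a packet of this kind crosses the link."""
        return self.allowed is None or packet_kind in self.allowed
```

In this sketch, a remote interface forwards everything until it receives a message; afterwards, only packet kinds observed at least `min_uses` times cross the link.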
The use of selective blocking and filtering of packets by Nilakantan et al. is applied to ultimately reduce network traffic. The invention is centered around sending management messages which are then used to optimize packet traffic across given links in the network. In other words, the problem addressed by this invention is the high volume caused by the proliferation of packets on the network.
Blocking and filtering of packets or their parts can be employed to speed up the page rendering process on the user's screen. For example, blocking functions may restrict packets from a list of providers or an entire block of providers from ever being sent to the user. This feature allows one to prevent undesired packets (e.g., packets containing adult material) from being sent to the user and rendered on his or her screen. Filters can be preset to choose packets based on the time they require for rendering or in accordance with other user-specified standards (e.g., information contents). Proper application of these two functions results in an optimized and personalized page rendering procedure.
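As a minimal sketch of the two functions just described, blocking by provider and filtering by rendering time and information contents might look as follows. The `Packet` fields, the function names, and the thresholds are assumptions made for illustration and do not come from any cited system.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    provider: str          # address of the content provider (assumed field)
    render_seconds: float  # estimated rendering time (assumed field)
    topic: str             # information contents label (assumed field)


def block(packets, blocked_providers):
    """Blocking: drop every packet originating from a blocked provider."""
    return [p for p in packets if p.provider not in blocked_providers]


def filter_packets(packets, max_render_seconds, wanted_topics):
    """Filtering: keep packets that render quickly and match wanted topics."""
    return [
        p for p in packets
        if p.render_seconds <= max_render_seconds and p.topic in wanted_topics
    ]


def select_packets(packets, blocked_providers, max_render_seconds, wanted_topics):
    """Apply blocking first, then filtering, to personalize rendering."""
    remaining = block(packets, blocked_providers)
    return filter_packets(remaining, max_render_seconds, wanted_topics)
```

Both functions act on whole packets, so they can only pass or discard what was received.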
In the most common practical scenario, however, a network user sends a direct request for an entire document from a terminal located on his or her premises to the service provider. The provider verifies whether the document is already stored in local memory and, if not, obtains this document from the content provider. While the user's request is processed the service provider usually passes on to the user a number of unsolicited document portions, e.g., document portions from other service providers such as advertisement servers. Thus, the subscriber receives, in addition to the requested document(s), numerous other document portions of varying degrees of interest or importance to him or her. When the page is rendered on the user's screen these embedded document portions are displayed as a part of the document.
Under these circumstances, what is needed is a system and method for modifying or substituting undesired document portions rather than performing blocking and filtering functions on the packet level. For example, the service provider, the user or another party may wish to exchange or modify a document being sent to the user. This situation may occur when the service provider wishes to enclose vital information with the document requested by the user. Using the bandwidth allocated to a less important document portion, that is, swapping one document portion for another, would be highly appropriate for this purpose. In another situation, the user may wish to block undesired document portions. For instance, when recording television programming on a VCR the user can selectively block advertising material from being recorded. Analogously, when rendering a web page the user may wish to prevent specific document portions from being rendered on the page.
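The kind of document-level modification called for here can be sketched, under stated assumptions, as a pass over the portions of a document. The `Portion` type, the `is_undesired` predicate, and the `modify_document` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Portion:
    source: str   # server the portion came from (assumed field)
    content: str  # the portion's payload (assumed field)


def modify_document(
    portions: List[Portion],
    is_undesired: Callable[[Portion], bool],
    substitute: Optional[Portion] = None,
) -> List[Portion]:
    """Swap each undesired portion for the substitute, or omit it if none is given."""
    result = []
    for portion in portions:
        if not is_undesired(portion):
            result.append(portion)     # keep desired portions unchanged
        elif substitute is not None:
            result.append(substitute)  # document portion swap
        # otherwise the undesired portion is simply left out
    return result
```

Calling `modify_document` with a substitute corresponds to the service provider's swap scenario; calling it without one corresponds to the user omitting portions from the rendered page.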
At the present time the problems associated with this type of document modification have not been addressed, much less solved. Consequently, what is needed is a system and method which solves the problems associated with document modification based on the information contents in a communications network such as the Internet.