The present invention relates to computer networks used for transmitting and distributing documents in the form of a collection of digital data. More specifically, the present invention relates to a method and apparatus for dynamically and intelligently caching documents at a local site utilizing an agent to filter incoming multicast streams of documents.
The Internet has become an expansive backbone of communication with a vast repository of information available in various formats. Its popularity is due in large part to the development of the hypertext mark-up language (HTML), its related formats such as extensible mark-up language (XML) and dynamic HTML (DHTML), and the associated Transmission Control Protocol/Internet Protocol (TCP/IP) based communications protocols Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), and User Datagram Protocol (UDP). HTML has transformed the Internet from the black and white world of text into the vibrant multi-media environment that it is today. HTML defines the syntax and placement of special, embedded directions that instruct a web browser on how to display the contents of a document. A document is made up of one or more HTML text files, which in turn reference associated media files, like Joint Photographic Experts Group (JPEG) graphics, Graphics Interchange Format (GIF) animation files, or other embedded files of any number of formats including text, images, and other support media. HTML instructs a web browser client application on how to make a document interactive through special hypertext links or through embedded programs like Java applets, which connect a document with other documents, as well as with other Internet resources. In addition, embedded programs can and often do contain their own interactive logic in the form of executable code and the associated resources. HTML and similar or related technologies are responsible for unifying pictures, sounds, and text in a document, allowing the presentation of rich, multi-media filled web pages on a wide variety of computer based display devices and appliances. These technologies have propelled the Internet as a new medium for worldwide information exchange and commerce.
The full potential of the Internet as a medium for communication, education, entertainment and commerce remains unfulfilled due to problems with its performance and reliability. The Internet's performance limitations stem from its basic architecture, which is not optimized for distribution of data-intensive multimedia content. Internet performance is currently limited by the weakest link in the chain between the client and the server. Bottlenecks may be caused by the "last-mile" connection to the user, the infrastructure of the Internet Service Provider (ISP), the gateway to the Internet backbone, or the content provider's Web server. For example, the Internet frequently becomes overloaded when transmitting the same data streams from popular Web site servers to millions of individual users.
It is generally true that wide area network (WAN) connections cost more and are slower than local area network (LAN) connections. What is needed is an effective method and apparatus for minimizing or eliminating redundant document transmissions across wide area network connections. Document transmission may include the transmission of data files or a collection of data files. It may include the transmission of text, audio, media, embedded programs, executable code, or other data that is published at a host server. In a wide area network of geographically dispersed homogeneous information consumers, the need for such an invention is most apparent. This is even more so when the users are consuming high-bandwidth data like video or audio files.
A method and apparatus for dynamically filtering documents transmitted on one or more multicast channels according to a first embodiment of the present invention is disclosed. A document is received from a multicast channel. It is determined whether the document includes relevant information. The document is processed if the document includes relevant information.
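By way of illustration only, the three steps of the first embodiment (receive, determine relevance, process) may be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the function name `filter_channel`, the representation of a document as a dictionary, and the caller-supplied `is_relevant` and `process` callables are all hypothetical, and the multicast channel is stood in for by any iterable of already-received documents.

```python
from typing import Callable, Iterable


def filter_channel(
    documents: Iterable[dict],
    is_relevant: Callable[[dict], bool],
    process: Callable[[dict], None],
) -> int:
    """Apply the receive / evaluate / process steps to each document.

    `documents` stands in for documents received from a multicast channel
    (assumption: they arrive as plain dictionaries). Returns the number of
    documents that were judged relevant and processed.
    """
    handled = 0
    for doc in documents:          # step 1: a document is received
        if is_relevant(doc):       # step 2: relevance is determined
            process(doc)           # step 3: relevant documents are processed
            handled += 1
    return handled
```

For example, passing `is_relevant=lambda d: d["topic"] == "news"` and `process=cache.append` would cache only the documents whose hypothetical `topic` field is `"news"` and silently skip the rest.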
A filtering agent according to a second embodiment of the present invention is disclosed. The filtering agent includes a session identification reading unit that retrieves, from a document multicast over a multicast channel, a session identification that identifies the original requester of the document. The filtering agent includes an information classification unit that retrieves published meta data and generates new meta data for each document transmitted, on which, in part, it bases its relevance decisions. The filtering agent includes a source unit that retrieves source information from the document. An evaluation unit is coupled to the session identification reading unit, the information classification unit, and the source unit. The evaluation unit determines whether the document includes relevant information based on the session identification, the data information, the source information, and the channel of distribution. Based on the results of the evaluation unit, the filtering agent either stores the document locally for later use, forwards the document to the original requester if that user resides within the receiving location's local area network, or discards the document as not relevant to the local users.
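The arrangement of units in the second embodiment may be sketched, purely for illustration, as a single class whose methods correspond to the session identification reading unit, the information classification unit, the source unit, and the evaluation unit. Everything concrete here is an assumption made for the sketch: the class and field names, the `topic` meta data key, the generated `size` meta data, the order in which the three outcomes are tested, and the particular relevance rule (known source, subscribed channel, and topic of local interest). The source text does not specify any of these.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    STORE = "store"      # keep locally for later use
    FORWARD = "forward"  # original requester is on the local network
    DISCARD = "discard"  # not relevant to local users


@dataclass
class Document:
    session_id: str                      # identifies the original requester
    source: str                          # where the document was published
    channel: str                         # multicast channel it arrived on
    metadata: dict = field(default_factory=dict)  # published meta data
    body: bytes = b""


@dataclass
class FilteringAgent:
    local_sessions: set        # sessions opened by users on the local network
    known_sources: set         # hypothetical: sources local users draw on
    subscribed_channels: set   # hypothetical: channels of local interest
    interests: set             # hypothetical: topics of local interest
    cache: list = field(default_factory=list)

    def read_session_id(self, doc: Document) -> str:
        """Session identification reading unit."""
        return doc.session_id

    def classify(self, doc: Document) -> dict:
        """Information classification unit: published plus generated meta data."""
        meta = dict(doc.metadata)
        meta["size"] = len(doc.body)  # example of newly generated meta data
        return meta

    def read_source(self, doc: Document) -> str:
        """Source unit."""
        return doc.source

    def evaluate(self, doc: Document) -> Action:
        """Evaluation unit: combine all inputs into one of three outcomes."""
        session = self.read_session_id(doc)
        meta = self.classify(doc)
        source = self.read_source(doc)

        # Assumed ordering: a local requester always receives the document.
        if session in self.local_sessions:
            return Action.FORWARD

        # Assumed relevance rule combining source, channel, and meta data.
        relevant = (
            source in self.known_sources
            and doc.channel in self.subscribed_channels
            and meta.get("topic") in self.interests
        )
        if relevant:
            self.cache.append(doc)
            return Action.STORE
        return Action.DISCARD
```

In this sketch a document requested by a remote user is still cached locally when it matches local interests, which is the case in which filtering a shared multicast stream avoids a later redundant transfer over the wide area network.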