1. Technical Field
The present invention relates generally to an improved data processing system, and in particular to a method and apparatus for processing data. Still more particularly, the present invention provides a method, apparatus, and computer implemented instructions for distributing web content and minimizing inconsistencies between data sources.
2. Description of Related Art
The Internet, also referred to as an “internetwork”, is a set of computer networks, possibly dissimilar, joined together by means of gateways that handle data transfer and the conversion of messages from a protocol of the sending network to a protocol used by the receiving network. When capitalized, the term “Internet” refers to the collection of networks and gateways that use the TCP/IP suite of protocols.
The Internet has become a cultural fixture as a source of both information and entertainment. Many businesses are creating Internet sites as an integral part of their marketing efforts, informing consumers of the products or services offered by the business or providing other information seeking to engender brand loyalty. Many federal, state, and local government agencies are also employing Internet sites for informational purposes, particularly agencies which must interact with virtually all segments of society such as the Internal Revenue Service and secretaries of state. Providing informational guides and/or searchable databases of online public records may reduce operating costs. Further, the Internet is becoming increasingly popular as a medium for commercial transactions.
Currently, the most commonly employed method of transferring data over the Internet is to employ the World Wide Web environment, also called simply “the Web”. Other Internet resources exist for transferring information, such as File Transfer Protocol (FTP) and Gopher, but have not achieved the popularity of the Web. In the Web environment, servers and clients effect data transactions using the Hypertext Transfer Protocol (HTTP), a known protocol for handling the transfer of various data files (e.g., text, still graphic images, audio, motion video, etc.). The information in various data files is formatted for presentation to a user by a standard page description language, the Hypertext Markup Language (HTML). In addition to basic presentation formatting, HTML allows developers to specify “links” to other Web resources identified by a Uniform Resource Locator (URL). A URL is a special syntax identifier defining a communications path to specific information. Each logical block of information accessible to a client, called a “page” or a “Web page”, is identified by a URL. The URL provides a universal, consistent method for finding and accessing this information, not necessarily for the user, but mostly for the user's Web “browser”. A browser is a program capable of submitting a request for information identified by an identifier, such as, for example, a URL. A user may enter a domain name through a graphical user interface (GUI) for the browser to access a source of content. The domain name is automatically converted to an Internet Protocol (IP) address by a domain name system (DNS), a service that translates the symbolic name entered by the user into an IP address by looking up the domain name in a database.
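As an illustration only, the following Python sketch shows how a URL of the kind described above may be broken into the parts a browser uses to locate a page. The example URL, the function name `describe_url`, and the default port table are assumptions made for this sketch and do not appear in the original description. The DNS lookup itself is shown as a separate helper that is not called here, because it requires network access.

```python
import socket
from urllib.parse import urlsplit

# Hypothetical default ports, used only when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_url(url: str) -> dict:
    """Split a URL into the scheme, host, port, and path a browser would use."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path or "/",
    }


def resolve(host: str) -> str:
    """Translate a domain name into an IPv4 address via the system resolver.

    Requires network access, so it is defined here but not invoked.
    """
    return socket.gethostbyname(host)


# Hypothetical example URL.
info = describe_url("http://www.example.com/products/index.html")
print(info)
```

In this sketch the host portion is the symbolic name that a DNS would translate to an IP address, while the path identifies the page on that host.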
The Internet also is widely used to transfer applications to users using browsers. With respect to commerce on the Web, individual consumers and businesses use the Web to purchase various goods and services. In offering goods and services, some companies offer goods and services solely on the Web while others use the Web to extend their reach.
Content distribution systems are employed by businesses and entities delivering content, such as Web pages or files, to users on the Internet. Currently, content providers will set up elaborate server systems or other types of data sources to provide content to various users. Web content distribution systems are those systems that are employed to distribute content to these servers and caches. This type of setup includes various nodes that act as sources of data. In this type of content distribution scheme, data from a primary or publishing node is propagated to all of the other nodes in the system. These types of systems are expensive to put in place and also require ongoing maintenance.
When a node within the system receives a notification that content is being propagated, the node pulls the data from a server or other data source and makes the data available to external clients requesting the data. In an ideal situation, accesses by clients are coordinated with the modification of the data at the various nodes in the system, or a client always pulls data from a single node. In this situation, the data read by a single external client is guaranteed to be internally consistent.
Unfortunately, the ideal situation is currently unachievable because central coordination among external clients and nodes, such as Web servers and caches, is not practical when scalability and performance are important. Further, different nodes may have dissimilar rates of data retrieval from Web servers, and external clients cannot be blocked, without a degradation of performance, until the node with the slowest connection to its data server becomes consistent with the other nodes. Additionally, with the use of one or more load balancers between a client and a data source, a client may receive the same data from two different servers depending on network conditions.
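The inconsistency described above may be illustrated with a small simulation. The following Python sketch is a toy model under stated assumptions, not a description of any actual distribution system: the `Node` class, the tick-based clock, the per-node `delay`, the file names, and the round-robin load balancer are all hypothetical. It shows two nodes that are notified of new content at the same time but finish pulling it at different times, so that a client whose requests are spread across both nodes sees a mix of old and new content.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Node:
    """A content node that pulls new data some ticks after being notified."""
    name: str
    delay: int  # hypothetical number of ticks needed to pull new content
    content: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # (ready_tick, snapshot) pairs

    def notify(self, now: int, snapshot: dict) -> None:
        # The pull completes only after this node's retrieval delay.
        self.pending.append((now + self.delay, dict(snapshot)))

    def tick(self, now: int) -> None:
        # Apply every pull that has completed by the current tick.
        ready = [p for p in self.pending if p[0] <= now]
        self.pending = [p for p in self.pending if p[0] > now]
        for _, snapshot in ready:
            self.content = snapshot


def publish(nodes, now, snapshot):
    """The publishing node notifies every other node of new content."""
    for node in nodes:
        node.notify(now, snapshot)


def advance(nodes, now):
    for node in nodes:
        node.tick(now)


def client_fetch(nodes, paths):
    """Fetch each path through a round-robin load balancer (an assumption)."""
    balancer = cycle(nodes)
    return {path: next(balancer).content[path] for path in paths}


v1 = {"page.html": "v1", "logo.png": "v1"}
v2 = {"page.html": "v2", "logo.png": "v2"}
nodes = [Node("fast", delay=1, content=dict(v1)),
         Node("slow", delay=5, content=dict(v1))]

publish(nodes, now=0, snapshot=v2)

advance(nodes, 2)  # only the fast node has finished pulling
mid_view = client_fetch(nodes, ["page.html", "logo.png"])

advance(nodes, 5)  # the slow node has now caught up
late_view = client_fetch(nodes, ["page.html", "logo.png"])

print(mid_view)
print(late_view)
```

In this model the client's view at tick 2 combines the new page with the old image, which is the kind of internal inconsistency at issue. The view becomes consistent only after the slowest node has finished its pull, and blocking clients until then would degrade performance.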
Therefore, it would be advantageous to have an improved method, apparatus, and computer implemented instructions for distributing content and minimizing inconsistency between data sources.