1. Field of the Invention
The present invention pertains to Internet/Intranet systems. More particularly, this invention relates to effectively and efficiently updating content files among multiple duplicate content servers of a data service system.
2. Description of the Related Art
FIG. 1 illustratively shows the Internet 10, which includes an Internet access gateway system 20 connected to a number of user access terminals 11a through 11n via an interconnect network 12. The user terminals 11a-11n can also be referred to as client systems. The gateway system 20 is connected to the global Internet 13, which is connected to a data service system 15. Here, the global Internet 13 is typically formed by a number of data service systems connected together via a high speed interconnect network.
Each of the user access terminals 11a-11n contains browser software (i.e., 14-14n) that sends access requests to and receives responses from the gateway system 20. The browser (e.g., a web browser or other software such as e-mail software) allows its user (i.e., a person) to access the contents hosted in the data service system 15 through the corresponding user terminal.
FIG. 2 shows a prior art structure of the data service system 15. As can be seen from FIG. 2, the data service system 15 includes a content server 23. The content server 23 hosts contents for access by the users at the user terminals or client systems 11a-11n (FIG. 1). The content server 23 also includes a World Wide Web (WWW) Internet application, which follows the client-server model and relies on the Transmission Control Protocol (TCP) for reliable delivery of information between the server and the user terminals.
To alleviate overload conditions on the server 23, a number of duplicate web content servers 23a-23n are provided in the system 15. The overload condition occurs when the number of access requests to the content server 23 greatly exceeds the number of access requests the content server 23 can handle. This typically occurs when a large number of access requests are sent to the data service system 15 at the same time. To deal with this situation, the content servers 23a-23n are added in the data service system. The content servers 23a-23n have file systems which mirror each other, as well as that of the content server 23. This means that each content server keeps a local copy of the same content file in its local file system, and replies to user access requests by sending the requested file from its local file system. For example, when a user sends an access request with a URL (e.g., http://www.xyz.com/abc/def.html) for a content file, the file's path relative to the local file system is encoded in the URL (e.g., /abc/def.html). The content server that is assigned to service the request resolves the URL to the file requested and serves the file from its local file system. Since all of the servers 23-23n have the same file system, whichever of the servers 23-23n handles the request will yield the same content file.
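The URL-to-file resolution described above can be sketched as follows. This is a minimal illustration only, assuming a hypothetical document root `/var/www` and a hypothetical function name `resolve_url_to_path`; neither appears in the prior art system being described.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def resolve_url_to_path(url: str, doc_root: str = "/var/www") -> str:
    """Map a request URL to a file path in the server's local file system.

    doc_root is a hypothetical document root chosen for illustration.
    """
    # The path component of the URL (e.g. /abc/def.html) is the file's
    # path relative to the local file system.
    relative = urlparse(url).path.lstrip("/")
    return str(PurePosixPath(doc_root) / relative)


# Every mirrored server resolves the same URL to the same relative path,
# so any of them can serve the request from its own local copy.
print(resolve_url_to_path("http://www.xyz.com/abc/def.html"))
```

Because the mapping depends only on the URL, the result does not depend on which mirrored server performs it.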
A load-balancing router 26 routes each request to one of the web servers 23-23n. This balances the load across the web servers and application servers, and helps to prevent any individual server from being overloaded with requests.
The content files stored in all of the content servers 23-23n need to be updated whenever their content changes. For example, a Yahoo news site contains many news articles (i.e., content files). The articles are changed and updated regularly. One prior art way to achieve the update is to push, from a central server, the changed file to all of the content servers 23-23n as soon as the change occurs. In FIG. 2, this is done by employing an updating engine 24 to perform the updates. The database 25 serves to store any updates received.
When the update of a content file is received in the database 25, the updating engine 24 computes or generates the updated version of the content file using the updated information. Each of the content files is typically in the HTML (Hyper-Text Markup Language) web page format. The updating engine 24 then sends the updated version of the content file to each of the content servers 23-23n to replace the older version. This completes the update process for the content file.
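The push-style update just described can be modeled roughly as below. The sketch is an assumption-laden illustration: the class `ContentServer`, the function `push_update`, and the use of string length as a stand-in for transmitted data are all hypothetical and are not taken from the system of FIG. 2.

```python
class ContentServer:
    """Hypothetical stand-in for one of the content servers 23-23n."""

    def __init__(self, name):
        self.name = name
        self.files = {}  # file name -> content held in the local file system

    def store(self, file_name, content):
        # Replace the older version with the updated version.
        self.files[file_name] = content


def push_update(servers, file_name, content):
    """Send the full updated file to every server.

    Returns the total amount of data sent, measured here simply as the
    number of characters transmitted across all servers.
    """
    sent = 0
    for server in servers:
        server.store(file_name, content)
        sent += len(content)
    return sent


servers = [ContentServer(f"server-{i}") for i in range(3)]
total = push_update(servers, "/abc/def.html", "<html>updated</html>")
print(total)
```

In this model the data sent grows with the number of servers, whether or not any user of a given server ever requests the file.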
However, disadvantages are still associated with this prior art approach. One disadvantage is that a given file sent to one of the content servers 23-23n may never be requested by a user of that content server. As a result, the cost of transmitting the updated version of the file to that content server and storing it there is wasted.
Another disadvantage is that the prior art scheme of serving all updated files to all of the content servers 23-23n from the central updating engine 24 puts a large load on the updating engine 24. It also generates a large amount of data traffic within the data service system 15. If the content servers 23-23n are physically located at a geographically remote location from the updating engine 24, then the delay in transmitting the updated files to all the content servers 23-23n may be considerable.
In addition, if content consistency is a requirement, then a consistent state exists only when all of the content servers 23-23n have received and stored the updated version of the changed content file. This may take a long time.
One feature of the present invention is to effectively and efficiently update content files among duplicate content servers of a data service system.
Another feature of the present invention is to maintain content consistency among multiple duplicate content servers while allowing accesses to these content servers at all times.
A further feature of the present invention is to allow early access to updated files.
In accordance with one embodiment of the present invention, a data service system includes a number of duplicate content servers that host a content file with a file name. Each of the content servers stores a version of the same content file that is referred to by a file reference. An updating engine is also provided that, when receiving an update of the content file, generates an updated version of the content file. A file name binding server is coupled to the updating engine and the content servers to generate a new file reference for the updated version of the content file. The file name binding server updates each of the content servers with the updated version by sending the new file reference to a binding table in each of the content servers.
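One possible reading of this binding update is sketched below. It is only an illustration under stated assumptions: the class names, the `publish` method, and the `name#vN` format used for file references are hypothetical, since the text above does not specify how a file reference is formed.

```python
class ContentServer:
    """Hypothetical content server holding only a binding table."""

    def __init__(self):
        self.binding_table = {}  # file name -> current file reference


class FileNameBindingServer:
    """Hypothetical binding server that issues new file references."""

    def __init__(self, servers):
        self.servers = servers
        self.versions = {}  # file name -> latest version counter

    def publish(self, file_name):
        # Generate a new file reference for the updated version.
        # The "name#vN" format is an assumption made for this sketch.
        version = self.versions.get(file_name, 0) + 1
        self.versions[file_name] = version
        file_ref = f"{file_name}#v{version}"
        # Update each server by sending only the new reference,
        # not the updated file itself.
        for server in self.servers:
            server.binding_table[file_name] = file_ref
        return file_ref


servers = [ContentServer() for _ in range(3)]
binder = FileNameBindingServer(servers)
print(binder.publish("/abc/def.html"))
```

In this sketch only a short reference travels to each content server at update time, so the size of the content file does not affect the cost of the update step.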
In addition, a content store is provided in the data service system to store the updated version of the content file before it is fetched by a content server. Each of the content servers includes an HTTP engine, a file manager, a file reference binding table, and a cache. During operation, when a request for the content file is received in one of the content servers, the HTTP engine extracts the file name of the content file from the URL (Uniform Resource Locator) of the request. The file name is then used by the file manager to access the binding table for the corresponding file reference of the file name. The file manager then uses the file reference to access the cache to retrieve a version of the content file that corresponds to the file reference. If a cache hit results (i.e., the corresponding version is in the cache), then the file manager retrieves that version from the cache. If a cache miss results (i.e., the corresponding version is not in the cache), then the file manager performs a fetch operation to fetch that version of the content file into the cache from the content store.
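The request path described above (extract the file name, look up the file reference, then consult the cache and fall back to the content store) might be sketched as follows. This is a simplified model under assumptions: the content store is represented as a plain dictionary keyed by file reference, and the class name, method name, and `name#vN` reference strings are hypothetical.

```python
from urllib.parse import urlparse


class ContentServer:
    """Hypothetical content server with a binding table and a cache."""

    def __init__(self, content_store):
        self.content_store = content_store  # file reference -> content
        self.binding_table = {}             # file name -> file reference
        self.cache = {}                     # file reference -> content

    def handle_request(self, url):
        # HTTP engine: extract the file name from the request URL.
        file_name = urlparse(url).path
        # File manager: look up the current file reference for that name.
        file_ref = self.binding_table[file_name]
        # Cache miss: fetch that version from the content store.
        if file_ref not in self.cache:
            self.cache[file_ref] = self.content_store[file_ref]
        # Cache hit (or freshly fetched): return the cached version.
        return self.cache[file_ref]


store = {"/abc/def.html#v1": "old page", "/abc/def.html#v2": "new page"}
server = ContentServer(store)
server.binding_table["/abc/def.html"] = "/abc/def.html#v1"
print(server.handle_request("http://www.xyz.com/abc/def.html"))

# After the binding changes, the next request fetches the new version.
server.binding_table["/abc/def.html"] = "/abc/def.html#v2"
print(server.handle_request("http://www.xyz.com/abc/def.html"))
```

In this model a version is transferred from the content store only when a request for it actually arrives at that server.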
Other features and advantages of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.