1. Field of the Invention
The present invention is related to updating of databases and files, such as those used for anti-malware software, and more particularly, to a method and system for difference-based software updating of such files and databases.
2. Description of the Related Art
Some computer software publishers update their software “applications” (computer programs and data files associated with the programs) frequently. For some types of software applications, such as virus protection software, these updates are particularly frequent. Malware protection software applications, such as anti-virus software, are designed to detect computer viruses on a computer system, and may also remove found viruses. Because these anti-malware applications rely on data about specific viruses, worms, adware, spam, and firewall vulnerabilities, and because new types of malware are constantly being written to avoid current malware detection capabilities, it is necessary to update malware protection software applications on a regular basis to account for the newest malware. Frequent updating of data files is also necessary for some database publishers, who must put up-to-date information in their databases and remove obsolete information. Periodic updating of general software applications to expand capabilities and eliminate “bugs” is also common.
Currently, several methods are used to update software applications. The simplest of these is to distribute one entire software application to replace an older one. This “full update” method is expensive and inconvenient. When full updates are distributed over the Internet, they often cause such high loads on servers that other users suffer slow-downs on the network, and the servers have trouble meeting the demands.
Some software publishers distribute “incremental updates.” These updates do not contain entire software applications, but rather only that information necessary to transform a given version of a software application to a newer version. Because most software updates involve changes to only a small portion of a software application, only a small data file including the differences between the two versions needs to be distributed.
The use of incremental update methods allows for smaller updates which can be distributed over the Internet. One issue with conventional incremental updates that needs to be addressed is resource utilization on both the client side and the server side. Currently, anti-malware databases and files are updated relatively frequently, once every few hours, or even once every hour. Using anti-virus software as an example, on the client side there is a file, or a set of files, that contains the masks of the known viruses and other information. With conventional incremental update schemes, an update, usually in the form of a small “difference,” or “diff,” that includes instructions for modifying the file is downloaded from the server. In other words, rather than downloading the entire file, only a relatively small amount of data, representing information about the new viruses, is actually downloaded. These “diffs” are typically in the format of:
Replace line 102 with [ ]
Add line 103 as follows: [ ]
Delete line 121
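By way of illustration only, the following minimal Python sketch shows how a client might apply a diff of the kind listed above. The tuple encoding of the instructions, the function name apply_diff, and the sample signature strings are hypothetical and are not taken from any particular product. The sketch also assumes that line numbers are 1-indexed and refer to the file as it stands when each instruction is applied, which the format above does not specify.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding of one diff instruction: (kind, line_number, text).
# Assumption: line numbers are 1-indexed and refer to the file as it
# stands at the moment the instruction is applied.
Op = Tuple[str, int, Optional[str]]


def apply_diff(lines: List[str], ops: List[Op]) -> List[str]:
    """Return a new list of lines with the diff instructions applied in order."""
    result = list(lines)  # work on a copy so the caller's list is untouched
    for kind, number, text in ops:
        index = number - 1
        if kind == "replace":
            if not 0 <= index < len(result):
                raise IndexError(f"cannot replace line {number}")
            result[index] = text
        elif kind == "add":
            # The new text becomes line `number`; later lines shift down.
            if not 0 <= index <= len(result):
                raise IndexError(f"cannot add line {number}")
            result.insert(index, text)
        elif kind == "delete":
            if not 0 <= index < len(result):
                raise IndexError(f"cannot delete line {number}")
            del result[index]
        else:
            raise ValueError(f"unknown instruction: {kind}")
    return result


old_file = ["sig-A", "sig-B", "sig-C", "sig-D"]
diff = [
    ("replace", 2, "sig-B2"),  # Replace line 2 with [sig-B2]
    ("add", 3, "sig-NEW"),     # Add line 3 as follows: [sig-NEW]
    ("delete", 5, None),       # Delete line 5
]
new_file = apply_diff(old_file, diff)
```

In this sketch each instruction depends on the state left by the previous one, so a diff can only be applied to the exact version of the file it was computed against.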
One difficulty with the conventional approach is that many computers are not continuously connected to the Internet, may be turned off, or may be unable to reach the server. Thus, the version of the file on the client side may not be the latest version, or the “latest minus one” version, but may actually be several diffs, or potentially a large number of diffs, old. Furthermore, the general trend in the industry, particularly in the area of combating viruses, trojans, spam, worms, adware, spyware, and other types of malware, is that the files and databases need to be updated with increasing frequency, for example, every 15 minutes, or every 5 minutes, or, essentially, continuously. This has significant consequences for structuring the interaction between the client and the server.
On the server side, even though the volume of data that needs to be sent to the individual client might be relatively modest, the problem is compounded when millions, or tens of millions of users continuously request updates from the same server. Self-evidently, the greater the frequency of the updates, the greater the load on the server, even if the response to the request ultimately does not involve any updates—there is overhead involved in responding to any request from the user.
On the user side, it is generally desirable to require as little “intelligence” as possible from the client, as far as figuring out which version of the file the client currently has, which version it needs, and whether it needs to update or not. In other words, it is desirable to reduce, as much as possible, any processing on the client side associated with a client's request to the server for such updates.
Thus, one of the problems with a conventional approach described in U.S. Pat. No. 6,651,249 is the need for the client to download one or more “delta catalogs” and then process them on the client side. This increases the load on the server, and increases the amount of processing needed on the client-side.
Accordingly, there is a need in the art for a method and system that efficiently updates anti-malware and other frequently changing databases and files.