1. Field of the Invention
This invention relates to a method and system for managing files in a virtual environment, such as a Virtual Private Server environment, and, more particularly, to handling duplication of files among the Virtual Private Servers.
2. Description of the Related Art
Commercial hosting services are often provided by an Internet Service Provider (ISP), which generally provides a separate physical host computer for each customer on which to execute a server application. However, a customer purchasing hosting services will often neither require nor be amenable to paying for use of an entire host computer. In general, an individual customer will only require a fraction of the processing power, storage, and other resources of a host computer.
Accordingly, hosting multiple server applications on a single physical computer is desirable. In order to be commercially viable, however, every server application needs to be isolated from every other server application running on the same physical host. In that context, a single computer system (usually called a “host”) supports multiple virtual servers (e.g., Virtual Private Servers or VPSs), sometimes as many as hundreds or thousands of VPSs.
Many applications involving such virtualization concepts as Virtual Private Servers (VPS) use VPSs in the context of webservers. Such a VPS therefore has an assigned IP address, and various other parameters associated with a webserver, for example, a URL, a DNS name, etc.
A typical situation where VPSs can be used is one in which a single “box” (in other words, a single computer) located in a data center has a host operating system and a plurality of VPSs (VPS 1, VPS 2, VPS 3, etc.), each of which appears to the user as a single dedicated webserver. The user is therefore unaware that the VPS is something other than an entire “box” dedicated to that particular user.
One of the problems in modern Virtual Private Server (VPS) development is the sheer volume of the files and data required to support a typical VPS. Currently, the total volume of a disk drive partition typically allocated to each VPS is on the order of several hundred megabytes, and sometimes several gigabytes, of disk space. This does not include any user application files—it only represents the files that are required by the VPS itself to function. Disk partitions on the order of several gigabytes are rarely a problem for modern data storage, since even readily available computers (for example, in 2005) come with disk drives that are tens (or sometimes hundreds) of gigabytes. Thus, a single VPS taking up several hundred megabytes or several gigabytes out of that total amount does not present any difficulties.
However, in many cases, a single physical computer or hardware server runs not one (or even several) Virtual Private Servers, but hundreds or even thousands of such VPSs at the same time. Also, a fact of life in many such multi-VPS systems is that at any given time, only a small fraction of the Virtual Private Servers are actually doing anything, with the rest being essentially quiescent. Nonetheless, each of those VPSs, even a quiescent one, still requires a full set of files, taking up a considerable amount of space on the hard drive.
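The scaling concern can be illustrated with simple arithmetic. The following is a minimal sketch only: the function name total_footprint_gb and the 500 MB per-VPS figure are assumptions chosen for illustration (the text above says only "several hundred megabytes"), not measurements of any particular system.

```python
def total_footprint_gb(num_vps: int, per_vps_mb: float) -> float:
    """Aggregate disk space, in gigabytes, if every VPS keeps its own full file set."""
    return num_vps * per_vps_mb / 1024


# Hypothetical figures for illustration only: 500 MB of system files per VPS.
for n in (1, 100, 1000):
    print(f"{n:>5} VPSs x 500 MB = {total_footprint_gb(n, 500):,.1f} GB")
```

Under these assumed figures, the aggregate grows linearly with the number of VPSs, so a footprint that is negligible for one VPS becomes comparable to, or larger than, the capacity of the disk drives described above once the count reaches the thousands.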
Various disk space management techniques are known, such as distributing the VPS files off the local hard drive and locating them on network hard drives. However, this is not a panacea, since the VPSs' own overhead uses up network bandwidth in this case. Moreover, this does not address the problem of loading the duplicate files into RAM.
At the same time, much of the current software development methodology frequently assumes that the amount of disk space (and to a lesser extent, RAM) is essentially unlimited. Thus, over the last two decades, many widely used applications, such as word processors, spreadsheets, games, etc., have gone from requiring a few hundred kilobytes of disk space for installation to dozens of megabytes, and several megabytes of memory when running. The VPS technology is not immune to this trend, since it is generally easier to buy more storage, if necessary, than to place additional constraints on VPS product developers. In essence, having software developers spend their time and efforts on reducing the total size of the files of the VPS is not viewed as an effective way to utilize developer resources, since the issue arises primarily in the context of scalability, particularly scalability to hundreds (or thousands) of VPSs running on the same physical machine.
In sum, one of the limitations on VPS scalability is the size of the disk partition required by each VPS and the number of files that need to be loaded into physical memory. Accordingly, there is a need in the art to reduce the amount of disk space taken up by each VPS and to reduce the amount of memory taken up by an individual VPS's files, so as to enable more VPSs to run on a single physical computer.