The present invention relates generally to a method for backing up and restoring data files and programs on a computer system. More particularly, the invention relates to an efficient method for determining whether a file or program has been previously backed up, or whether a backed up copy of that file or program already exists, and then backing up only those files and programs that have not been previously backed up and that have no backed up copies. Thus, the system and method allow for efficient use of bandwidth to locally, or remotely, back up files of a computer and/or computer system.
Conventional approaches for backing up computer programs and data files often utilize large amounts of expensive network bandwidth and excessive amounts of processor (CPU) time. Currently, many backup processes back up the entire program and data repository of a computer or computer system, leading to duplication of backed up files and programs, and requiring large amounts of network bandwidth and excessive amounts of storage media (such as tapes or compact discs (CDs)).
The networks of many organizations often comprise data centers (“server farms”) to store and manage large amounts of Internet-accessible data. Data centers often include several computer systems, such as Internet servers, employee workstations, file servers, and the like. Such data centers often encounter scalability problems when using traditional backup systems: the available bandwidth and storage are frequently insufficient for a full backup of a data center environment. A system that is scalable and can grow with an organization would be beneficial.
Some savings of bandwidth and storage media can be achieved by incremental backup methods, which only back up those files that have been changed or updated. However, these methods do not solve the problem that duplicate files residing on different computers on a network, or even on different networks, are still often backed up multiple times, consuming extensive amounts of storage media.
For example, data files are often shared among many persons, and duplicate copies reside on many different computers, so that many copies of the same file exist across one, or many, computer networks. Further, computers often use duplicate program and data files for running operating systems and applications. In a network running Microsoft Windows®, for example, each computer may have duplicate operating system files and programs. Backing up the entire network using conventional means may result in many copies of those files and programs being stored, leading to an extensive waste of storage media. A means of eliminating duplication of backed up files and programs would be desirable, with possible benefits including more efficient use of storage media, processing time, and network bandwidth.
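The duplication described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that two files are treated as duplicates when their contents produce the same SHA-256 digest; the text above does not specify how duplicates are identified. The names `plan_backup`, `store`, and `catalog`, the machine names, and the file paths are all hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return a content digest; SHA-256 is an assumed choice, not taken from the text."""
    return hashlib.sha256(data).hexdigest()


def plan_backup(machines):
    """Store each distinct file content once, however many machines hold a copy.

    machines maps a machine name to a {path: file contents} dictionary.
    Returns (store, catalog): store maps digest -> contents, and catalog maps
    (machine, path) -> digest so that every file can still be restored.
    """
    store = {}
    catalog = {}
    for machine, files in machines.items():
        for path, data in files.items():
            key = digest(data)
            if key not in store:
                # First time this content is seen anywhere: back it up.
                store[key] = data
            # Always record where the file lives, even if it was not re-stored.
            catalog[(machine, path)] = key
    return store, catalog


# Hypothetical network: both workstations hold the same operating system file.
machines = {
    "ws1": {"C:/Windows/kernel32.dll": b"os-file", "C:/docs/a.txt": b"alpha"},
    "ws2": {"C:/Windows/kernel32.dll": b"os-file", "C:/docs/b.txt": b"beta"},
}
store, catalog = plan_backup(machines)
print(len(catalog), "files catalogued,", len(store), "copies stored")
# prints "4 files catalogued, 3 copies stored"
```

In this sketch the shared operating system file is stored once while both workstations remain restorable through the catalog, which is the kind of saving in storage media and bandwidth that the paragraph above calls desirable.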
Further, conventional backup methods implemented by organizations often use a multiplicity of computer servers to perform the backups and often back up to tape media. This leads to distributed storage of data backups, which again leads to duplication and waste in both media and processor time.
Further still, a distributed backup process typically leads to the need to store many backup tapes, or other similar backup media, and requires a method of tracking the multiple media. Restoring from such a system is often very difficult, especially if incremental backup processes are used, because the proper storage media must be located and loaded in the proper sequence. Tape restoration is a lengthy, time-consuming process. Often, it is so inefficient and error-prone that the restore process is ineffective, resulting in a loss of data and even a loss of productivity, as programs must be re-installed and data rebuilt. A more efficient, easier to use backup system leading to a more effective and more easily implemented restore procedure would be beneficial to organizations using computer systems.