American business has an enormous investment in the information systems stored on its computers. Corporations have created, and attempted to maintain, disaster recovery systems to protect their mainframe computers against catastrophes and hardware failures. However, no comparable protection exists for small businesses or for the enormous assets housed on personal computers.
When data stored on a computer is destroyed or lost, the consequences can be disastrous. The data must be reconstructed at great cost in time and money, and some data may be impossible to reconstruct adequately. Backing up the data stored on a computer is therefore a prudent preventive measure against data loss.
Data stored on a computer can be lost in many ways. The physical hardware on which the data is stored is susceptible to many dangers, such as fire, flood, theft, or any other event that adversely affects the storage medium. Data can also be lost through failures in the hardware or software of the data storage system, such as a hard disk crash. In addition, human interaction with computers introduces further risks, including mistaken deletion by a user, virus contamination, and tampering.
Generating a backup copy of the data stored on a computer is therefore prudent, but current methods of generating and maintaining backup copies are imperfect. When backup copies are generated at the client site, the user must initiate each backup operation, which requires training the user in the proper backup procedure. Because the operation is not automated, it depends on the user taking action and may be forgotten or neglected. Furthermore, physical damage to the computer site, such as that caused by fire, can destroy the backup media along with the original data.
Much of the information stored on computer systems consists of relatively static information purchased from outside sources. Examples include operating systems and application programs, as well as supporting files such as graphics files, help files, and tutorials. Many current backup operations copy the entire contents of the data stored on the computer. Although this method is thorough, it is also inefficient. An enormous amount of space could be saved at the backup site by storing only one master copy of each of these relatively static files rather than a separate copy of each identical file.
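As a minimal sketch of the single-master-copy idea, the following Python assumes that identical files are recognized by a SHA-256 digest of their contents; the text above does not say how identical files are detected, and the class and method names (`SingleInstanceStore`, `store`, `retrieve`) are hypothetical. Each client file path maps to a digest, and each distinct digest is stored only once.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical backup-site store keeping one master copy per distinct content.

    Assumption: identical files are detected by SHA-256 digest of their bytes.
    """

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}              # digest -> master copy
        self.catalog: dict[tuple[str, str], str] = {}  # (client, path) -> digest

    def store(self, client: str, path: str, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        # Keep the content only the first time it is seen; later identical
        # files just add a catalog entry pointing at the existing master copy.
        self.blobs.setdefault(digest, content)
        self.catalog[(client, path)] = digest
        return digest

    def retrieve(self, client: str, path: str) -> bytes:
        # Look up which master copy this client's file refers to.
        return self.blobs[self.catalog[(client, path)]]


store = SingleInstanceStore()
app = b"identical application binary"
store.store("client-a", "C:/apps/editor.exe", app)
store.store("client-b", "D:/tools/editor.exe", app)
store.store("client-a", "C:/docs/letter.txt", b"unique document")
print(len(store.catalog), len(store.blobs))  # 3 2
```

Hashing whole files keeps the catalog simple, but under this scheme a file that differs from a master copy by even one byte is stored as a separate copy.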
Computer file data is stored predominantly on magnetic hard disk drives. A magnetic disk drive includes a stack of several rigid aluminum disks, or platters, coated with magnetic material. Each platter has two sides, and each side has an associated read/write device called a head. The head is movable in a generally radial direction between the outer edge and the inner edge of the disk, so that, as the disk rotates beneath it, every location on the disk is accessible.
Each platter surface is divided into concentric rings called tracks, and the set of tracks at the same radial position on every surface forms a cylinder. Each track is divided into multiple sections called sectors. Each sector on the hard disk drive has a unique identifier called the Head, Cylinder, Sector (HCS) number, more commonly written CHS. Thus, each sector is identified by a single HCS number and can be accessed independently by selecting its head, moving the heads radially to its cylinder, and waiting for the sector to rotate under the head.
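Under the common convention that cylinders and heads are numbered from 0 and sectors from 1, an HCS address maps to a linear block address (LBA) as LBA = (C × heads per cylinder + H) × sectors per track + (S − 1). The Python sketch below applies that formula; the function name `hcs_to_lba` and the 16-head, 63-sectors-per-track geometry are illustrative assumptions, not values from the text.

```python
def hcs_to_lba(head: int, cylinder: int, sector: int,
               heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Hypothetical helper: convert an HCS address to a linear block address.

    Assumes the usual convention: head and cylinder are 0-based, sector is 1-based.
    """
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head number out of range")
    if cylinder < 0:
        raise ValueError("cylinder number must be non-negative")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector numbers are 1-based and must fit on the track")
    # Count whole tracks before this one, then add the offset within the track.
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


# Illustrative geometry: 16 heads, 63 sectors per track.
print(hcs_to_lba(head=0, cylinder=0, sector=1, heads_per_cylinder=16, sectors_per_track=63))  # 0
print(hcs_to_lba(head=1, cylinder=0, sector=1, heads_per_cylinder=16, sectors_per_track=63))  # 63
print(hcs_to_lba(head=0, cylinder=1, sector=1, heads_per_cylinder=16, sectors_per_track=63))  # 1008
```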
Copying the entire contents of the disk drives of a computer system can be extremely time consuming. Copying all data stored on a computer requires that every sector containing data be accessed, read, and written to a backup copy. This causes unnecessary wear on the mechanical components of the disk drive and lengthens the backup operation. Moreover, many files are maintained by making relatively small changes to larger existing files. A backup operation that transmits each such file in its entirety therefore retransmits mostly identical data that was already sent to the backup site during the previous backup operation, which wastes time. Substantial time could be saved by transmitting and storing only the changes made to the previously stored backup copy of the file.
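One way to transmit only the changes is to split each file into fixed-size blocks, keep a digest of every block from the previous backup, and send only the blocks whose digests differ. The Python below is a minimal sketch of that approach under assumed parameters: the 4096-byte block size, the SHA-256 digests, and the helper names `block_digests`, `changed_blocks`, and `apply_delta` are hypothetical choices, not details from the text.

```python
import hashlib

BLOCK_SIZE = 4096  # Hypothetical block size; the text does not specify one.


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """SHA-256 digest of each fixed-size block of a file."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(new: bytes, old_digests: list[str],
                   block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Client side: return only the blocks that differ from the previous backup."""
    delta = {}
    for index, digest in enumerate(block_digests(new, block_size)):
        if index >= len(old_digests) or digest != old_digests[index]:
            delta[index] = new[index * block_size:(index + 1) * block_size]
    return delta


def apply_delta(old: bytes, delta: dict[int, bytes], new_length: int,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Backup site: rebuild the new version from the stored copy plus the delta."""
    blocks = [old[i:i + block_size] for i in range(0, len(old), block_size)]
    n_blocks = -(-new_length // block_size)  # ceiling division
    blocks = blocks[:n_blocks]                  # drop blocks past the new end
    blocks += [b""] * (n_blocks - len(blocks))  # placeholders for appended blocks
    for index, block in delta.items():
        blocks[index] = block
    return b"".join(blocks)[:new_length]


old = b"A" * 4096 + b"B" * 4096 + b"C" * 100
new = b"A" * 4096 + b"X" * 4096 + b"C" * 100
delta = changed_blocks(new, block_digests(old))
print(sorted(delta))  # [1]
assert apply_delta(old, delta, len(new)) == new
```

Fixed-size blocks work well for in-place edits, but inserting bytes near the start of a file shifts every later block and defeats the comparison; tools such as rsync address this with a rolling checksum that can match blocks at arbitrary offsets.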
Furthermore, the vast amount of data to be backed up requires backup media with correspondingly vast storage capacity. Redundant backup systems are often expensive because they require a storage medium of approximately the same size as the storage medium being backed up. Providing adequate storage capacity not only increases the expense of backup operations but also requires a user to monitor the operation and supply additional storage capacity when necessary.
In many prior methods of data backup, the backup copy is created at the client site but stored at an off-site location so that a single catastrophic event cannot destroy both the original data and the backup copy. The storage medium holding the backup copy must therefore be physically transported from the client site to the backup site, which adds expense and creates further opportunities for media damage.
Furthermore, certain data contained within client files may be sensitive and must be protected from unauthorized access. Not only must the physical media on which the backup copy is stored be secured against unauthorized access or theft, but the data transmitted from the client site to the backup site should be encoded so that an unauthorized party who retrieves it cannot read any confidential information. Without such protection, creating backup copies of confidential data actually increases the likelihood that the information will be retrieved by an unauthorized party. Prior methods of data backup do not encode the data being backed up to prevent unauthorized access to client-sensitive information.
Thus, a need exists for a method and system for automatically transmitting backup copies of data to a remote backup site for storage. A need also exists for a method and system in which only the changes made to file data since the most recent backup operation are transmitted to the backup site. Furthermore, a need exists for a backup system in which only one copy of each large, relatively static file is stored at the backup facility. Finally, a need exists for encoding and encrypting file data transmitted from the client site to the backup site to protect a client's confidential data against unauthorized access.